johnsolk / MMETSP

re-assembly and analysis of the Marine Microbial Eukaryotic Transcriptome Sequencing Project

Repository cleanup, try 2 #9

Closed (ctb closed this issue 8 years ago)

ctb commented 8 years ago

@ljcohen could you remove or clarify the purpose of these files on your 'msu' branch?

MMETSP_SRA_Run_Info_subset_msu1.csv
MMETSP_SRA_Run_Info_subset_msu2.csv
MMETSP_SRA_Run_Info_subset_msu3.csv
MMETSP_SRA_Run_Info_subset_msu4.csv
MMETSP_SRA_Run_Info_subset_msu5.csv
MMETSP_SRA_Run_Info_subset_msu6.csv
MMETSP_SRA_Run_Info_subset_msu7.csv
SraRunInfo.csv
download.txt
download_cds.txt

johnsolk commented 8 years ago

SraRunInfo.csv is required as input for all scripts. It lists every sample ID and raw fastq URL in the dataset, and was originally downloaded from PRJNA231566.

I can remove the MMETSP_SRA_Run_Info_subset_msu*.csv files (or move them into a separate directory for subsets?). They are subsets of SraRunInfo.csv, split out so that the whole data set doesn't have to be run at the same time.
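
For reference, subsets like these could be regenerated on demand rather than kept in the repository. This is only a rough sketch, not the way the existing files were made: the function name `split_run_info`, the `n_subsets` parameter, and the even row-wise split are all assumptions, and the real subsets may have been chosen differently.

```python
import csv
from pathlib import Path


def split_run_info(src, out_dir, n_subsets,
                   prefix="MMETSP_SRA_Run_Info_subset_msu"):
    """Split a run-info CSV into up to n_subsets files (hypothetical helper).

    The header row is repeated at the top of every subset file.
    """
    with open(src, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Ceiling division, so every row lands in exactly one subset.
    chunk = max(1, -(-len(rows) // n_subsets))

    written = []
    for i in range(n_subsets):
        part = rows[i * chunk:(i + 1) * chunk]
        if not part:
            break  # fewer rows than requested subsets
        path = out_dir / f"{prefix}{i + 1}.csv"
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(part)
        written.append(path)
    return written
```

Something like `split_run_info("SraRunInfo.csv", "subsets", 7)` would then write the files into a `subsets` directory (a made-up name) instead of the repository root.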

I can move download.txt and download_cds.txt into a different directory for imicrobe files. They contain the URLs generated by download_mmetsp_transcript.sh for downloading the MMETSP assemblies. Generating these URL lists takes some time. This was the only way Luiz and I could figure out how to download a large batch of files from the imicrobe ftp site: ftp://ftp.imicrobe.us/projects/104/transcriptomes/
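
As a possible alternative to keeping the generated lists in the repo, a URL list of this kind could be rebuilt with Python's standard `ftplib`. This is a sketch under assumptions, not what download_mmetsp_transcript.sh does: the helper names `build_urls` and `list_ftp_urls` are made up, and it assumes the server allows anonymous login and that the files sit directly in the listed directory (subdirectories would need extra handling).

```python
from ftplib import FTP
from urllib.parse import urlsplit


def build_urls(base_url, names):
    """Join a base FTP URL with file names (hypothetical helper).

    Servers differ on whether a listing returns bare names or full
    paths, so only the last path component of each name is kept.
    """
    base = base_url.rstrip("/")
    return [f"{base}/{name.rsplit('/', 1)[-1]}" for name in names]


def list_ftp_urls(base_url):
    """List one FTP directory and return a URL for each entry.

    Assumes anonymous login works and the directory is flat.
    """
    parts = urlsplit(base_url)
    with FTP(parts.hostname) as ftp:
        ftp.login()  # anonymous login
        names = ftp.nlst(parts.path)
    return build_urls(base_url, names)
```

Calling `list_ftp_urls("ftp://ftp.imicrobe.us/projects/104/transcriptomes/")` and writing the result out one URL per line would give a file along the lines of download.txt, provided those assumptions hold for the imicrobe server.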