Closed: rzeller closed this issue 9 years ago
I've figured out how to download the data, convert it from .sra to .fasta, and read it into Biopython using SeqIO. Instructions are now on the data retrieval page on the wiki.
I'm not sure which datasets to download. Bloom, if you figure that part out, I can write a bash script that downloads all the data and converts it into .fasta format.
Cool, can you convert it to FASTQ instead? That format contains the quality of each base call, which will be useful in my step of the pipeline. I think SeqIO supports it: http://biopython.org/wiki/SeqIO
Yeah, no problem. I just changed the instructions to dump and load .fastq files.
Let me know which datasets we should be working with.
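For anyone unfamiliar with what FASTQ adds over FASTA, here is a minimal standard-library sketch of how the per-base quality line decodes. It assumes the common Phred+33 encoding and plain four-line records; the record itself is made up for illustration, and the function names are hypothetical. In Biopython, `SeqIO.parse(path, "fastq")` exposes the same scores as `record.letter_annotations["phred_quality"]`.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into per-base Phred scores.

    Assumes Phred+33 encoding (offset 33), which is an assumption here.
    """
    return [ord(ch) - offset for ch in quality_line]


def parse_fastq(text):
    """Yield (read_id, sequence, scores) for each four-line FASTQ record."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Drop the leading '@' from the header line to get the read id.
        yield header[1:], seq, phred_scores(qual)


# Made-up record for illustration only; not taken from any of the datasets.
example = "@read1\nACGT\n+\nII5!\n"
records = list(parse_fastq(example))
```

Each score Q corresponds to an error probability of 10^(-Q/10), so low-scoring bases can be trimmed or down-weighted later in the pipeline.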
Here are some MS patients: http://www.ncbi.nlm.nih.gov/sra/SRX551536[accn]
http://www.ncbi.nlm.nih.gov/sra/SRX553103[accn] http://www.ncbi.nlm.nih.gov/sra/SRX553102[accn] http://www.ncbi.nlm.nih.gov/sra/SRX553101[accn]
http://www.ncbi.nlm.nih.gov/sra/SRX553100[accn] http://www.ncbi.nlm.nih.gov/sra/SRX553099[accn] http://www.ncbi.nlm.nih.gov/sra/SRX553098[accn]
http://www.ncbi.nlm.nih.gov/sra/SRX552923[accn] http://www.ncbi.nlm.nih.gov/sra/SRX552922[accn] http://www.ncbi.nlm.nih.gov/sra/SRX552921[accn]
http://www.ncbi.nlm.nih.gov/sra/SRX552916[accn] http://www.ncbi.nlm.nih.gov/sra/SRX552915[accn] http://www.ncbi.nlm.nih.gov/sra/SRX552914[accn]
http://www.ncbi.nlm.nih.gov/sra/SRX552900[accn] http://www.ncbi.nlm.nih.gov/sra/SRX552899[accn] http://www.ncbi.nlm.nih.gov/sra/SRX552898[accn]
I'm looking for controls to compare against now.
Also, it looks like most of the data from NCBI is mirrored here http://www.ebi.ac.uk/ena and they have it in different (more useful) formats. You can just search by accession number.
I've added getdata.py to the data-retrieval branch. It will download all the .sra files for the links above and convert them into the .fastq format. For information on how to use it, see the data retrieval page on the wiki. I'm going to close this issue. Let me know if getdata.py works for you guys.
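For readers without access to the wiki, the general shape of such a download-and-convert loop might look like the sketch below. This is not the actual getdata.py; it assumes the SRA Toolkit's `fastq-dump` is installed and on the PATH, that it is given run (SRR) accessions rather than the experiment (SRX) links above, and the function names and the `fastq` output directory are hypothetical.

```python
import subprocess


def fastq_dump_command(accession, outdir="fastq"):
    """Build the fastq-dump invocation for one run accession."""
    return ["fastq-dump", "--outdir", outdir, accession]


def fetch_all(accessions, outdir="fastq"):
    """Convert each run in order; raise on the first failure."""
    for acc in accessions:
        # check=True raises CalledProcessError if fastq-dump exits non-zero,
        # so a bad accession stops the loop instead of being skipped silently.
        subprocess.run(fastq_dump_command(acc, outdir), check=True)
```

Calling `fetch_all(["SRR1383326"])` would then write the converted reads into the `fastq` directory, provided the toolkit can reach NCBI.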
In getdata.py, SRR1383446 was supposed to be SRR1383326. I've updated it in the data-retrieval branch. If you're running the old version of getdata.py, it will throw an error. You can always cut off the beginning of the list to start from where it broke.
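Cutting off the beginning of the list can be done with a one-line slice. The sketch below is only an illustration: the helper name is hypothetical, and apart from SRR1383326 the accessions are placeholders rather than the real contents of getdata.py.

```python
def remaining_from(accessions, failed):
    """Return the part of the list starting at the accession that failed."""
    return accessions[accessions.index(failed):]


# Placeholder list; only SRR1383326 comes from this thread.
accessions = ["SRR_PLACEHOLDER_1", "SRR1383326", "SRR_PLACEHOLDER_2"]
todo = remaining_from(accessions, "SRR1383326")
```

Here `todo` starts at the corrected accession, so runs that already downloaded are not fetched again.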
Figure out how to download and convert the data into a python-readable format.