kbaseattic / assembly

An extensible framework for genome assembly.
MIT License

Support for using SRA datasets as input #27

Open levinas opened 10 years ago

levinas commented 10 years ago

Examples: -srr srr000218, srr000219 -srp srp000323
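
As a rough illustration of how flags like these could be parsed, here is a minimal sketch using Python's standard argparse module. The function name `parse_sra_args` and the returned dictionary keys are hypothetical, and upper-casing the accessions is an assumption based on how SRA accessions are conventionally written; none of this is taken from the project's actual client code.

```python
import argparse


def parse_sra_args(argv):
    """Parse hypothetical -srr / -srp flags into lists of accessions."""
    parser = argparse.ArgumentParser(description="SRA input flags (sketch)")
    # Each flag accepts one or more accessions, separated by spaces and/or commas.
    parser.add_argument("-srr", nargs="+", default=[], metavar="RUN")
    parser.add_argument("-srp", nargs="+", default=[], metavar="PROJECT")
    args = parser.parse_args(argv)

    def clean(tokens):
        # Split on commas, drop empty pieces, and normalize case
        # (assumption: accessions are matched case-insensitively).
        ids = []
        for token in tokens:
            for part in token.split(","):
                part = part.strip()
                if part:
                    ids.append(part.upper())
        return ids

    return {"runs": clean(args.srr), "projects": clean(args.srp)}


if __name__ == "__main__":
    print(parse_sra_args(["-srr", "srr000218,", "srr000219", "-srp", "srp000323"]))
```

Accepting both comma- and space-separated lists keeps the example above (`-srr srr000218, srr000219`) valid as typed.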

sebhtml commented 10 years ago

19

@cbun a47642e0

cbun commented 10 years ago

Is there an SRA API or standard URL route mapping SRA# -> download link?

sebhtml commented 10 years ago

Do you want to use the one at EBI, the one at NCBI, or the one at DDBJ? DNAnexus has a mirror of the whole thing, but it only stores .sra files (like NCBI). EBI uses .fastq.gz files instead of .sra files, and Japan's DDBJ uses .fastq.bz2 files.

Bottom line: .sra files are not fun to use and add no value.

In my opinion, the closest thing to an addressable container in genomics is the BAM format.

levinas commented 10 years ago

My order is EBI > DDBJ > NCBI. Unfortunately there are some projects for which only .sra files are available.
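
A minimal sketch of that preference order, assuming the caller already knows which archives hold a given accession. The names `PREFERENCE`, `FORMATS`, and `pick_source` are hypothetical; the file extensions are the ones mentioned earlier in this thread.

```python
# Preferred archives, most preferred first (order taken from the comment above).
PREFERENCE = ("EBI", "DDBJ", "NCBI")

# File format each archive serves, as described earlier in the thread.
FORMATS = {"EBI": ".fastq.gz", "DDBJ": ".fastq.bz2", "NCBI": ".sra"}


def pick_source(available):
    """Return (archive, extension) for the most preferred available archive.

    `available` is any collection of archive names; returns None if none match.
    """
    for archive in PREFERENCE:
        if archive in available:
            return archive, FORMATS[archive]
    return None


if __name__ == "__main__":
    # A project that is only on DDBJ and NCBI falls back to DDBJ.
    print(pick_source({"NCBI", "DDBJ"}))
```

Falling through to NCBI last covers the projects for which only .sra files exist.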

A link you posted in #40 for mapping accession ID to URL:

http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=SRA001125&result=read_run&fields=study_accession,secondary_study_accession,sample_accession,secondary_sample_accession,experiment_accession,run_accession,scientific_name,instrument_model,library_layout,fastq_ftp,fastq_galaxy,submitted_ftp,submitted_galaxy,col_tax_id,col_scientific_name,sra_ftp,sra_galaxy
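
To make the mapping concrete, here is a minimal sketch that builds a report URL of the same shape and extracts FASTQ links from the response. It assumes the report comes back as tab-separated text with a header row and that multiple files in `fastq_ftp` are separated by semicolons. The base URL is copied from the link above and may no longer be served at that address. The function names and the sample rows are hypothetical placeholders, not real report output.

```python
import csv
import io
from urllib.parse import urlencode

# Base URL copied from the link in this thread; it may have moved since.
FILEREPORT_URL = "http://www.ebi.ac.uk/ena/data/warehouse/filereport"


def build_filereport_url(accession, fields):
    """Build a file report URL for one accession and a list of field names."""
    query = {
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
    }
    # safe="," keeps the field list readable instead of encoding it as %2C.
    return FILEREPORT_URL + "?" + urlencode(query, safe=",")


def parse_filereport(tsv_text):
    """Map each run accession to its list of FASTQ links.

    Assumes tab-separated text with run_accession and fastq_ftp columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    runs = {}
    for row in reader:
        links = [u for u in (row.get("fastq_ftp") or "").split(";") if u]
        runs[row["run_accession"]] = links
    return runs


if __name__ == "__main__":
    print(build_filereport_url("SRA001125", ["run_accession", "fastq_ftp"]))
    # Made-up placeholder rows, only to show the expected shape.
    sample = (
        "run_accession\tfastq_ftp\n"
        "SRR000001\tftp.example.org/a_1.fastq.gz;ftp.example.org/a_2.fastq.gz\n"
        "SRR000002\t\n"
    )
    print(parse_filereport(sample))
```

A run with an empty `fastq_ftp` column maps to an empty list, which is the case where a fallback to .sra files would be needed.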

(Sent as an email reply on Jul 2, 2014, at 4:31 PM, quoting Sébastien Boisvert's comment above.)

sebhtml commented 10 years ago

Last week, I could not download from EBI while inside Magellan. I was being IP-blocked.

I really like the search function of EBI ENA. NCBI SRA is, in my opinion, very hard to use.

I compared the 3 on my blog last year.

http://dskernel.blogspot.com/2013/10/how-i-use-european-nucleotide-archive.html