markziemann / dee2

Digital Expression Explorer 2 (DEE2): a repository of uniformly processed RNA-seq data
http://dee2.io
GNU General Public License v3.0

Time to include some extra species #68

Open markziemann opened 4 years ago

markziemann commented 4 years ago

| Species (6 Feb 2020) | No. SRX |
|---|---|
| Arabidopsis thaliana | 30890 |
| Zea mays | 19753 |
| Oryza sativa | 9737 |
| Triticum aestivum | 6924 |
| Solanum lycopersicum | 6444 |
| Sorghum bicolor | 4646 |
| Glycine max | 3889 |
| Populus trichocarpa | 3485 |
| Vitis vinifera | 3258 |
| Panicum virgatum | 2338 |
| Hordeum vulgare | 2284 |
| Solanum tuberosum | 1851 |
| Brachypodium distachyon | 1814 |

| Species (18 Feb 2020) | No. SRX |
|---|---|
| Schizosaccharomyces pombe | 4718 |
| Plasmodium falciparum | 4298 |

markziemann commented 3 years ago

Macaca mulatta, Bos taurus, Sus scrofa, Gallus gallus, Ovis aries

wdlingit commented 2 months ago

Thank you for providing this DEE2 database. I have been using it for quite a while. Some short questions:

  1. Is the plan to add species still ongoing?
  2. If so, have any specific genome assembly/annotation versions been chosen?

My colleagues are interested in rice and maize. We just tested the Singularity solution, and it seems that we can run the pipeline ourselves. If DEE2 has already adopted particular genome assembly/annotation versions for rice and maize, we would like to follow them and perhaps share the computation results.

markziemann commented 2 months ago

Hi @wdlingit, we have been unsuccessfully seeking funding to support the expansion of DEE2, in particular to work through the backlog of mouse and human studies and possibly to update to the latest reference genome build. That said, I think we can work together to get rice and maize included. I will do the necessary work to modify the pipeline to include rice and maize data and then update the web server side of things. If you could do the data processing at your institution, it would help expedite things. I'm not sure about an exact timeline, but I might have things ready to start data processing by the end of August.

wdlingit commented 2 months ago

Thank you for the reply. We collected SRR accessions with NCBI Taxonomy ID 39947, plus some minor restrictions. Our current list to be processed is about 7K SRRs, which is smaller than what you listed a few years ago. I think this is reasonable because Tax ID 39947 is for Oryza sativa Japonica Group, a subspecies(?) of rice. Oryza sativa Japonica Group is also available in Ensembl Plants ( https://plants.ensembl.org/Oryza_sativa/Info/Index ). We just started (2 hours ago) a test run of 1000 SRRs and things seem OK to me. To make sure things are coordinated, here is the info we applied in the volunteer_pipeline.sh script:

elif [ $ORG == "osativa" ] ; then
  GTFURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.59.gtf.gz"
  GDNAURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna_sm.toplevel.fa.gz"
  CDNAURL="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-59/fasta/oryza_sativa/cdna/Oryza_sativa.IRGSP-1.0.cdna.all.fa.gz"
  BT2_MD5="05eb69ae1d8b8b0d2cc06e890bf55dc6"
  KAL_MD5="6f618eda89e9b057c99d4d7580c5858d"
  STAR_MD5="b374bef1756a1ea105c968d68c71127e"
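
One way to keep reference versions coordinated between sites is to derive all three URLs from a single release number, so the GTF, genome and cDNA files cannot drift apart. The following is only a sketch: the function name `ensembl_plants_urls` is hypothetical and not part of volunteer_pipeline.sh, and the URL pattern is inferred from the three rice URLs quoted above, so it should be checked against the Ensembl Plants FTP site before being used for another species.

```shell
#!/usr/bin/env bash
# Sketch only: ensembl_plants_urls is a hypothetical helper, not part of DEE2.
# Requires bash >= 4 for the ${var^} expansion.
set -euo pipefail

ensembl_plants_urls() {
  local species="$1"    # lower-case directory name, e.g. oryza_sativa
  local assembly="$2"   # assembly label used in file names, e.g. IRGSP-1.0
  local release="$3"    # Ensembl Plants release number, e.g. 59
  local base="ftp://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-${release}"
  local cap="${species^}"   # capitalise first letter: oryza_sativa -> Oryza_sativa

  # Same variable names as in the fragment quoted above.
  GTFURL="${base}/gtf/${species}/${cap}.${assembly}.${release}.gtf.gz"
  GDNAURL="${base}/fasta/${species}/dna/${cap}.${assembly}.dna_sm.toplevel.fa.gz"
  CDNAURL="${base}/fasta/${species}/cdna/${cap}.${assembly}.cdna.all.fa.gz"
}

ensembl_plants_urls oryza_sativa IRGSP-1.0 59
printf '%s\n' "$GTFURL" "$GDNAURL" "$CDNAURL"
```

With the rice arguments shown, the three printed URLs match the release-59 values listed above, so changing the release number in one place would update all three consistently. The index MD5 values would still need to be regenerated separately for any new release.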