seandavi / SRAdb

Git mirror of Bioconductor SRAdb package

getSRA() did not return all available study (SRP*/ERP*/DRP*) IDs #21

Open parnika91 opened 5 years ago

parnika91 commented 5 years ago

Hello,

I downloaded SRAmetadb.sqlite on Jan 31, 2019 by running `if(!file.exists('SRAmetadb.sqlite')) sqlfile <<- getSRAdbFile()`. I then ran a full-text search for all studies with RNA-seq data on malaria, using `rs_malaria <- getSRA(search_terms = "malaria", out_types = c('sra'), sra_con)`. I repeated the search with "Plasmodium" as the search term and selected the RNA-seq studies from both result sets. However, several studies on SRA that contain one or both of these words are missing from the results.

For example, this search returns DRP000987, the RNA-seq data from this paper (https://genome.cshlp.org/content/early/2014/08/03/gr.158980.113). SRA lists DRP000987 as published on 2014-05-14 (assuming I understand "published" correctly). The corrigendum to this article (https://genome.cshlp.org/content/28/8/1253.full) mentions a second dataset, DRP001953, which SRA lists as published on 2018-05-27. However, `getSRA()` does not return DRP001953.
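A quick way to tell "the search did not match" apart from "the study is not in my copy of the database" is to look the accession up directly. This is a minimal sketch using DBI; the helper name `has_study()` is hypothetical, and it assumes an open connection `sra_con` to SRAmetadb.sqlite whose `study` table has a `study_accession` column.

```r
library(DBI)

# Hypothetical helper: returns TRUE if the given study accession
# (e.g. "DRP001953") is present in the 'study' table of an open
# SRAmetadb connection, FALSE otherwise.
has_study <- function(con, accession) {
  res <- dbGetQuery(
    con,
    "SELECT study_accession FROM study WHERE study_accession = ?",
    params = list(accession)  # bound parameter, no string pasting
  )
  nrow(res) > 0
}
```

For example, `has_study(sra_con, "DRP001953")` returning `FALSE` would suggest the study is simply absent from the Jan 31, 2019 snapshot, in which case no choice of search terms would find it.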

I also tried `dbGetQuery()` with queries matching "malaria" or "Plasmodium" in `study_title`, `study_abstract`, `experiment_title` and `sample_attribute`, but got the same result.
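A query of that kind might look roughly like this. It is a sketch, not SRAdb's own search: the helper `find_studies()` is hypothetical, and it assumes the combined `sra` table in SRAmetadb.sqlite exposes these four columns plus `study_accession`. SQLite's `LIKE` does substring matching that is case-insensitive for ASCII text, unlike the full-text `MATCH` used by `getSRA()`.

```r
library(DBI)

# Hypothetical helper: distinct study accessions whose metadata mention
# 'term' (substring match, ASCII case-insensitive) in any of 'fields'.
# 'fields' are pasted into the SQL as column names, so pass only
# trusted column names; the search pattern itself is a bound parameter.
find_studies <- function(con, term,
                         fields = c("study_title", "study_abstract",
                                    "experiment_title", "sample_attribute")) {
  where <- paste0(fields, " LIKE ?", collapse = " OR ")
  sql <- paste("SELECT DISTINCT study_accession FROM sra WHERE", where)
  pattern <- paste0("%", term, "%")
  # One bound value per placeholder
  res <- dbGetQuery(con, sql, params = as.list(rep(pattern, length(fields))))
  res$study_accession
}
```

Usage would be, for example, `find_studies(sra_con, "Plasmodium")`. Like `getSRA()`, this can only return studies that are already in the local file.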

What am I misunderstanding?

I hope this is reproducible. Thanks in advance for your help, and many thanks for the package!

Best wishes, Parnika.

avitalsteiman commented 5 years ago

Hi there, I also found that studies from the last few months are not yet available. When will SRAmetadb.sqlite be updated? And will we need to re-download the whole file, or is there a way to update an existing copy? Thanks!
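To see how current a local copy is, the build date stored inside the file says more than the download date. A minimal sketch, assuming SRAmetadb.sqlite contains a `metaInfo` table with `name`/`value` columns and a `creation timestamp` row (check with `dbListTables(sra_con)` if unsure); `sradb_timestamp()` is a hypothetical helper.

```r
library(DBI)

# Hypothetical helper: read the build timestamp recorded in the database.
# Assumes a 'metaInfo' table with 'name' and 'value' columns; returns
# NA if no 'creation timestamp' row is present.
sradb_timestamp <- function(con) {
  res <- dbGetQuery(
    con,
    "SELECT value FROM metaInfo WHERE name = 'creation timestamp'"
  )
  if (nrow(res) == 0) NA_character_ else res$value[1]
}
```

If the timestamp predates the studies you need, the straightforward route is to delete the old file and call `getSRAdbFile()` again; this sketch does not assume any incremental-update mechanism exists.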