jamiewaese / ePlant

ePlant is a data visualization tool for integrating and exploring multiple levels of biological data.
MIT License

ePlant RNA-seq viewer - survey of data sets #34

Closed nprovart closed 9 years ago

nprovart commented 10 years ago

Hi Hans, can you do a quick survey of the Illumina RNA-seq data sets that are available for Arabidopsis at the Short Read Archive (SRA) at NCBI? Basically, use the Browse function available at http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies or http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=samples and limit the results to Arabidopsis[organism] and RNA-seq (we're not interested in epigenomic or ChIP-seq data). What I'd like to see is a table of the studies/samples, the system used to generate the data (Illumina, preferably, or 454 etc.), the kind of reads (paired-end?), and comments on the experiment (developmental, e.g. leaves, roots, or other tissues, or abiotic or biotic stress). Focus only on wild-type samples, not mutants. After we've checked with our collaborator, Ann Loraine at UNCC, we'll proceed to process these with Bowtie or BWA at iPlant.
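As an alternative to the Browse pages, the same restriction could probably be expressed as an Entrez query against the `sra` database through NCBI's E-utilities. The following is a minimal sketch, not the method used in this issue. The field tags `[Strategy]`, `[Platform]`, and `[Layout]` and their values are assumptions that should be checked against the SRA Advanced Search builder, and the helper names `build_sra_term` and `build_esearch_url` are hypothetical. The code only builds the query and URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_term(organism="Arabidopsis", strategy="rna seq",
                   platform="illumina", layout=None):
    """Combine SRA search fields into one Entrez term.

    The field tags used here are assumptions; verify them in the
    SRA Advanced Search builder before relying on the results.
    """
    parts = [f"{organism}[Organism]", f'"{strategy}"[Strategy]']
    if platform:
        parts.append(f'"{platform}"[Platform]')
    if layout:
        parts.append(f'"{layout}"[Layout]')
    return " AND ".join(parts)


def build_esearch_url(term, retmax=100):
    """Return an ESearch URL for the sra database (no request is sent)."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


term = build_sra_term(layout="paired")
url = build_esearch_url(term)
print(term)
print(url)
```

Searching on a structured field such as the library strategy, rather than the free-text word "RNA-seq", may avoid dropping records whose titles or descriptions do not contain that exact string.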

yuzhenmi commented 10 years ago

I checked these out but I'm pretty confused.

Do the studies and samples data overlap?

Is there a search criterion for the type of data? Simply searching "Arabidopsis[organism] AND RNA-seq" seems to filter out a lot of RNA-seq datasets.

I was able to find platform and sample information (e.g. http://www.ncbi.nlm.nih.gov/sra/ERX204409) but the kind of reads and wildtype vs mutant seem to be ambiguous.

Should this table be created manually, or generated with a script somehow?

nprovart commented 10 years ago

Hi Hans, this will likely have to be created semi-manually. For the example link you gave above, there are 33 "experiments" (BioSamples) within the related BioProject link. Some of them seem to be technical controls, such as different dilutions of RNA to test the sensitivity of the system. I'd probably not include these samples, with the exception of the actual single-cell data, in our archive. But it would be good to know that these are in the SRA. Best, Nick
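One way the semi-manual workflow could look is sketched below: a script pre-fills the mechanical columns (platform, read layout, accessions) from an SRA RunInfo CSV export, and a person then fills in genotype and experiment comments by hand. This is a sketch under assumptions, not the procedure actually followed here. It assumes the export has the column names listed in `TABLE_COLUMNS` plus `LibraryStrategy`; the function name `survey_rows` is hypothetical, and the sample rows are invented placeholders, not real SRA records.

```python
import csv
import io

# Columns copied straight from the RunInfo export (assumed column names).
TABLE_COLUMNS = ["BioProject", "BioSample", "Run", "Platform",
                 "Model", "LibraryLayout", "SampleName"]


def survey_rows(runinfo_csv_text):
    """Keep RNA-Seq runs and add blank columns for manual curation."""
    reader = csv.DictReader(io.StringIO(runinfo_csv_text))
    rows = []
    for rec in reader:
        # Skip anything that is not RNA-Seq (e.g. ChIP-Seq, bisulfite).
        if rec.get("LibraryStrategy", "").strip().lower() != "rna-seq":
            continue
        row = {col: rec.get(col, "") for col in TABLE_COLUMNS}
        # Wild type vs. mutant cannot be decided reliably from these
        # fields, so it is left for a person to fill in.
        row["Genotype"] = "CHECK MANUALLY"
        row["Comments"] = ""
        rows.append(row)
    return rows


# Invented placeholder data, for illustration only.
SAMPLE = """Run,LibraryStrategy,LibraryLayout,Platform,Model,BioProject,BioSample,SampleName
RUN0001,RNA-Seq,PAIRED,ILLUMINA,ModelA,PROJ01,SAMP01,leaf
RUN0002,ChIP-Seq,SINGLE,ILLUMINA,ModelA,PROJ01,SAMP02,leaf
RUN0003,RNA-Seq,SINGLE,LS454,ModelB,PROJ02,SAMP03,root
"""

rows = survey_rows(SAMPLE)
for row in rows:
    print("\t".join(row[col] for col in TABLE_COLUMNS + ["Genotype", "Comments"]))
```

Technical controls such as the RNA dilution series would still need to be spotted and flagged by hand in the `Comments` column.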

