legumeinfo / datastore-issues

mostly for issues pertaining to the content of the legumeinfo datastore; may also relate to characteristics of its user interface or managing the mirroring process to the legfed instance

alfalfa expression atlas data #128

Closed adf-ncgr closed 2 years ago

adf-ncgr commented 2 years ago

Following a user inquiry about data that used to be available from Noble, I ran the dataset from https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1718-7 (SRA study accession SRP055547) against a "monoploid" version of XinJiangDaYe.gnm1.12MR, i.e. choosing the longest individual chromosome from each set of 4 haplotypes to bring the genome down from 32 "allele-aware" chromosomes to 8, in hopes of reducing multimapping ambiguity. I think it's probably OK to just call it XinJiangDaYe.gnm1 for purposes of datastorification and add some clarification in the README; we will likely revisit things at some point, but this seems like a good opportunity to get some user feedback. Another one for you, @sammyjava, with data under /erdos/adf/nf-core/rnaseq/alfalfa_gene_index.monoploid/star_salmon

Metadata for these samples can be found under /erdos/adf/nf-core/rnaseq/alfalfa_gene_index/samplesheet, though I haven't spent much time vetting it beyond finding one attribute that helped with interpreting the PCA (choose "sample_alias" in the representation here: http://dev.lis.ncgr.org:50011/nf-core/rnaseq/alfalfa_gene_index.monoploid/multiqc/star_salmon/multiqc_report.html).
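
For readers unfamiliar with the "monoploid" reduction described above, here is a minimal sketch of the selection step: keep the longest chromosome from each set of haplotype copies. It is not the script actually used for this dataset. The `chr<set>.<haplotype>` naming pattern, the function name `pick_monoploid`, and the toy lengths are all assumptions made for illustration.

```python
import re

def pick_monoploid(lengths):
    """Return {set_number: name of the longest haplotype copy in that set}.

    `lengths` maps sequence names to lengths in bp. Names are ASSUMED to
    look like chr<set>.<haplotype> (e.g. chr1.3); anything else is skipped.
    """
    pattern = re.compile(r"^chr(\d+)\.(\d+)$")
    best = {}
    for name, length in lengths.items():
        match = pattern.match(name)
        if match is None:
            continue  # e.g. unplaced scaffolds
        chrom_set = int(match.group(1))
        # Keep the first copy seen, then replace it only if a longer one appears.
        if chrom_set not in best or length > lengths[best[chrom_set]]:
            best[chrom_set] = name
    return dict(sorted(best.items()))

# Toy lengths, invented for this example (not real assembly statistics).
toy = {
    "chr1.1": 100, "chr1.2": 120, "chr1.3": 90, "chr1.4": 110,
    "chr2.1": 80, "chr2.2": 70, "chr2.3": 85, "chr2.4": 60,
    "scaffold_9": 5,
}
print(pick_monoploid(toy))  # {1: 'chr1.2', 2: 'chr2.3'}
```

With 8 sets of 4 copies, this would select 8 sequences out of 32, which could then be extracted from the FASTA and annotation files with whatever tooling is at hand.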

sammyjava commented 2 years ago

Three runs per sample -- for coverage? Shouldn't I combine these into a single track per sample?

sammyjava commented 2 years ago

Or are these bio reps with the same BioSample accession? (You're supposed to enter reps separately.) The experiment says "3 runs" so I think it's the same sample prep with three runs for coverage.

"sample","fastq_1","fastq_2","strandedness","accession","run_accession","experiment_accession","sample_accession","secondary_sample_accession","study_accession","secondary_study_accession","parent_study","submission_accession","run_alias","experiment_alias","sample_alias","study_alias","library_layout","library_selection","library_source","library_strategy","library_name","instrument_model","instrument_platform","base_count","read_count","tax_id","scientific_name","sample_title","experiment_title","study_title","description","sample_description","fastq_md5","fastq_bytes","fastq_ftp","fastq_galaxy","fastq_aspera"
"SRX891901","alfalfa_gene_index/fastq/SRR1820204.fastq.gz","","unstranded","SAMN03366849","SRR1820204","SRX891901","SAMN03366849","SRS858174","PRJNA276155","SRP055547","PRJNA13214","SRA244729","M.sativa ES1","M. sativa ssp sativa Leaf 3","B47_Elongating Stem","PRJNA276155","SINGLE","size fractionation","TRANSCRIPTOMIC","RNA-Seq","B47ES1","Illumina Genome Analyzer II","ILLUMINA","1235159980","16252105","3879","Medicago sativa","Plant sample from Medicago sativa subsp. sativa Elongating Stem","Illumina Genome Analyzer II sequencing; Medicago sativa Gene Atlas 1.2","Medicago sativa transcriptome and gene expression","Illumina Genome Analyzer II sequencing; Medicago sativa Gene Atlas 1.2","Plant sample from Medicago sativa subsp. sativa Elongating Stem","2162ff26110ed64b157dbda02653858b","737958309","ftp.sra.ebi.ac.uk/vol1/fastq/SRR182/004/SRR1820204/SRR1820204.fastq.gz","ftp.sra.ebi.ac.uk/vol1/fastq/SRR182/004/SRR1820204/SRR1820204.fastq.gz","fasp.sra.ebi.ac.uk:/vol1/fastq/SRR182/004/SRR1820204/SRR1820204.fastq.gz"
"SRX891901","alfalfa_gene_index/fastq/SRR1820227.fastq.gz","","unstranded","SAMN03366849","SRR1820227","SRX891901","SAMN03366849","SRS858174","PRJNA276155","SRP055547","PRJNA13214","SRA244729","M. sativa ssp sativa ES 2","M. sativa ssp sativa Leaf 3","B47_Elongating Stem","PRJNA276155","SINGLE","size fractionation","TRANSCRIPTOMIC","RNA-Seq","B47ES1","Illumina Genome Analyzer II","ILLUMINA","1291638468","16995243","3879","Medicago sativa","Plant sample from Medicago sativa subsp. sativa Elongating Stem","Illumina Genome Analyzer II sequencing; Medicago sativa Gene Atlas 1.2","Medicago sativa transcriptome and gene expression","Illumina Genome Analyzer II sequencing; Medicago sativa Gene Atlas 1.2","Plant sample from Medicago sativa subsp. sativa Elongating Stem","93c05e5f1cd84b3a90fa517a473f4b65","769224521","ftp.sra.ebi.ac.uk/vol1/fastq/SRR182/007/SRR1820227/SRR1820227.fastq.gz","ftp.sra.ebi.ac.uk/vol1/fastq/SRR182/007/SRR1820227/SRR1820227.fastq.gz","fasp.sra.ebi.ac.uk:/vol1/fastq/SRR182/007/SRR1820227/SRR1820227.fastq.gz"
"SRX891901","alfalfa_gene_index/fastq/SRR1820228.fastq.gz","","unstranded","SAMN03366849","SRR1820228","SRX891901","SAMN03366849","SRS858174","PRJNA276155","SRP055547","PRJNA13214","SRA244729","M. sativa ssp sativa ES 3","M. sativa ssp sativa Leaf 3","B47_Elongating Stem","PRJNA276155","SINGLE","size fractionation","TRANSCRIPTOMIC","RNA-Seq","B47ES1","Illumina Genome Analyzer II","ILLUMINA","1277531196","16809621","3879","Medicago sativa","Plant sample from Medicago sativa subsp. sativa Elongating Stem","Illumina Genome Analyzer II sequencing; Medicago sativa Gene Atlas 1.2","Medicago sativa transcriptome and gene expression","Illumina Genome Analyzer II sequencing; Medicago sativa Gene Atlas 1.2","Plant sample from Medicago sativa subsp. sativa Elongating Stem","81860a153667b5b4c4f0878ed274b234","757645051","ftp.sra.ebi.ac.uk/vol1/fastq/SRR182/008/SRR1820228/SRR1820228.fastq.gz","ftp.sra.ebi.ac.uk/vol1/fastq/SRR182/008/SRR1820228/SRR1820228.fastq.gz","fasp.sra.ebi.ac.uk:/vol1/fastq/SRR182/008/SRR1820228/SRR1820228.fastq.gz"
sammyjava commented 2 years ago

Ah, never mind, I see you merged them in the data dir; shoulda looked at that first.
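
A quick way to check which runs collapse into one sample is to group the samplesheet rows by their `sample` column. In the rows quoted above, all three runs share the value SRX891901. The snippet below is a rough sketch using only three of the samplesheet's columns; the helper name `runs_per_sample` is hypothetical, and it only reports the grouping, it does not merge any FASTQ files.

```python
import csv
import io
from collections import defaultdict

def runs_per_sample(samplesheet_text):
    """Group run accessions by the samplesheet's `sample` column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        groups[row["sample"]].append(row["run_accession"])
    return dict(groups)

# Three columns copied from the samplesheet rows quoted in this thread.
sheet = '''"sample","fastq_1","run_accession"
"SRX891901","alfalfa_gene_index/fastq/SRR1820204.fastq.gz","SRR1820204"
"SRX891901","alfalfa_gene_index/fastq/SRR1820227.fastq.gz","SRR1820227"
"SRX891901","alfalfa_gene_index/fastq/SRR1820228.fastq.gz","SRR1820228"
'''

for sample, runs in runs_per_sample(sheet).items():
    print(sample, len(runs), runs)
# SRX891901 3 ['SRR1820204', 'SRR1820227', 'SRR1820228']
```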

sammyjava commented 2 years ago

This is in prod MedicMine now, currently doing the post-processing bits. And in the DS.

adf-ncgr commented 2 years ago

thanks, that was super fast!