stajichlab / RNASeq_template

Template for RNASeq analyses running on HPCC
MIT License

Sample RNA-Seq data to run pipeline #1

Open andresn opened 3 years ago

andresn commented 3 years ago

Hi @hyphaltip,

Can you point me to your favorite RNA-Seq data set to run against the pipeline's test species (Coccidioides immitis)? See: https://github.com/biodataprog/RNASeq_template/blob/ecabfc1dd01ac2b5c370651b7d6f7aef28f35dfa/pipeline/00_download.sh#L6

I'm trying to run a clean version of your pipeline so that, for debugging purposes, I can compare its results against those from the modified version of the template I'm working with.

I looked on FungiDB (only found full-genome studies and couldn't work out how to download them), NCBI (only found forward reads), and your bigdata dir (with $: find ...; but it's big!).

Thank you in advance!

hyphaltip commented 3 years ago

https://github.com/biodataprog/RNASeq_template/blob/main/samples.csv has two SRR numbers in it which we use. You can see the 3 bioreps across 2 conditions associated with this SRA Project https://www.ncbi.nlm.nih.gov/sra?term=SRP013923 or here, with direct download links / IDs for the SRR runs for downloading with fastq-dump: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP013923
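
A minimal sketch of the download step, assuming the SRA Toolkit's fastq-dump is on the PATH. The helper name and the fastq output directory are hypothetical, and SRR1234 / SRR1235 are placeholder accessions, not real runs; substitute the SRR IDs from the links above. The helper only prints the commands so they can be reviewed before anything is downloaded.

```shell
# Hypothetical helper: print one fastq-dump command per run accession.
# --split-files writes each read of a spot to its own file; --gzip compresses output.
print_download_cmds() {
  outdir="$1"
  shift
  for acc in "$@"; do
    echo "fastq-dump --split-files --gzip --outdir $outdir $acc"
  done
}

# Placeholder accessions (assumption); replace with the real SRR IDs.
print_download_cmds fastq SRR1234 SRR1235
```

Piping the printed lines to sh would run them; keeping the print step separate makes it easy to check the accessions first.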

andresn commented 3 years ago

Great, thank you @hyphaltip!

andresn commented 3 years ago

@hyphaltip, I reopened this because I was confused about the right SRR numbers, and then realized that your downstream RNASeq_template probably needs to be updated with the SRR numbers you have in this repo: https://github.com/biodataprog/RNASeq_template/blob/main/samples.csv

Update 1: This repo's samples.csv for your convenience: https://github.com/stajichlab/RNASeq_template/blob/main/samples.csv

Update 2: Or maybe ^ is what you meant to point to instead in https://github.com/stajichlab/RNASeq_template/issues/1#issuecomment-759822418 since what you're going for is a template... closing again 😅, but it might be good to link to the working example SRRs in the template's README?

Update 3: Okay, yeah, it does look like I'll need help finding some valid SRR numbers (instead of SRR1234, SRR1235) that go in line with the concrete example in your template here: https://github.com/stajichlab/RNASeq_template/blob/main/pipeline/00_download.sh#L6

Update 4: ... and SRR numbers that have both forward and reverse reads, since that's what kallisto is expecting here: https://github.com/biodataprog/RNASeq_template/blob/main/pipeline/01_kallisto.sh#L40 ... and I'm not finding them associated with the SRP you provided: https://www.ncbi.nlm.nih.gov/sra?term=SRP013923
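
One way to cope with runs that only have forward reads is to check which files a download produced and choose the kallisto mode from that. The sketch below is an assumption, not what 01_kallisto.sh does: the function name, the directory layout, and the <run>_1.fastq.gz / <run>_2.fastq.gz naming are all hypothetical.

```shell
# Hypothetical helper: report whether a run has paired or single-end FASTQ files.
# Assumes files are named <dir>/<run>_1.fastq.gz and <dir>/<run>_2.fastq.gz.
read_layout() {
  dir="$1"
  run="$2"
  if [ -f "$dir/${run}_1.fastq.gz" ] && [ -f "$dir/${run}_2.fastq.gz" ]; then
    echo "paired"
  elif [ -f "$dir/${run}_1.fastq.gz" ]; then
    echo "single"
  else
    echo "missing"
  fi
}
```

For a run reported as single, kallisto quant needs --single together with an estimated fragment length (-l) and standard deviation (-s); suitable values depend on the library and are not given in this thread.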

@hyphaltip apologies for the notification blast, do you use Slack by the way?

hyphaltip commented 3 years ago

I never intended the template to have actual real numbers; you have the correct ones in the issue above: https://www.ncbi.nlm.nih.gov/sra?term=SRP013923

I'm glad you are working from this but I would use the template and then enter correct numbers based on the links.

andresn commented 3 years ago

@hyphaltip got it. What are your thoughts on this?

Update 4 ... [can't find via link https://www.ncbi.nlm.nih.gov/sra?term=SRP013923] SRR numbers that have both forward and reverse reads, since that's what kallisto is expecting here: https://github.com/biodataprog/RNASeq_template/blob/main/pipeline/01_kallisto.sh#L40 ... and I'm not finding them associated with the SRP you provided: https://www.ncbi.nlm.nih.gov/sra?term=SRP013923

Also, I was able to debug and produce our own implementation's heat map, thank you!

hyphaltip commented 3 years ago

Here's another link? There are 6 SRR numbers in the table, and their status as reps and conditions is listed.

https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP013923&o=acc_s%3Aa
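
If that table is saved as a comma-separated metadata file, the SRR IDs can be pulled out of it for samples.csv. A rough sketch, assuming a header row with a column named Run and no quoted commas in the fields before it; the function name is hypothetical.

```shell
# Hypothetical helper: print the values of the "Run" column of a CSV run table.
list_runs() {
  awk -F',' '
    # Header row: remember which column is named "Run", then skip the row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Run") col = i; next }
    # Data rows: print that column if it was found.
    col     { print $col }
  ' "$1"
}
```

Calling list_runs on the saved file prints one accession per line, which can then be paired with the rep and condition labels from the table.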

If you have your own data anyway, I would focus there. Heat maps are just one sliver of the analysis, but I hope you can build from there and read the DESeq2 manual too: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html