vanheeringen-lab / seq2science

Automated and customizable preprocessing of Next-Generation Sequencing data, including full (sc)ATAC-seq, ChIP-seq, and (sc)RNA-seq workflows. Works equally well with public and local data.
https://vanheeringen-lab.github.io/seq2science
MIT License

Q: [Comparative RNA-Seq analysis] #1021

Open bioinfolabmu opened 7 months ago

bioinfolabmu commented 7 months ago

Question: We are comparing different aligners and quantifiers to see their impact on the same raw RNA-Seq data. Of course, a single run takes a long time. My guess is that we do not have to re-run every step of the RNA-seq pipeline, only the few steps that we want to compare.

For example, I want to use a different quantifier (Salmon versus htseq).

What have I tried: I have finished running the pipeline with htseq, and I have not started the downstream differential analysis yet. Now I want to modify the config.yaml file to use salmon as the quantifier. Both quantifier results should be saved in my results folder, and they do not conflict with each other, right? The next step would then be to bring back the differential analysis part of the pipeline to produce both the salmon-based and the htseq-based DEG results.

My question is whether all these results can be saved in the same "results" folder, with config.yaml telling the pipeline which quantifier to use for the DEG analysis. Or should I run the entire pipeline again separately?

Thank you for your attention and help.

Maarten-vd-Sande commented 7 months ago
  1. Does seq2science need to be fully rerun? No, seq2science can continue from the last possible point to save compute. There is a minor "problem": seq2science deletes some files (called temp files) so that it does not store too much unnecessary stuff. In your case, seq2science removes the trimmed fastqs after it is done with them, because otherwise it would keep both the raw and the trimmed fastqs. That's a waste of space! You can turn off the removal of temp files with --snakemakeOptions notemp=True. Make sure to check whether this works as expected with --dryrun, because it might just delete some files you wanted to keep after all...
  2. Are the results stored in the same spot? Yes and no... The results of the quantifier are stored in a folder specific to that quantifier, so they won't overlap. The downstream results, for instance the differential analysis, are stored in the same deseq folder, so those will be overwritten.
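
A minimal sketch of how the dry-run check from point 1 could be scripted. The flags `--snakemakeOptions notemp=True` and `--dryrun` come from the answer above; the `seq2science run rna-seq` part and the helper name `seq2science_command` are assumptions for illustration, so adapt them to your own invocation.

```python
import shlex


def seq2science_command(workflow: str, keep_temp: bool = True, dryrun: bool = True) -> list[str]:
    """Build the argument list for a seq2science call (hypothetical helper)."""
    # Assumption: the workflow is started as `seq2science run <workflow>`.
    cmd = ["seq2science", "run", workflow]
    if dryrun:
        # Only list what would be (re)run or deleted; nothing is executed.
        cmd.append("--dryrun")
    if keep_temp:
        # Ask snakemake not to remove temp files such as trimmed fastqs.
        cmd += ["--snakemakeOptions", "notemp=True"]
    return cmd


cmd = seq2science_command("rna-seq")
print(shlex.join(cmd))
# prints: seq2science run rna-seq --dryrun --snakemakeOptions notemp=True
# To execute it for real, pass `cmd` to subprocess.run(cmd, check=True).
```

Inspect the dry-run output first, and only call the helper with `dryrun=False` once the list of planned jobs and deletions looks right.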

So it is indeed possible. If you don't have too many samples, I think it is easiest and least error-prone to just run them in separate folders. However, if you have a lot of samples, or are limited by compute/storage/time, then you can reuse some of the seq2science output.
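
The separate-folders route could be set up roughly like this. It is a sketch under assumptions: the config key is taken to be a top-level `quantifier:` line, the sample sheet is taken to be `samples.tsv` next to `config.yaml`, and `make_run_dir` is a hypothetical helper name.

```python
import re
import shutil
from pathlib import Path


def make_run_dir(source: Path, target: Path, quantifier: str) -> Path:
    """Copy config.yaml and samples.tsv into `target` and switch the quantifier."""
    target.mkdir(parents=True, exist_ok=True)
    # Assumption: the sample sheet is called samples.tsv.
    shutil.copy(source / "samples.tsv", target / "samples.tsv")

    config = (source / "config.yaml").read_text()
    new_line = f"quantifier: {quantifier}"
    # Replace an existing top-level `quantifier:` line, or append one if absent.
    config, replaced = re.subn(r"(?m)^quantifier:.*$", lambda _m: new_line, config)
    if replaced == 0:
        config = config.rstrip("\n") + "\n" + new_line + "\n"
    (target / "config.yaml").write_text(config)
    return target
```

For example, `make_run_dir(Path("htseq_run"), Path("salmon_run"), "salmon")` would give a second working directory that differs only in its quantifier, and each directory then gets its own results and deseq output.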

siebrenf commented 7 months ago

Adding to Maarten's answer: you can change the counts_dir and/or final_bam_dir in the config. That way, the final output is kept separately. Check out all configurable options.
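
As a rough illustration of that suggestion, the sketch below rewrites config text so that a second quantifier gets its own counts directory. The key names `counts_dir` and `quantifier` are the ones discussed in this thread; the helper `override_keys` and the value `counts_salmon` are hypothetical placeholders, and how seq2science resolves the path should be checked in the configuration docs.

```python
import re


def override_keys(config_text: str, overrides: dict[str, str]) -> str:
    """Return config text with top-level `key: value` lines replaced or appended."""
    lines = config_text.splitlines()
    for key, value in overrides.items():
        pattern = re.compile(rf"^{re.escape(key)}\s*:")
        new_line = f"{key}: {value}"
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = new_line  # key already present: replace in place
                break
        else:
            lines.append(new_line)  # key absent: add it at the end
    return "\n".join(lines) + "\n"


base = "quantifier: htseq\naligner: star\n"
print(override_keys(base, {"quantifier": "salmon", "counts_dir": "counts_salmon"}))
```

This only handles simple top-level keys. For nested YAML, a proper YAML parser would be the safer choice.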