nf-core / riboseq

Pipeline for the analysis of ribosome profiling, or Ribo-seq (also named ribosome footprinting) data.
https://nf-co.re/riboseq
MIT License

Generate contaminants file #28

Open pinin4fjords opened 9 months ago

pinin4fjords commented 9 months ago

Description of feature

sortmerna is implemented in the pipeline and runs by default. There will also be a bunch of other short RNA species we should remove, for which we can use the (also inherited) bbsplit functionality.

But we do need to derive a list of contaminant sequences and figure out where to store it.

JackCurragh commented 7 months ago

So is it just rRNA that is removed by default? I am not clear on what the combination of bbsplit and sortmerna achieves, so it is hard to know what kinds of contaminants you have in mind (tRNA, phiX?).

pinin4fjords commented 7 months ago

I came to the conclusion that a blanket cross-species set was not practical.

For test_full I used the usual rRNA complement with human tRNA sequences added (https://github.com/nf-core/test-datasets/blob/riboseq/testdata/rrna-db-full.txt), but I think this will be down to the user, so maybe this is a documentation issue.
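
If this does end up as documentation, a user-side helper could look roughly like the sketch below. It assumes the contaminant list is a plain text manifest with one FASTA path or URL per line, which is a guess based on the linked file's name rather than something confirmed in this thread. The function name `build_manifest` and all file names are hypothetical placeholders.

```python
from pathlib import Path


def build_manifest(fasta_sources):
    """Return manifest text: one FASTA path or URL per line.

    ASSUMPTION: the one-entry-per-line layout is a guess; check the
    pipeline documentation for the format it actually expects.
    """
    seen = set()
    unique = []
    for source in fasta_sources:
        source = source.strip()
        # Skip blank entries and entries already listed.
        if source and source not in seen:
            seen.add(source)
            unique.append(source)
    return "".join(line + "\n" for line in unique)


if __name__ == "__main__":
    # Hypothetical file names: a stock rRNA set plus species-specific tRNAs.
    sources = [
        "rrna/eukaryotic-rrna.fasta",
        "trna/human-trna.fasta",
    ]
    Path("contaminants-manifest.txt").write_text(build_manifest(sources))
```

Keeping the species-specific entries (here the tRNA file) in a separate list from the shared rRNA entries would make it easy to swap them per organism, which fits the point that a blanket cross-species set is not practical.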