Open pinin4fjords opened 9 months ago
So is it just rRNA that is removed by default? I am not clear on what the combination of bbsplit and sortmerna achieves, so it is hard to know what kinds of contaminants you have in mind (tRNA, phiX?).
I came to the conclusion that a blanket cross-species set was not practical.
For test_full I used the usual rRNA complement with human tRNA sequences added (https://github.com/nf-core/test-datasets/blob/riboseq/testdata/rrna-db-full.txt), but I think this will be down to the user, so maybe this is a documentation issue.
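If this does end up as documentation, a rough sketch of how a user might assemble their own list could help. The example below assumes the manifest is a plain text file with one FASTA path or URL per line, which is how the linked rrna-db-full.txt appears to be laid out. The function name and all file names are hypothetical placeholders, not files shipped with the pipeline.

```python
from pathlib import Path


def write_manifest(fasta_paths, manifest_path):
    """Write one FASTA path or URL per line, dropping blanks and duplicates.

    Assumption: the manifest format is simply one entry per line.
    """
    entries = []
    for path in fasta_paths:
        path = path.strip()
        # Keep first occurrence only, preserving the order given by the user.
        if path and path not in entries:
            entries.append(path)
    Path(manifest_path).write_text("\n".join(entries) + "\n")
    return entries


# Hypothetical inputs: an rRNA set plus species-specific tRNA sequences.
entries = write_manifest(
    [
        "rrna/euk-18s.fasta",
        "rrna/euk-28s.fasta",
        "trna/human-trna.fasta",
        "rrna/euk-18s.fasta",  # duplicate, will be dropped
    ],
    "contaminants-manifest.txt",
)
```

Keeping the list user-supplied like this matches the point above that a blanket cross-species set is not practical: each user adds the tRNA or other sequences relevant to their organism.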
Description of feature
sortmerna is implemented in the pipeline and runs by default. There will also be a bunch of other short RNA species we should remove, for which we can use the (also inherited) bbsplit functionality.
But we do need to derive a list of contaminant sequences and figure out where to store it.