nf-core / metapep

From metagenomes to epitopes and beyond
https://nf-co.re/metapep
MIT License

Allow for higher levels in taxid input #108

Open tillenglert opened 6 months ago

tillenglert commented 6 months ago

Description of feature

Currently, only strain-level taxids can feasibly be used in the taxid input.

Resolving higher taxonomic levels, such as species, comes with problems:

A species can contain more than one strain. Some strains may be pathogenic and others not; depending on the application, this is a crucial distinction, and one would want to discriminate between them.

Therefore, multiple strategies for using higher-level taxids should be available, such as largest, subset, and refseq.

skrakau commented 6 months ago

The idea was to enable input at the species level. In any case, only one assembly should be used, not multiple (e.g. all strains of a species), since this would introduce a bias. One could think of using the genome associated with the species-level taxid (not the strain level!), which is one representative strain (check the definition). One would also need to think about how best to handle this and document it properly, so that the user knows exactly what happens in the background.
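
To make the discussion concrete, here is a minimal sketch of how a single assembly might be chosen for a species-level taxid. It is not metapep code: the `Assembly` record, its fields, and the `select_assembly` function are hypothetical names introduced for illustration, and the `is_representative` flag is assumed to be filled in from whatever source defines the representative genome. Only the `largest` and `refseq` strategies are sketched; `subset` is left out because its behaviour is not defined in this thread.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Assembly:
    """Hypothetical record for one candidate assembly of a species."""
    accession: str                   # assembly identifier (illustrative)
    strain_taxid: int                # taxid of the strain the assembly belongs to
    length: int                      # total sequence length in bp
    is_representative: bool = False  # assumed flag for the representative genome


def select_assembly(assemblies: Iterable[Assembly], strategy: str = "refseq") -> Assembly:
    """Return exactly one assembly so that no species is represented several times."""
    candidates = list(assemblies)
    if not candidates:
        raise ValueError("no assemblies available for this taxid")

    if strategy == "largest":
        # Ties on length are broken by accession so the choice is reproducible.
        return max(candidates, key=lambda a: (a.length, a.accession))

    if strategy == "refseq":
        representatives = [a for a in candidates if a.is_representative]
        if len(representatives) != 1:
            # Fail loudly instead of guessing, so the user knows what happened.
            raise ValueError(
                f"expected exactly one representative assembly, found {len(representatives)}"
            )
        return representatives[0]

    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    # Made-up example data; accessions and taxids are placeholders.
    species_assemblies = [
        Assembly("ASM_A", 1001, 4_600_000),
        Assembly("ASM_B", 1002, 5_100_000, is_representative=True),
        Assembly("ASM_C", 1003, 5_400_000),
    ]
    for strategy in ("largest", "refseq"):
        chosen = select_assembly(species_assemblies, strategy)
        print(strategy, chosen.accession)
```

Raising an error when no unique representative exists, rather than silently falling back to another strategy, is one way to keep the behaviour transparent to the user; a real implementation might instead log the fallback explicitly.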