naobservatory / mgs-workflow

MIT License

Refine treatment of partially-HV taxa #35

Open willbradshaw opened 4 months ago

willbradshaw commented 4 months ago

Investigating the v2 pipeline’s human-virus (HV) assignment behavior:

As such, the script checks whether the assigned taxid or any of its ancestors is an HV taxon, but not whether any of its descendants are. Reads assigned to a higher-level taxon that is not itself HV, even if some of its descendants are, will thus be treated as though they were assigned to a non-HV taxon and filtered out during HV read identification.
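The script itself isn't quoted here, so the following is only a minimal sketch of the ancestor-only check as described. It assumes a simple child-to-parent mapping. The toy taxonomy, taxids, and function names are hypothetical and not taken from mgs-workflow. It shows why a higher-level taxon with an HV descendant ends up classified as non-HV.

```python
# Hypothetical toy taxonomy: child taxid -> parent taxid (1 is the root).
# Taxids are illustrative only, not real NCBI taxids.
PARENT = {
    2: 1,  # higher-level taxon, not itself flagged HV
    3: 2,  # child taxon flagged HV
    4: 2,  # sibling taxon, not HV
    5: 3,  # descendant of the HV taxon
}

# Taxa explicitly flagged as HV in this toy example (hypothetical).
HV_TAXA = {3}


def lineage(taxid: int) -> list[int]:
    """Return the taxid followed by all of its ancestors up to the root."""
    result = [taxid]
    while taxid in PARENT:
        taxid = PARENT[taxid]
        result.append(taxid)
    return result


def is_hv_ancestor_only(taxid: int) -> bool:
    """HV if the taxid or any ancestor is flagged HV.

    Descendants are never consulted, so a higher-level taxon with some
    HV descendants is treated as non-HV.
    """
    return any(t in HV_TAXA for t in lineage(taxid))


if __name__ == "__main__":
    for taxid in (5, 3, 4, 2):
        print(taxid, is_hv_ancestor_only(taxid))
```

Under these assumptions, taxids 5 and 3 are classified as HV while 4 and 2 are not, even though taxon 2 contains the HV taxon 3. Reads assigned to taxon 2 would therefore be dropped.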

This seems suboptimal. That said, it’s not obvious what the correct behavior is; treating reads assigned to partially-HV taxa as HV comes with its own problems, since such reads could equally derive from a non-HV descendant. Someone should think more about the right approach.

willbradshaw commented 1 month ago

Given our Q4 OKRs, this is likely to apply to vertebrate-infecting or endotherm-infecting viruses rather than human-infecting viruses specifically, but the problem remains: how to handle higher-level assignments for which some descendants are included and others are not.
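To make the three cases explicit regardless of which target set is used (HV, vertebrate-infecting, or endotherm-infecting), here is a minimal sketch that labels each taxon as fully included, partially included, or not included. It checks both ancestors and descendants. The taxonomy, the `TARGET` set, and the labels `"full"`, `"partial"`, and `"none"` are hypothetical illustrations, not part of the pipeline.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child taxid -> parent taxid (1 is the root).
PARENT = {2: 1, 3: 2, 4: 2, 5: 3, 6: 4}

# Hypothetical target set (e.g. HV or vertebrate-infecting taxa).
TARGET = {3}

# Invert the parent map so descendants can be walked.
CHILDREN: defaultdict[int, list[int]] = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def has_target_ancestor(taxid: int) -> bool:
    """True if the taxid itself or any ancestor is in the target set."""
    while True:
        if taxid in TARGET:
            return True
        if taxid not in PARENT:
            return False
        taxid = PARENT[taxid]


def has_target_descendant(taxid: int) -> bool:
    """True if any strict descendant of the taxid is in the target set."""
    stack = list(CHILDREN.get(taxid, []))
    while stack:
        node = stack.pop()
        if node in TARGET:
            return True
        stack.extend(CHILDREN.get(node, []))
    return False


def classify(taxid: int) -> str:
    """Label a taxon "full", "partial", or "none" relative to TARGET."""
    if has_target_ancestor(taxid):
        return "full"  # inside the target set: all descendants included
    if has_target_descendant(taxid):
        return "partial"  # some descendants included, others not
    return "none"


if __name__ == "__main__":
    for taxid in sorted({1, *PARENT}):
        print(taxid, classify(taxid))
```

The current ancestor-only behavior corresponds to treating `"partial"` the same as `"none"`. The alternative discussed above corresponds to treating it the same as `"full"`. How `"partial"` assignments should actually be counted is the open question in this issue. The sketch only separates the cases.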