implemented seq ident consensus check

hoelzer commented 3 years ago

I implemented a consensus quality check using https://gitlab.com/RKIBioinformaticsPipelines/president

The tool itself is still under active development, and we will need to update the Dockerfile soon, but it is working.

Now, every calculated consensus sequence is compared to the Wuhan strain (the default reference), and the check reports:

ID                                           Valid  Identity  Ambiguous Identity  Ambiguous Bases  Query Length
FAO96286_barcode07/ARTIC/medaka MN908947.3   True   0.9953    0.9994              123.0            29902

(example of good quality)

or

ID                                           Valid  Identity  Ambiguous Identity  Ambiguous Bases  Query Length
FAO96286_barcode67/ARTIC/medaka MN908947.3   False  0.4063    0.9991              17742.0          29903

(example of low quality: the many Ns count as mismatches, so Identity drops while Ambiguous Identity stays high)
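
To make the difference between Identity and Ambiguous Identity concrete, here is a minimal Python sketch. It is not president's implementation; it assumes the consensus is already aligned base by base to the reference without indels, that Identity is exact matches divided by reference length, that Ambiguous Identity also counts IUPAC ambiguity codes such as N as matches, and that Valid comes from a hypothetical VALID_THRESHOLD on Identity.

```python
from dataclasses import dataclass

# IUPAC nucleotide ambiguity codes; N dominates in low-coverage consensus sequences.
AMBIGUOUS_CODES = frozenset("NRYSWKMBDHV")

# Hypothetical cutoff on Identity for the Valid flag (an assumption, not president's default).
VALID_THRESHOLD = 0.9


@dataclass
class IdentityReport:
    valid: bool
    identity: float
    ambiguous_identity: float
    ambiguous_bases: int
    query_length: int


def identity_report(query: str, reference: str) -> IdentityReport:
    """Compare a consensus (query) to a reference, position by position.

    Sketch assumption: both sequences are already aligned without indels,
    so they must have the same, non-zero length.
    """
    if not reference or len(query) != len(reference):
        raise ValueError("this sketch expects pre-aligned, non-empty sequences of equal length")
    query, reference = query.upper(), reference.upper()

    # Exact matches only: an N never matches a real base here.
    exact = sum(q == r for q, r in zip(query, reference))
    # Lenient matches: an ambiguity code in the query is given the benefit of the doubt.
    lenient = sum(q == r or q in AMBIGUOUS_CODES for q, r in zip(query, reference))
    ambiguous_bases = sum(q in AMBIGUOUS_CODES for q in query)

    identity = exact / len(reference)
    return IdentityReport(
        valid=identity >= VALID_THRESHOLD,
        identity=identity,
        ambiguous_identity=lenient / len(reference),
        ambiguous_bases=ambiguous_bases,
        query_length=len(query),
    )


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    good = identity_report("ACGTACGTNC", reference)  # one N
    bad = identity_report("NNNNNNGTAC", reference)   # six Ns
    for name, report in [("good", good), ("bad", bad)]:
        print(name, report)
```

The toy "bad" sequence shows the same pattern as the low-quality example: its Ns pull Identity down to 0.4 while Ambiguous Identity stays at 1.0. A real check also has to align the consensus to the reference first, which this sketch deliberately skips.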

This fixes #13 for now, but it might need some adjustments in production (and whenever the tool itself changes).

hoelzer commented 3 years ago

@replikation I will restructure the code a bit; please don't merge yet.

replikation commented 3 years ago

@hoelzer OK. The changes so far are good. A simple test profile might be good at some point, now that these changes have been added. I won't merge.

hoelzer commented 3 years ago

Okay, done @replikation. I just saw that you do the coverage plot QC directly in the ARTIC sub-workflow, so I added the seq ident process there as well.

Not sure, but we might have a clash with the other PR #15 that we will need to resolve; I can do this if you are fine with both PRs :)

replikation / poreCov

implemented seq ident consensus check #16