RaverJay closed this 2 years ago.
Uff, yeah I see. That the Wuhan reference is hardcoded for now might be okay, because everyone uses it anyway to get comparable results.
I agree that we might get strange outlier sequences with many or long insertions, and then this messes up the report a bit.
We should test this (e.g. I can also run some current real-world data sets here).
Later, we could then switch to Nextclade's own aa insertion reporting, although I also agree it's unclear when that will be addressed and implemented.
So my vote is to have this in (especially because of Omicron) and then do some testing. If the reports look fine, we merge and can later replace this with the Nextclade implementation.
I moved this out of the report process into a module + script; it should be clean and working now. And if Nextclade reports it in the future, this can easily be changed back.
Please test =)
Nice, looks good!
One thing: could we also link to the "hijacked" GitHub repo for the nt2aa translation here?
Note: amino acid insertions are currently not reported directly by Nextclade, and were instead converted from nucleotide insertions with custom code when possible.
Besides that, I would then also merge this and do a pre-release. I also ran this with ~20 Delta sequences, and no insertion was reported (which is correct).
And by the way, we could actually also use SnpEff: there is a specific version for SARS-CoV-2 where you pass in the final VCF file and it should give you the amino acid translations, e.g. see here
But if this works for now, we can also stick with the usual approach via Nextclade.
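For comparison, getting the amino acid changes back out of a SnpEff-annotated VCF would mostly mean parsing the `ANN` INFO key. A minimal sketch, assuming the standard `ANN` field order (gene name as the 4th `|`-separated field, `HGVS.p` as the 11th) and that one record's INFO column is passed in as a plain string; `protein_changes` is a hypothetical helper, not part of SnpEff or this pipeline, and running SnpEff itself is not shown:

```python
from typing import List, Tuple

# Assumed field order of the SnpEff ANN INFO key; only two positions are used here.
GENE_NAME_IDX = 3   # Gene_Name
HGVS_P_IDX = 10     # HGVS.p (protein-level change, empty for non-coding hits)


def protein_changes(info_column: str) -> List[Tuple[str, str]]:
    """Collect (gene, HGVS.p) pairs from the INFO column of one VCF record."""
    changes: List[Tuple[str, str]] = []
    for entry in info_column.split(";"):
        if not entry.startswith("ANN="):
            continue
        # Several annotations for the same variant are comma-separated.
        for annotation in entry[len("ANN="):].split(","):
            fields = annotation.split("|")
            # Skip truncated entries and annotations without a protein change.
            if len(fields) > HGVS_P_IDX and fields[HGVS_P_IDX]:
                changes.append((fields[GENE_NAME_IDX], fields[HGVS_P_IDX]))
    return changes
```

Intergenic or otherwise non-coding annotations have an empty `HGVS.p`, so they are silently dropped rather than reported as changes.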
added
Great, thanks @RaverJay! So I would merge this now. We can keep SnpEff in mind, but that would introduce a larger change.
Just to see how painful it would be, this adds custom code to convert Nextclade's nt insertion calls into aa insertions. Implements #82
Some things are very experimental:
So, do you think we should include this? If so, I would clean it up a little, then we test and merge.
Obviously, it would be much better if Nextclade implemented this in its output.
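A rough sketch of the kind of nt-to-aa insertion conversion discussed here, assuming Nextclade-style insertion strings of the form `position:bases`, where `position` is the 1-based Wuhan-Hu-1 (NC_045512.2) position after which the bases are inserted. The gene coordinate table and the helper names `translate` and `nt_to_aa_insertion` are illustrative assumptions, not the code from this PR. It only covers the easy case: an insertion whose length is divisible by 3 and which falls exactly between two codons of a gene in the table; everything else returns `None`, in the spirit of the "when possible" caveat in the report note.

```python
from typing import Dict, Optional, Tuple

# Standard genetic code, with codons enumerated in the usual T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE: Dict[str, str] = dict(zip(_CODONS, _AMINO_ACIDS))

# Assumed 1-based, inclusive CDS coordinates on Wuhan-Hu-1 (NC_045512.2).
# Illustrative subset only; ORF1ab is left out because of its ribosomal frameshift.
GENES: Dict[str, Tuple[int, int]] = {
    "S": (21563, 25384),
    "N": (28274, 29533),
}


def translate(nt: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def nt_to_aa_insertion(nt_insertion: str) -> Optional[str]:
    """Convert e.g. '22204:GAGCCAGAA' to 'S:ins214EPE', or return None."""
    pos_str, inserted = nt_insertion.split(":", 1)
    pos = int(pos_str)  # assumed: 1-based reference position the bases follow
    inserted = inserted.upper()
    if not inserted or len(inserted) % 3 != 0:
        return None  # would shift the reading frame: no clean aa insertion
    for gene, (start, end) in GENES.items():
        if start <= pos < end:
            offset = pos - start + 1  # number of gene bases before the insertion
            if offset % 3 != 0:
                return None  # splits a codon: would need the reference codon
            return f"{gene}:ins{offset // 3}{translate(inserted)}"
    return None  # outside the genes listed in GENES


print(nt_to_aa_insertion("22204:GAGCCAGAA"))  # S:ins214EPE
```

Under these assumptions, the Omicron BA.1 insertion `22204:GAGCCAGAA` lands right after codon 214 of S and translates to `S:ins214EPE`. Handling mid-codon insertions would additionally require the reference sequence, so that the split codon can be rebuilt before translating.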