virus-evolution / gofasta

MIT License
34 stars 1 forks source link

Accept --offset argument to adjust the nuc/aa positions #46

Open mmokrejs opened 9 months ago

mmokrejs commented 9 months ago

Hi, I wonder why you did not need to adjust the numbering of reported changes. I just want to add an arbitrary number to the nuc positions before the output is created.

To get around it, I padded the reference sequence with leading and trailing Ns to get the nuc and aa positions reflecting their natural location in mRNA. The corresponding .gff file also had to be tweaked to define the CDS region since position 1.

Then gofasta failed with:

Error: Translation error: Untranslatable codon: NNN

So I had to rewrite Ns with as to get around.

Now my nuc and aa positions reported by ./gofasta variants are as expected, K417 instead of K91, etc.

Basically, instead of prepending 978 as I would have used --offset 978. Should be simple to implement.

benjamincjackson commented 8 months ago

I'm not sure I understand this comment. Again it would be helpful if you could provide attachments so that I could replicate the error.

In general, gofasta assumes that the alignment is in the same space as the reference sequence which is in the same space as the annotation.

If the alignment isn't in the same space as the annotation, the user is free to change the annotation (or the alignment) themselves. I don't want this to happen internally to gofasta because it's an unnecessary complication.

Let me know if I haven't understood your issue correctly.

mmokrejs commented 8 months ago

Regarding the testcases, see https://github.com/virus-evolution/gofasta/issues/44#issuecomment-1936261317

Basically I don't have the reference CDS starting at position 1 but elsewhere. The data is from amplicon sequencing with a product spanning just a portion of the CDS region. So receiving K91 in the output is hard to interpret and basically I need to correct the position by a fixed offset. Nucleotide positions need to be adjusted by 978, aa positions by 978 / 3.0 = 326.

gofasta can adjust the output values just before printing on STDOUT. Otherwise one needs to parse the output and addjust by a different value nuc positions and by a different value aa positions.

If one decides to prepend Ns into the reference sequence to cover the region leftwards from the amplified region, gofasta dies with Error: Translation error: Untranslatable codon: NNN. At least if that was just a Warning ....