snake-flu / type_variants

MIT License

Adjusting for the number of dashes in the aligned FASTA file #9

Open mmokrejs opened 8 months ago

mmokrejs commented 8 months ago

Hi, this probably does not cause an issue in your particular setup, but if there were a sequencing error (an insertion) in your sample read, you would want to skip the erroneous insertion. Likewise, if there were a 6-nt deletion in the sample that only partially spans three codons, wouldn't you want to adjust the slice window by the number of dashes and expand it?
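To make the deletion case concrete, here is a minimal sketch assuming a reference-aligned sample held as a plain Python string and a 1-based `ref_start`, as with `var["ref_start"]`. `classify_window` and the toy `sample` are hypothetical illustrations, not code from type_variants.py. They only show what the fixed 3-column slice sees when a 6-nt deletion starts mid-codon.

```python
def classify_window(aligned_seq: str, ref_start: int) -> str:
    """Describe the fixed 3-column slice starting at 1-based column ref_start.

    Hypothetical helper for illustration; it mirrors the slice
    [ref_start - 1:ref_start + 2] used in type_variants.py.
    """
    window = aligned_seq.upper()[ref_start - 1:ref_start + 2]
    if len(window) < 3:
        return "truncated"      # slice ran past the end of the alignment
    gaps = window.count("-")
    if gaps == 0:
        return "codon"          # three real bases
    if gaps == 3:
        return "deletion"       # whole codon gapped
    return "partial gap"        # mix of bases and dashes


# Toy reference-aligned sample (hypothetical): the 6-nt deletion covers
# columns 6-11, so it touches the codons at columns 4-6, 7-9 and 10-12.
sample = "ATGAA------CTAA"

for start in (1, 4, 7, 10, 13):
    print(start, sample[start - 1:start + 2], classify_window(sample, start))
```

Only the middle codon is a clean `---` deletion; the windows at columns 4 and 10 (`AA-` and `--C`) mix bases and dashes, which is the situation the question above is about.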

https://github.com/snake-flu/type_variants/blob/9cbbaa585db81b6dda13f958253d602931c8e066/type_variants.py#L161

- query_allele = record.seq.upper()[var["ref_start"] - 1:var["ref_start"] + 2].translate(gap='-')
+ nucrange = record.seq.upper()[var["ref_start"] - 1:var["ref_start"] + 2]
+ query_allele = record.seq.upper()[var["ref_start"] - 1:var["ref_start"] + 2 + nucrange.count('-')].translate(gap='-')
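The patch above widens the slice once by the number of dashes in the first three columns, but the extra columns can themselves be dashes. Below is a minimal sketch of one alternative that keeps widening until it has collected three non-gap bases. `codon_skipping_gaps`, `one_pass_extension` and `sample` are hypothetical names, and plain strings stand in for the Biopython `Seq` objects the script actually uses.

```python
def one_pass_extension(aligned_seq: str, ref_start: int) -> str:
    """Hypothetical helper: the patch's idea, widening the slice once."""
    seq = aligned_seq.upper()
    nucrange = seq[ref_start - 1:ref_start + 2]
    return seq[ref_start - 1:ref_start + 2 + nucrange.count("-")]


def codon_skipping_gaps(aligned_seq: str, ref_start: int, codon_len: int = 3) -> str:
    """Hypothetical helper: collect up to codon_len non-gap bases from ref_start.

    The window grows by one column for every '-' it meets, so it keeps
    growing even when the extension itself lands on more dashes.
    Fewer than codon_len bases are returned if the alignment ends first.
    """
    seq = aligned_seq.upper()
    bases = []
    i = ref_start - 1                      # convert 1-based column to 0-based index
    while i < len(seq) and len(bases) < codon_len:
        if seq[i] != "-":
            bases.append(seq[i])
        i += 1
    return "".join(bases)


# Toy reference-aligned sample (hypothetical): 6-nt deletion at columns 6-11.
sample = "ATGAA------CTAA"

print(one_pass_extension(sample, 4))       # still contains dashes
print(codon_skipping_gaps(sample, 4))      # three real bases
```

With this toy alignment the one-pass window at column 4 is still gapped (`AA--`), while the gap-skipping version returns `AAC`. Note that for a codon deleted outright (column 7) skipping gaps returns `CTA`, bases that belong to downstream reference codons, whereas the current code reports that codon as a deletion, so whether to expand the window there is a design choice rather than a pure fix. A 3-base result could then be translated with Biopython's `Seq(...).translate()`.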
mmokrejs commented 7 months ago

Likewise, why is position 244 hardcoded, e.g. at https://github.com/virus-evolution/gofasta/blob/f80b54427556e70c34ea7df471dbcb2a3b5760af/pkg/variants/pairwise.go#L25 and in other places in the same file?