Terminal Ns not recognized as missing

While investigating https://github.com/cov-lineages/pango-designation/issues/590, I noticed that samples with the BA.2 S2M deletion (29734:29759) were being incorrectly visualized as having reference bases in sc2rf:

Consensus View:

sc2rf View:

I think this could be for a couple of reasons:

First, when --enable-deletions is used, perhaps deletions ("-") should not be treated as missing data:

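# Characters treated as missing data; deletions ("-") count as missing
# only when --enable-deletions is not set
missings_matches = ["N"]
if not args.enable_deletions:
    missings_matches.append("-")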

Second, I think the logic that detects runs of Ns is missing a check for a run that continues to the end of the genome:

if s in missings_matches:
    if start_n == -1:
        start_n = i  # mark the start of a possible run of Ns
elif start_n >= 0:
    # we've been tracking a run of Ns; this base marks the end
    missings.append((start_n, i-1))  # Python-style (closed, open) interval
    start_n = -1

# Proposed addition: also record a run of missing data that extends to the end of the genome
if i == len(reference) and s in missings_matches:
    missings.append((start_n, i-1))
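A minimal, self-contained sketch of run detection that also handles a run reaching the end of the genome may help illustrate the second point. It assumes a plain string sequence and 0-based, half-open intervals; `find_missing_runs` and its `missing_chars` parameter are hypothetical names for illustration, not sc2rf's actual code. Passing `{"N", "-"}` corresponds to running without --enable-deletions, per the snippet above.

```python
from typing import AbstractSet, List, Sequence, Tuple


def find_missing_runs(
    sequence: Sequence[str],
    missing_chars: AbstractSet[str] = frozenset({"N"}),
) -> List[Tuple[int, int]]:
    """Return runs of missing characters as 0-based, half-open (start, end) intervals.

    Hypothetical helper for illustration only; not sc2rf's implementation.
    """
    runs: List[Tuple[int, int]] = []
    start = -1  # -1 means "not currently inside a run"
    for i, base in enumerate(sequence):
        if base in missing_chars:
            if start == -1:
                start = i  # first missing base of a new run
        elif start >= 0:
            # i is the first non-missing base, so it is the exclusive end of the run
            runs.append((start, i))
            start = -1
    # The loop only closes a run when it sees a non-missing base, so a run
    # that reaches the end of the sequence has to be closed here.
    if start >= 0:
        runs.append((start, len(sequence)))
    return runs
```

For example, `find_missing_runs("ACNNGT--NN")` returns `[(2, 4), (8, 10)]`; without the final check after the loop, the terminal run `(8, 10)` would be silently dropped. With `missing_chars={"N", "-"}`, the deletion joins the terminal run and the result is `[(2, 4), (6, 10)]`.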

With these changes, the sc2rf output more closely matches the consensus sequence and my expectations:

I think this is a bug, but if it's the intended behaviour for deletions, please let me know. Thanks!

lenaschimmel/sc2rf, issue #29