pombase / allele_qc

Quality control for PomBase alleles
MIT License

Address "ghost positions" for the long term #63

Open manulera opened 1 year ago

manulera commented 1 year ago

Now I am using a ? symbol to mark those residues, but I should probably think of something better.

Related to #62

manulera commented 1 year ago

Hi @kimrutherford, I guess your script that reads fixes to modifications already ignores the ? positions (see the comment in https://github.com/pombase/allele_qc/issues/66#issuecomment-1673419593).

This can also happen in allele descriptions. For instance, you can try SPBC25B2.07c S4A in the old_coords_fix endpoint of the API. S4 existed in an old genome, but it maps to nothing in the current sequence, so the response would be:

[
  {
    "values": "?",
    "revision": "1381",
    "location": "complement(2609038..2610591)"
  }
]

I suspect this would be very rare, since once you go into such detailed modifications you probably have a good-quality amino acid sequence. But it could happen if an intron is added and affects a partial deletion.

TLDR: if change_description_to contains a ?, don't let the change go through.
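
As a rough illustration of that rule, here is a minimal Python sketch. Only the change_description_to field name and the ? marker come from this thread; the function names, the systematic_id key, and the example rows are hypothetical, and the real script may structure its data differently.

```python
# Hypothetical sketch: hold back any fix whose target description
# still contains a "ghost position" marker.
GHOST_MARKER = "?"


def has_ghost_position(description):
    """Return True if the description contains the ghost-position marker."""
    return GHOST_MARKER in description


def split_fixes(fixes):
    """Split fixes into (safe, blocked) based on change_description_to.

    Each fix is assumed to be a dict with a "change_description_to" key;
    a missing key is treated as an empty description.
    """
    safe, blocked = [], []
    for fix in fixes:
        target = fix.get("change_description_to", "")
        if has_ghost_position(target):
            blocked.append(fix)
        else:
            safe.append(fix)
    return safe, blocked


# Made-up example rows, for illustration only.
example_fixes = [
    {"systematic_id": "SPBC25B2.07c", "change_description_to": "?"},
    {"systematic_id": "hypothetical_gene", "change_description_to": "S5A"},
]
safe, blocked = split_fixes(example_fixes)
```

Keeping the blocked fixes in a separate list, rather than dropping them silently, would make it easy to report them for manual review.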