openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0

The loss of methylation syntax returns unexpected error. #206

Open ifokkema opened 4 years ago

ifokkema commented 4 years ago

According to the nomenclature pages, the pipe is used for:

(...) indicate that not a direct change of the sequence is described but a modification (a change of state, e.g. methylation). (see Example methylation)

This example shows:

“gom” indicates a gain of methylation; g.12345678_12345901|gom
“lom” indicates a loss of methylation; g.12345678_12345901|lom
“met” indicates a methylation; g.12345678_12345901|met=

I'm pretty sure I have seen syntax errors with these variants before, but now it seems the pipe is interpreted as a separator between variants. Submitting NC_000011.9:g.2018812_2024740|lom returns some (for me) unexpected results:

{
  "NC_000011.9:g.2018812_2024740": {
    "NC_000011.9:g.2018812_2024740": {
      "genomic_variant_error": "NC_000011.9:g.2018812_2024740: char 30: end of input"
    },
    "errors": [],
    "flag": "genomic_variant_warning"
  },
  "lom": {
    "errors": [],
    "flag": "submission_warning",
    "lom": {
      "genomic_variant_error": "Variant description lom is not in a supported format"
    }
  }
}

The variant should not be split in two; if the pipe is not supported, it should throw a syntax warning.
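To illustrate the behaviour I'd expect, here is a minimal sketch of a batch splitter that treats the pipe as a separator except when it introduces one of the three modification states listed above (gom, lom, met=). The function name `split_batch` and the regular expression are hypothetical and only for illustration; this is not VariantValidator's actual parsing code.

```python
import re
from typing import List

# Assumption: a pipe followed by gom, lom or met= (and then either the end of
# the input or another pipe) marks a methylation state, not a new variant.
_BATCH_SEPARATOR = re.compile(r"\|(?!(?:gom|lom|met=)(?:\||$))")


def split_batch(submission: str) -> List[str]:
    """Split a pipe-separated batch, keeping |gom, |lom and |met= attached."""
    return [part for part in _BATCH_SEPARATOR.split(submission) if part]


if __name__ == "__main__":
    print(split_batch("NC_000011.9:g.2018812_2024740|lom"))
```

With a rule like this, the example from this issue stays a single description, while ordinary batch input is still split on every other pipe.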

Peter-J-Freeman commented 4 years ago

We do use the pipe internally as a separator. Forgot about its use here. Seriously, can this nomenclature not be simplified? It's getting crazy!!

Haha

ifokkema commented 4 years ago

I wouldn't mind a simplification, but it looks like each new situation leads to new characters being included in the descriptions :roll_eyes:

Peter-J-Freeman commented 4 years ago

It's something that needs to be addressed. It is making computation more and more difficult!

ifokkema commented 4 years ago

This still applies.

Peter-J-Freeman commented 2 years ago

Yes it does. It's currently pretty low priority though. I'm sick of the HGVS using every available ASCII character needlessly. There is no need for the | in the gom and lom descriptions.

ifokkema commented 2 years ago

OK, we can then fix this by writing a wrapper function that sends the variant as an = and then converts the reply back to |gom when needed.
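A rough sketch of what such a wrapper could look like, assuming the validator is reachable through some callable that takes a description and returns a nested dict like the reply shown earlier. The names `validate_with_methylation`, `_restore` and the `validate` parameter are hypothetical; nothing here reflects the real LOVD or VariantValidator code.

```python
import re
from typing import Any, Callable

# Assumption: only these three suffixes need to be hidden from the validator.
_STATE = re.compile(r"\|(gom|lom|met=)$")


def _restore(value: Any, placeholder: str, original: str) -> Any:
    """Recursively swap the placeholder description back to the original."""
    if isinstance(value, dict):
        return {
            _restore(key, placeholder, original): _restore(item, placeholder, original)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [_restore(item, placeholder, original) for item in value]
    if isinstance(value, str):
        return value.replace(placeholder, original)
    return value


def validate_with_methylation(description: str,
                              validate: Callable[[str], Any]) -> Any:
    """Send |gom, |lom and |met= variants as '=' and convert the reply back."""
    match = _STATE.search(description)
    if match is None:
        return validate(description)
    placeholder = description[:match.start()] + "="
    return _restore(validate(placeholder), placeholder, description)
```

Note that this only rewrites strings that contain the exact placeholder description; any other representation of the variant in the reply (for example on another reference sequence) would still end in = and need separate handling.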

Peter-J-Freeman commented 2 years ago

Think we could do the same to be honest. Needs plugging in though. Can you please raise it in the LOVD alignment issue mate? Want to capture all these issues in one place. This is an easy one for the students to complete

Peter-J-Freeman commented 2 years ago

Actually, I will add it.

ifokkema commented 2 years ago

Alright! Great! I just expected it to be an issue as it conflicts with your batch feature. So it seems either the batch feature works or these variants would be supported. Hence my thought was to just "trick" VV with a different variant and keep it LOVD-side.

Peter-J-Freeman commented 2 years ago

Nope, that's the point of this issue and exercise. I have students writing code to start handling these variants. Better you help them rather than make needless fudges. Keep your eye on issues