rockt / SETH

SNP Extraction Tool for Human Variations
rockt.github.com/SETH

Incorrectly extracted mutations #10

Open jhkbg opened 8 years ago

jhkbg commented 8 years ago

Some patterns return only a partial match when the text contains a longer mutation mention. These patterns need to be extended, or new, longer patterns need to be added that take precedence over the shorter matches.

Examples:

  1. PMID=20806047 occurrence=p.X320SerextX72 extracted=p.X320Ser
  2. PMID=23903049 occurrence=p.His33GInfsX32 extracted=p.His33G
  3. PMID=22907560 occurrence=p.Arg313Hys extracted=p.Arg313H
  4. PMID=18486607 occurrence=p.Arg315Stop extracted=p.Arg315S
  5. PMID=23017188 occurrence=p.Phe508Del extracted=p.Phe508D
  6. PMID=24158885 occurrence=p.Met694IIe extracted=p.Met694I
  7. PMID=23856132 occurrence=p.F55>Lfs extracted=p.F55>L
  8. PMID=18708425 occurrence=p.L15_L16ins2L extracted=p.L15_L16ins2
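
To illustrate the precedence point for cases like 3, 4 and 5, here is a minimal sketch using plain `java.util.regex`. The two patterns are hypothetical and are not SETH's actual rules; they only show that Java tries regex alternatives in the order written, so a one-letter alternative listed first cuts the match short, while listing the longer alternatives first recovers the full mention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchSketch {
    // Hypothetical patterns for illustration only (assumption: not taken from SETH).
    // One-letter residue is listed first in the mutated part, so it wins too early.
    public static final Pattern SHORT_FIRST = Pattern.compile(
            "p\\.(?:[A-Z][a-z]{2}|[A-Z])\\d+(?:[A-Z]|Stop|Del|[A-Z][a-z]{2})");

    // Same alternatives, but the longer ones are tried before the one-letter code.
    public static final Pattern LONG_FIRST = Pattern.compile(
            "p\\.(?:[A-Z][a-z]{2}|[A-Z])\\d+(?:Stop|Del|[A-Z][a-z]{2}|[A-Z])");

    /** Returns the first match of the pattern in the text, or null if there is none. */
    public static String extract(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] occurrences = {"p.Arg313Hys", "p.Arg315Stop", "p.Phe508Del"};
        for (String s : occurrences) {
            System.out.println(s
                    + " short-first=" + extract(SHORT_FIRST, s)
                    + " long-first=" + extract(LONG_FIRST, s));
        }
    }
}
```

Reordering alone would not help with mentions such as `p.Met694IIe`, where the second letter is an uppercase `I`; those would need a dedicated pattern.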

Erechtheus commented 8 years ago

Thanks for the report. I added test cases for the described errors here: https://github.com/rockt/SETH/blob/master/src/test/java/de/hu/berlin/wbi/issues/Request10Test.java

Some errors (3, 4, 5, 6) should be easy to fix. It seems that the parser stops too early in these cases. Other errors (e.g., 1, 7, 8) probably need a major adaptation of the implemented Backus-Naur grammar.
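
A quick way to flag extractions that stop too early is to check whether the extracted span ends in the middle of an alphanumeric token. The sketch below is only an illustration of that check and does not reproduce the linked test class; the class and method names are made up, and the sample pairs are copied from the report above.

```java
public class PartialMatchCheck {
    /**
     * Returns true if a match ending at index 'end' (exclusive) is immediately
     * followed by a letter or digit, i.e. the match was cut off inside a token.
     */
    public static boolean stopsTooEarly(String text, int end) {
        return end < text.length() && Character.isLetterOrDigit(text.charAt(end));
    }

    public static void main(String[] args) {
        // {occurrence in the text, extracted mention}, taken from the issue report.
        String[][] samples = {
            {"p.X320SerextX72", "p.X320Ser"},
            {"p.Arg315Stop", "p.Arg315S"},
            {"p.F55>Lfs", "p.F55>L"},
            {"p.L15_L16ins2L", "p.L15_L16ins2"},
        };
        for (String[] sample : samples) {
            String occurrence = sample[0];
            String extracted = sample[1];
            // Each extracted mention is a prefix of its occurrence, so the match ends at its length.
            boolean truncated = stopsTooEarly(occurrence, extracted.length());
            System.out.println(occurrence + " -> " + extracted + " truncated=" + truncated);
        }
    }
}
```

A check like this only detects the truncation. Fixing it still requires changes to the patterns or the grammar.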

jhkbg commented 8 years ago

Cool, thanks. I will look into this as well at some point.