mjy / obo_parser

An OBO file parser.
MIT License

Error parsing XREFS #2

Open fmjabato opened 4 years ago

fmjabato commented 4 years ago

I'm using your gem to parse the Human Phenotype Ontology, specifically this version (https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.obo).

When I try to load it using your example code, I get a "Runtime Exception: Facebase is seemingly infinite".

Looking through your code, I see that it happens at line 69 of your "Tokens.rb" file. It seems to be a conceptual error: that code parses XREFS, but the line of hp.obo that triggers the error is a "synonym" of the term "HP:0000175".

Tell me if you need any other information to replicate the error.

mjy commented 4 years ago

Hi @fmjabato, thanks for this. Before I look at this, can you do a quick check in OBO Edit to see that no major errors are detected there? I'm not positive, but you might have a circular synonymy in there. Again, I haven't tried to replicate.

fmjabato commented 4 years ago

Hi, I haven't performed the check you asked about, but HPO is an ontology I have used several times. It doesn't show circular behaviour, and it loads perfectly in homologous packages for other languages (R and Java).

mjy commented 4 years ago

@fmjabato Those packages are not homologous in the sense that code from there became the basis for this package, which is far less sophisticated.

I loaded the file in OBO Edit, as I requested of you, and ran the Verification. There are many (many) minor "issues" with the data (not important to other parsers, but possibly to this one) that could conceivably be the problem. If you can narrow down the problem by editing the OBO file down, cleaning up the dbxrefs in particular, I might have a fighting chance to debug it (somewhat) sooner.

One way to figure out the exact problem is to bisect the file: save a copy with half cut out (but keep the properties at the end), test the load, and repeat.
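
The bisection described above could be automated roughly as follows. This is only a sketch under some assumptions: it treats "properties at the end" as meaning every non-[Term] stanza (for example [Typedef]) and keeps those in every trial file, and it takes the actual load test as a block, so it does not call this gem directly. The names `split_obo` and `bisect_obo` are made up for illustration.

```ruby
# Split OBO text into the header (everything before the first stanza)
# and a list of stanzas, each starting with a line like "[Term]".
def split_obo(text)
  header = []
  stanzas = []
  text.each_line do |line|
    if line =~ /\A\[\w+\]\s*\z/
      stanzas << [line]          # a new stanza begins here
    elsif stanzas.empty?
      header << line             # still inside the header
    else
      stanzas.last << line       # belongs to the current stanza
    end
  end
  [header.join, stanzas.map(&:join)]
end

# Narrow a failing OBO file down to a smaller set of [Term] stanzas.
# The block receives candidate OBO text and must return true when
# loading that text FAILS (for example, by rescuing the parser error).
# Returns nil if the full file does not fail in the first place.
def bisect_obo(text)
  header, stanzas = split_obo(text)
  # Assumption: non-[Term] stanzas are the "properties" to keep every time.
  fixed, suspects = stanzas.partition { |s| !s.start_with?('[Term]') }
  build = ->(subset) { header + (subset + fixed).join }

  return nil unless yield(build.call(suspects))

  while suspects.size > 1
    half = suspects.size / 2
    first = suspects[0...half]
    second = suspects[half..-1]
    if yield(build.call(first))
      suspects = first
    elsif yield(build.call(second))
      suspects = second
    else
      break # the failure needs stanzas from both halves; stop here
    end
  end
  suspects
end
```

The block would wrap whatever call loads the file and return true when that call raises. The result is the smallest set of [Term] stanzas the halving could isolate, which can then be pasted into a minimal test file.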

Thyra commented 4 years ago

@mjy Hey, the parser appears to have a problem with square brackets. I can reproduce the same error message with this minimal excerpt from the Gene Ontology:

format-version: 1.2
data-version: releases/2020-06-01
ontology: go

[Term]
id: GO:0000277
name: [cytochrome c]-lysine N-methyltransferase activity
namespace: molecular_function
xref: EC:2.1.1.59
xref: MetaCyc:2.1.1.59-RXN

When you take away the brackets, it parses just fine. I guess it interprets "cytochrome c" as a reference to something because of the brackets which it then can't find?
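
To see how many other lines in a file might be affected, one option is to list every tag-value line whose value contains an opening square bracket. The sketch below does only that; it makes no claim about how the parser treats those lines, and the names `BracketHit` and `bracket_hits` are invented for this example.

```ruby
# One record per suspicious line: where it is, which tag, and the raw value.
BracketHit = Struct.new(:line_number, :tag, :value)

# Return every "tag: value" line whose value contains "[".
# Stanza header lines such as "[Term]" are skipped.
def bracket_hits(obo_text)
  hits = []
  obo_text.each_line.with_index(1) do |line, number|
    next if line =~ /\A\[\w+\]\s*\z/         # stanza header, not a tag line
    tag, value = line.chomp.split(': ', 2)
    next if value.nil?                        # blank or malformed line
    hits << BracketHit.new(number, tag, value) if value.include?('[')
  end
  hits
end
```

Grouping the hits by `tag` (for example with `group_by(&:tag)`) shows whether the brackets sit in `name` lines, as in the excerpt above, or in tags such as `synonym` and `def`, where OBO 1.2 places a bracketed dbxref list after the quoted text.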