cthoyt closed this issue 1 year ago
Hmm, I think this is outside the scope of ROBOT. If you want this to happen, you have to go through https://github.com/owlcs/owlapi/issues/ or join the #obo-format channel on OBO Slack, where @balhoff is currently thinking about prefix maps for OBO format and other fixes; he may be amenable to this. But I don't think this is a ROBOT issue per se: if the raw data is broken, the tool can't be expected to deal with all eventualities, so I would simply run grep -v on the OBO file prior to parsing. If you agree, can you close the issue?
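For anyone wanting to try the grep -v workaround, here is a minimal sketch. The helper name drop_lines, the file names, and the example pattern are all hypothetical; the only assumption about ROBOT is that the filtered file is passed with -i instead of fetching the IRI with -I. Note that dropping a line also drops whatever information it carried, so this is a stopgap rather than a repair.

```shell
# drop_lines PATTERN INPUT OUTPUT
# Copy INPUT to OUTPUT, omitting every line that contains the literal string PATTERN.
drop_lines() {
  # -v inverts the match, -F treats PATTERN as a fixed string (no regex),
  # and -- stops option parsing in case PATTERN starts with a dash.
  grep -v -F -- "$1" "$2" > "$3"
}

# Example usage (hypothetical file names and pattern):
#   drop_lines 'name: bad {entry}' cellosaurus.obo cellosaurus.filtered.obo
#   robot convert -i cellosaurus.filtered.obo -o cellosaurus.json
```

Using -F avoids having to escape braces and other regex metacharacters in the offending line.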
This exact issue is a problem with the currently released ChEBI OBO file: https://github.com/ebi-chebi/ChEBI/issues/4273
Rethinking this now: I could implement a "repair --obo-format" option that deals with the most frequent violations, like multiple labels and multiple comments. I would be open to this, but it would have to be now!
Sorry, I now realise that I already discuss this at https://github.com/ontodev/robot/issues/995, and that this (broken rows) is not possible at all right now without a major OWLAPI update.
This needs to be added either as an OWL API ticket or as an oboformat ticket: https://github.com/owlcollab/oboformat/issues
I will close this now, as what ROBOT can do about this can be covered by #995
The Cellosaurus ontology contains many invalid lines, e.g. the following line has improperly escaped curly braces in the molecule's name:
If you run
robot convert -I https://ftp.expasy.org/databases/cellosaurus/cellosaurus.obo -o ~/Desktop/cellosaurus.json -vvv
and look very carefully for the relevant error (for now, you have to search the output for org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser; #1038 would be helpful for this), you find that:
This ontology doesn't do its curation in an open source way, so it's difficult to communicate and help solve this issue. Further, I downloaded the file and started making fixes one at a time, but I have to re-run robot convert after every fix. It would be nice if there were a setting that allowed invalid lines to be skipped during OBO parsing.
CC @AmosBairoch @lubianat
Update: this is the same underlying issue as https://github.com/ebi-chebi/ChEBI/issues/4273
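Until something like that exists, one way to avoid re-running robot convert after every single fix is to list the likely offenders up front. The sketch below is only a heuristic: it assumes the problem lines are name: tags with a { or } that is not preceded by a backslash, and the helper name find_unescaped_braces is made up for illustration. It does not look at other tags, where braces can legitimately appear as trailing qualifiers.

```shell
# find_unescaped_braces FILE
# Print "line-number:line" for every name: line that contains a { or }
# not immediately preceded by a backslash. Heuristic only: it does not
# validate the file against the OBO format specification.
find_unescaped_braces() {
  # -n prefixes each hit with its line number; -E enables extended regex.
  # [^\\] matches any character except a backslash.
  grep -n -E '^name: .*[^\\][{}]' "$1"
}

# Example usage (hypothetical local copy of the file):
#   find_unescaped_braces cellosaurus.obo
```

The reported line numbers can then be fixed or filtered in one pass before converting.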