sys-bio / libOmexMeta

libOmexMeta is a library aimed at providing developer-level support for reading, writing, editing and managing semantic annotations for biosimulation models.
https://sys-bio.github.io/libOmexMeta/
Apache License 2.0

pyomexmeta fails on 1% of SBML files from BioModels #113

Open jonrkarr opened 3 years ago

jonrkarr commented 3 years ago

pyomexmeta succeeded on 99% of BioModels! Below are the files that it failed on. We're working with the versions of the files below at https://github.com/sys-bio/temp-biomodels/tree/main/final. I have no further information about the failures because these files cause segmentation faults.

The generation of OMEX metadata files is incorporated into the temp-biomodels workflow and in the package we're creating for longer-term management of BioModels. We can re-run this when this library is revised.

CiaranWelsh commented 3 years ago

Okay, so libOmexMeta uses libxml2 to extract ids from elements such as species or compartment. The RDF::from* methods call SBMLSemanticExtraction::extractSpeciesCompartmentSemantics, which calls OmexMetaUtils::getXmlNodeProperty to extract the id property from the species node. However, it seems these models were built and annotated with CellDesigner, which also uses "species" as an element name but does not give it an id property, hence the segfault.

However, importing BIOMD0000000226_url into COPASI produces a warning that the model was built with an older version of CellDesigner and advises the user to load the model into the newest CellDesigner and re-export the SBML. I did this with BIOMD0000000226_url and found that, firstly, re-importing the new SBML (L2V4) into COPASI produces a new warning that a reaction rate law was not imported correctly and, secondly, libOmexMeta could now properly parse the file and generate the expected RDF graph.

While it's possible (and probably a good idea) to restrict the search for species properties to elements in the SBML namespace, I suspect a better fix for this issue in the broader context of the "validating BioModels" project is to follow this procedure for each of the broken SBML files and confirm that they are valid SBML before trying libOmexMeta again.
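To make the namespace idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree rather than libxml2. It is not libOmexMeta's code. The XML fragment is a hand-written toy, not a real CellDesigner export, and the helper names (`local_name`, `species_any_namespace`, `species_sbml_only`) are hypothetical. It only shows why matching on the bare element name picks up a "species" element without an id, while matching on the SBML namespace does not.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: SBML Level 2 Version 4 and the CellDesigner annotation namespace.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"
CD_NS = "http://www.sbml.org/2001/ns/celldesigner"

# Toy document (an assumption for illustration): one CellDesigner "species"
# element without an id inside an annotation, and one real SBML species.
DOC = f"""<sbml xmlns="{SBML_NS}" xmlns:celldesigner="{CD_NS}" level="2" version="4">
  <model id="m">
    <annotation>
      <celldesigner:listOfIncludedSpecies>
        <celldesigner:species name="s_inner"/>
      </celldesigner:listOfIncludedSpecies>
    </annotation>
    <listOfSpecies>
      <species id="s1" compartment="c1"/>
    </listOfSpecies>
  </model>
</sbml>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def species_any_namespace(root):
    """Naive search: every element called 'species', whatever its namespace."""
    return [el for el in root.iter() if local_name(el.tag) == "species"]


def species_sbml_only(root):
    """Restricted search: only 'species' elements in the SBML namespace."""
    return list(root.iter(f"{{{SBML_NS}}}species"))


root = ET.fromstring(DOC)
naive_ids = [el.get("id") for el in species_any_namespace(root)]
sbml_ids = [el.get("id") for el in species_sbml_only(root)]

print(naive_ids)  # [None, 's1'] -> the missing id is what a careless caller trips over
print(sbml_ids)   # ['s1']
```

In Python the missing attribute comes back as None. In C++ the equivalent unchecked lookup is where a crash can occur, so the namespace filter and a null check are complementary.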

jonrkarr commented 3 years ago

Looping in @luciansmith since this now touches on improving BioModels.

(a) For the purpose of BioModels, we could use CellDesigner to re-export models as COPASI recommends. @luciansmith has done something similar with the latest version of COPASI. However, it sounds like this may create other problems. Assuming that's the case, I'm inclined to stick with the current files. Since Lucian and I don't have the bandwidth to review changes to 1000+ models in detail, I think we need to be conservative with the changes we're recommending to BioModels. If the BioModels curators can help review changes, we could go further in trying to fix the SBML.

(b) For such cases where annotations can't be interpreted, could pyomexmeta generate an incomplete RDF representation of the annotations? Could pyomexmeta report a warning that the annotations couldn't all be interpreted, rather than crashing with a segmentation fault? This would allow us to keep the current SBML (with valid rate laws) and generate (albeit incomplete) RDF.

CiaranWelsh commented 3 years ago

Yep - we can play this however you like. However, isn't it just the 11 models posted above that need to be re-exported and then reviewed?

Since the bug is actually in the "Semantic Extraction" part of the code I suspect we could just detect errors, issue a warning to the user and then turn off automatic extraction of annotations from sbml. I'll get cracking with this and will post again if I run into problems.

CiaranWelsh commented 3 years ago

Added try/catch blocks around the methods that extract semantic annotations. A failure with logic_error now emits a warning to standard error, and processing continues without extracting the rest of the semantics from the model.
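The behaviour described above can be sketched roughly as follows. This is a Python illustration of the pattern only, not the C++ code in the commit. `SemanticExtractionError`, `extract_species_id` and `extract_semantics` are hypothetical names, with the exception class standing in for the C++ logic_error.

```python
import io
import sys


class SemanticExtractionError(ValueError):
    """Hypothetical stand-in for the C++ std::logic_error raised on bad input."""


def extract_species_id(species):
    """Hypothetical extraction step: fail loudly instead of dereferencing a missing id."""
    if "id" not in species:
        raise SemanticExtractionError(f"species element has no id: {species!r}")
    return species["id"]


def extract_semantics(species_list, log=None):
    """Extract ids until the first failure, then warn and give up on the rest.

    Results gathered before the failure are kept, so the caller still gets
    a partial (incomplete) set of semantics instead of a crash.
    """
    log = sys.stderr if log is None else log
    extracted = []
    try:
        for species in species_list:
            extracted.append(extract_species_id(species))
    except SemanticExtractionError as err:
        print(f"warning: semantic extraction stopped early: {err}", file=log)
    return extracted


# Usage: the second element mimics a "species" node without an id.
log_buffer = io.StringIO()
partial_ids = extract_semantics(
    [{"id": "s1"}, {"name": "no_id_here"}, {"id": "s2"}],
    log=log_buffer,
)
warning_text = log_buffer.getvalue()
print(partial_ids)  # ['s1']
```

Wrapping the whole loop, rather than each element, mirrors the "continues without extracting the rest" behaviour. A per-element try block would instead skip only the bad node and keep going.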

e0080191242be1d86df4679343b13e90ec05962a (spec update branch, not yet integrated into develop or master). Keeping this issue open until I get validation that it's working.

CiaranWelsh commented 3 years ago

FYI - I'm going to keep working on issues before making another release. However, if you need the newest pyomexmeta to get on with work you can download a pip wheel from here