Open jonrkarr opened 3 years ago
Okay, so libOmexMeta uses libxml2 to extract ids from elements such as species or compartment. The RDF::from* methods call SBMLSemanticExtraction::extractSpeciesCompartmentSemantics, which calls OmexMetaUtils::getXmlNodeProperty to extract the id property from the species node. However, it seems these models were built and annotated with CellDesigner, which also uses the word "species" as an element name but does not give that element an id property, hence the segfault.
However, importing BIOMD0000000226_url into COPASI gives the user a warning that the model was built with an older version of CellDesigner and advises loading the model into the newest CellDesigner and re-exporting the SBML. I did this with BIOMD0000000226_url and found that, firstly, reimporting the new SBML (L2V4) into COPASI gives a new warning that a reaction rate law was not imported correctly and, secondly, libOmexMeta could now properly parse the file and generate the expected RDF graph.
While it's possible (and probably a good idea) to restrict the search for species properties to those that are in the SBML namespace, I suspect a better fix for this issue in the broader context of the "validating biomodels project" is to follow this procedure for each of the broken SBML files and confirm that they are valid SBML before trying libOmexMeta again.
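To make the namespace point concrete, here is a minimal sketch in Python (not libOmexMeta's C++/libxml2 code). The document is a made-up toy, not real CellDesigner output, and the function names are hypothetical. It assumes an L2V4 SBML namespace URI; a real fix would need to handle whichever level/version namespace the file declares.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for this toy example.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"
CD_NS = "http://www.sbml.org/2001/ns/celldesigner"

# Toy document: an SBML species with an id, plus a same-named element
# in the CellDesigner namespace that carries no id attribute.
DOC = f"""<sbml xmlns="{SBML_NS}" xmlns:celldesigner="{CD_NS}" level="2" version="4">
  <model id="m">
    <listOfSpecies>
      <species id="s1" compartment="c1">
        <annotation>
          <celldesigner:species>s1</celldesigner:species>
        </annotation>
      </species>
    </listOfSpecies>
  </model>
</sbml>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def species_ids_any_namespace(root):
    """Match on local name only, so annotation elements are picked up too."""
    return [el.get("id") for el in root.iter() if local_name(el.tag) == "species"]


def species_ids_sbml_only(root):
    """Match only species elements that live in the SBML namespace."""
    return [el.get("id") for el in root.iter(f"{{{SBML_NS}}}species")]


root = ET.fromstring(DOC)
loose = species_ids_any_namespace(root)
strict = species_ids_sbml_only(root)
print(loose)   # includes None for the annotation element without an id
print(strict)  # only the real SBML species
```

In Python the missing attribute shows up as None; in C++ the equivalent missing value is the kind of thing that can end in a crash if it is dereferenced without a check.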
Looping in @luciansmith since this now touches on improving BioModels.
(a) For the purpose of BioModels, we could use CellDesigner to re-export models as COPASI recommends. @luciansmith has done something similar with the latest version of COPASI. However, it sounds like this may create other problems. Assuming that's the case, I'm inclined to stick with the current files. Since Lucian and I don't have the bandwidth to review changes to 1000+ models in detail, I think we need to be conservative with the changes we're recommending to BioModels. If the BioModels curators can help review changes, we could go further in trying to fix the SBML.
(b) For such cases where annotations can't be interpreted, could pyomexmeta generate an incomplete RDF representation of the annotations? Could pyomexmeta report a warning that the annotations couldn't all be interpreted, rather than crashing with a segmentation fault? This would allow us to keep the current SBML (with valid rate laws) and generate (albeit incomplete) RDF.
Yep - we can play this however you like. However, isn't it just the 11 models posted above that need to be re-exported and then reviewed?
Since the bug is actually in the "Semantic Extraction" part of the code I suspect we could just detect errors, issue a warning to the user and then turn off automatic extraction of annotations from sbml. I'll get cracking with this and will post again if I run into problems.
Added try/catch blocks around the methods that extract semantic annotations. Failures with logic_error now emit a warning to standard error, and processing continues without extracting the rest of the semantics from the model.
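The warn-and-continue behaviour described above can be sketched roughly as follows. This is a Python illustration of the pattern only, not the actual C++ change in the commit; ExtractionError, get_required_property, extract_species_semantics, and the "isPartOf" predicate label are all hypothetical stand-ins.

```python
import sys


class ExtractionError(Exception):
    """Stand-in for the C++ logic_error raised during semantic extraction."""


def get_required_property(node, name):
    """Return node[name], raising instead of crashing when it is absent."""
    if name not in node:
        raise ExtractionError(f"node {node!r} has no {name!r} property")
    return node[name]


def extract_species_semantics(nodes):
    """Collect (species, predicate, compartment) triples.

    On the first failure, warn on standard error and stop extracting,
    returning whatever was collected so far instead of aborting.
    """
    triples = []
    for node in nodes:
        try:
            species_id = get_required_property(node, "id")
            compartment = get_required_property(node, "compartment")
        except ExtractionError as err:
            print(f"warning: semantic extraction stopped early: {err}",
                  file=sys.stderr)
            break
        triples.append((species_id, "isPartOf", compartment))
    return triples


nodes = [
    {"id": "s1", "compartment": "c1"},
    {"name": "annotation element without an id"},
    {"id": "s3", "compartment": "c1"},
]
partial = extract_species_semantics(nodes)
print(partial)
```

The caller still gets a usable (if incomplete) result, which matches the request in (b) for partial RDF plus a warning.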
e0080191242be1d86df4679343b13e90ec05962a (spec update branch, not yet integrated into develop or master). Keeping this issue open until I get validation that it's working.
FYI - I'm going to keep working on issues before making another release. However, if you need the newest pyomexmeta to get on with work you can download a pip wheel from here
pyomexmeta succeeded on 99% of BioModels! Below are the files that it failed on. We're working with the versions of the files below at https://github.com/sys-bio/temp-biomodels/tree/main/final. I have no further information about the failures because these files cause segmentation faults.
BIOMD0000000094/BIOMD0000000094_url.xml
BIOMD0000000192/BIOMD0000000192_url.xml
BIOMD0000000220/BIOMD0000000220_url.xml
BIOMD0000000226/BIOMD0000000226_url.xml
BIOMD0000000227/BIOMD0000000227_url.xml
BIOMD0000000394/BIOMD0000000394_url.xml
BIOMD0000000395/BIOMD0000000395_url.xml
BIOMD0000000396/BIOMD0000000396_url.xml
BIOMD0000000397/BIOMD0000000397_url.xml
BIOMD0000000398/BIOMD0000000398_url.xml
BIOMD0000000436/BIOMD0000000436_url.xml
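Since a segmentation fault takes the whole interpreter down with it, one way a batch run could still record which files crash is to process each file in a child process and inspect its exit status. The sketch below is an assumption about how that might be done, not part of the temp-biomodels workflow; run_isolated is a hypothetical helper, the command to run is left to the caller, and the negative-return-code convention applies to POSIX systems.

```python
import signal
import subprocess
import sys


def run_isolated(argv, timeout=300):
    """Run argv in a child process and classify how it ended.

    Returns (status, detail) where status is one of
    "ok", "error", "crashed", or "timeout".
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    if proc.returncode == 0:
        return "ok", proc.stdout
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by that signal number (e.g. SIGSEGV for a segfault).
        return "crashed", signal.Signals(-proc.returncode).name
    return "error", proc.stderr


if __name__ == "__main__":
    # Demo with a child that deliberately sends itself SIGSEGV.
    crash_cmd = [sys.executable, "-c",
                 "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_isolated(crash_cmd))
```

In a real run, argv would be whatever command converts one SBML file, so a crash on one model is logged and the loop moves on to the next.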
The generation of OMEX metadata files is incorporated into the temp-biomodels workflow and into the package we're creating for longer-term management of BioModels. We can re-run this when this library is revised.