error writing to sbml without prefixing identifiers

Karrenbelt commented 5 years ago

Trying to prevent cobra from prefixing my identifiers when writing SBML, I got this (using the textbook model, model history error is unrelated):

cobra.io.write_sbml_model(model, 'test.xml', f_replace={})

Error encountered trying to <set model history>.
LibSBML error code -5: The object passed as an argument to the method is not of a type that is valid for the operation or kind of object involved. For example, handing an invalidly-constructed ASTNode to a method expecting an ASTNode will result in this error.
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/venv/lib/python3.6/site-packages/cobra/io/sbml.py", line 831, in write_sbml_model
    doc = _model_to_sbml(cobra_model, f_replace=f_replace, **kwargs)
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/venv/lib/python3.6/site-packages/cobra/io/sbml.py", line 1059, in _model_to_sbml
    gpa.setAssociation(gpr_new)
UnboundLocalError: local variable 'gpr_new' referenced before assignment

replacing gpr_new with gpr in these lines should fix this

matthiaskoenig commented 5 years ago

I basically fixed this yesterday in the devel branch as part of other bugfixes ;). Please try the devel version. Do in your virtualenv

(cobra) git clone https://github.com/opencobra/cobrapy.git
(cobra) cd cobrapy
(cobra) git checkout devel
(cobra) pip install -e .

This will be resolved with the next release.

Karrenbelt commented 5 years ago

it writes, but reading back it shows to be erroneous

cobra.io.read_sbml_model('test.xml')

Traceback (most recent call last):
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 210, in read_sbml_model
    **kwargs)
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 381, in _sbml_to_model
    sid = _check_required(specie, specie.getIdAttribute(), "id")
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 1194, in _check_required
    raise CobraSBMLError(msg)
cobra.io.sbml.CobraSBMLError: Required attribute 'id' cannot be found or parsed in '<Species "3-Phospho-D-glyceroyl phosphate">'. with name '3-Phospho-D-glyceroyl phosphate'
None
Traceback (most recent call last):
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 210, in read_sbml_model
    **kwargs)
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 381, in _sbml_to_model
    sid = _check_required(specie, specie.getIdAttribute(), "id")
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 1194, in _check_required
    raise CobraSBMLError(msg)
cobra.io.sbml.CobraSBMLError: Required attribute 'id' cannot be found or parsed in '<Species "3-Phospho-D-glyceroyl phosphate">'. with name '3-Phospho-D-glyceroyl phosphate'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/Zarathustra/ETH/repositories/sbml_tools/cobrapy/cobra/io/sbml.py", line 217, in read_sbml_model
    "Something went wrong reading the SBML model. Most likely the SBML"
cobra.io.sbml.CobraSBMLError: Something went wrong reading the SBML model. Most likely the SBML model is not valid. Please check that your model is valid using the `cobra.io.sbml.validate_sbml_model` function or via the online validator at http://sbml.org/validator .
    `(model, errors) = validate_sbml_model(filename)`
If the model is valid and cannot be read please open an issue at https://github.com/opencobra/cobrapy/issues .

same when using libsbml to read

doc = libsbml.readSBMLFromFile('test.xml')
doc.printErrors() # 16 in total

Midnighter commented 5 years ago

I think this is a different error and a duplicate of #879. Basically, the model you were using doesn't have correct SBML identifiers and those are dropped upon writing the model thus the model is broken on import.

cdiener commented 5 years ago

Yeah, most common pitfall is that you have identifiers that start with numbers (common for metabolites) and this is not allowed in SBML (and is why we add a to IDs).

matthiaskoenig commented 5 years ago

Not sure what to do here. There are clear error messages on writing the SBML that the identifiers are incorrect. We agreed on only reading valid SBML. Only solution here is to change the invalid identifiers if you want to write them out.

We could be more drastic and just not create a SBML file at all on write errors like identifiers. Or restrict the allowed identifiers within cobrapy (nobody will like that most of all the bigg database).

Not sure what the issue is here. Please let me know if there is anything to fix.

cdiener commented 5 years ago

I would say there is a slightly incorrect behavior in writing the initial SBML. It seems to have invalid identifiers but no error is given. Writing a SBML model with invalid identifiers should nope out and make it clear that this is not right.

Midnighter commented 5 years ago

I agree, invalid identifiers is definitely an error that should not only be a logged error level message but actually raise an exception and not write the file.

gregmedlock commented 5 years ago

Maybe we could steer users in the right direction with a function that checks SBML compliance prior to writing, so that what we end up putting in documentation is more like:

# check SBML compliance
cobra.io.check_sbml_model(model) # prints any compliance issues

# write the compliant SBML model
cobra.io.write_sbml_model(model)

check_sbml_model could even return a "validated model" that is the only acceptable input to write_sbml_model if we wanted to be really strict... check_sbml_model could also have output that reads more like a report than an error, telling the user what they need to fix to be compliant.

I think only writing fully valid SBML files, plus adding a checking function of some sort that we recommend use of prior to writing, is the way to go. For cases like BIGG, IMO identifiers should be changed on a case-by-case basis for SBML compliance and/or shared as json.

matthiaskoenig commented 5 years ago

My personal take on this is to just make the used ids SBML compliant, i.e., if somebody tries to set an id on a cobra model which starts with a number the user should get a warning/error and the replacement should be performed right there and then.

Having invalid SBML ids internally in cobra creates many issues, e.g.,

things like tab completion are not working (https://github.com/opencobra/cobrapy/issues/828)
mapping data on networks becomes cumbersome, because one has to keep track of two set of ids (one internally in cobrapy and working with python, a second set of ids as soon as using the SBML model in a different tool; this hampers interoperability; makes it complicated to do simple things like hashmap lookups, because one has to perform possible id replacements everywhere and so on)
SBML export becomes slower and much more complicated (because one has to keep track of id replacements all the way through the model)
default SBML export behavior becomes strange (why are ids replaced by default)? I look for my cobrapy id in my SBML model but cannot find it, because strange replacements were made.

I would prefer the following behavior:

enforce (or highly recommend the use of) SBML compliant ids internally in cobrapy
add a convenience function allowing to make all ids SBML compliant
SBML export with invalid SBML ids does raise an error
id replacement is not the norm on SBML export any more (but can be enabled if people want this, for instance if some tools expect M, R or G_ prefixes)

cdiener commented 5 years ago

@gregmedlock I think this woul be great but may be hard to implement given the sheer size of the SBML spec. The only way that seems simple would be to write the broken model and use a validator on the written SBML file (maybe @matthiaskoenig) knows if there is functionality for that in libSBML (something like the online validator).

@matthiaskoenig That sounds good. I think due to backwards compatibility we can't enforce those IDs but we could nudge users to use good ones (even though a warning may be too strong since non-SBML compliant IDs are valid in other export formats such as JSON and YAML). The rest sounds great too me (error on write and optional replacement).

gregmedlock commented 5 years ago

@cdiener agreed--I should have been more specific. The proposed function would only check compliance for SBML aspects that are already checked when writing the file via libSBML (e.g., IDs).

The idea is that users should be free to modify IDs for their own use, but should hit a hard roadblock when they attempt to write SBML without making the IDs compliant. Of course we should encourage compliance from the beginning, but I think throwing errors when a user sets a non-compliant ID is going to scare people away from the package. In my experience, most people struggle to understand the dire need for SBML when they first start using COBRA tools. For that reason, I favor warnings over errors when non-compliant IDs are set (and when a model containing non-compliant IDs is read). E.g., the suggestions by @matthiaskoenig with strong preference for warnings over errors when IDs are first set/read.

cdiener commented 5 years ago

@gregmedlock Very much agreed. To clarify I would only throw an error if the user enforces no prefixes and has invalid SBML IDs. The default would still be to add prefixes if the IDs are not compliant and just show a warning.

akaviaLab commented 5 years ago

I ran across this problem or a similar one. What about implementing something similar to https://github.com/opencobra/cobratoolbox/blob/develop/src/base/io/utilities/SBML/convertSBMLID.m This is needed for various reasons, but since both read and write SBML call it, the user can't even tell that anything is happening. If this is okay with you guys, I can take a stab at it.

matthiaskoenig commented 5 years ago

@akaviaLab This is happening right now. Code is part of the current behavior and is already implemented.

We are only talking about the very special use case in this issue of "someone has invalid identifiers but does not want the id replacements on writing SBML. What should we do in this case?"

matthiaskoenig commented 5 years ago

@akaviaLab In addition, ID replacement is a nightmare for interoperability between tools. Whenever possible tools should highly encourage to use valid ids. It becomes very difficult to map datasets onto networks if the ids change between formats. E.g. I run a flux simulation in cobra and want to visualize this with the SBML network. The ids of the flux simulation do not match the ids of the SBML exchange format any more.

Karrenbelt commented 5 years ago

naming conventions are more often a pain than they ought to be. I agree with matthiaskoenig and cdiener, it is best to only throw an error in the special case someone does not want identifier replacement and thereby would create an invalid SBML model.

akaviaLab commented 5 years ago

@matthiaskoenig - the code does not replace the GIDs when reading old-style SBML files that have the gene association in NOTES field, like Recon 2.2. See example file exampleModel.xml.gz, where if you read SBML and try to write SBML, you get an error because the gene IDs have ":". I think the SBML needs a function like the MATLAB Cobratoolbox I linked to. It might work with GPRcleaner, but it would need to be extended to colons - I also think it is a good idea that the replacement is the same in Cobrapy and cobatoolbox. If you'd like, I can open a different bug for this issue.

Midnighter commented 5 years ago

If someone feels like mediating, coordinating with the Matlab folks to standardize on how identifiers are transformed could be a useful thing?

Other than that, I take from this issue that we need to:

Do transformations for GPR read from notes.
Raise an error upon writing invalid SBML.
Anything else I missed now?

opencobra / cobrapy

error writing to sbml without prefixing identifiers #883