opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

JSON format does not support -inf/inf bounds #856

Open matthiaskoenig opened 5 years ago

matthiaskoenig commented 5 years ago

Trying to convert models with -inf/inf bounds to JSON fails with:

Traceback (most recent call last):
  File "/home/mkoenig/git/pancreas_model/pypancreas/pancreas/model_factory.py", line 51, in <module>
    cobra.io.save_json_model(model, MODEL_JSON)
  File "/home/mkoenig/envs/pancreas/lib/python3.6/site-packages/cobra/io/json.py", line 110, in save_json_model
    json.dump(obj, file_handle, **dump_opts)
  File "/usr/lib/python3.6/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.6/json/encoder.py", line 430, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.6/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/lib/python3.6/json/encoder.py", line 396, in _iterencode_dict
    yield _floatstr(value)
  File "/usr/lib/python3.6/json/encoder.py", line 241, in floatstr
    repr(o))
ValueError: Out of range float values are not JSON compliant: -inf

-inf/inf values are not JSON serializable; consequently, such models currently cannot be represented correctly in JSON.

cdiener commented 5 years ago

The JSON spec does not allow infinity values so those should be converted to null.

matthiaskoenig commented 5 years ago

Sounds like a good solution. We just have to make sure that the JSON importer and exporter handle the conversions. There could still be some strange corner cases where we lose information, e.g. somebody decided to use bounds of (-INF, -INF) or (INF, INF) (not sure why, but it is a valid LP). The cobrapy-json-cobrapy round trip would be (-INF, -INF) -> (null, null) -> (-INF, INF). We should give a warning in such cases, i.e. when exporting (-INF, -INF) or (INF, INF).
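
To make the round trip concrete, here is a minimal sketch of what the null conversion could look like; the helper names are hypothetical and not part of cobrapy:

from math import isinf

# Hypothetical helpers illustrating the proposed null round trip;
# these names are not part of cobrapy.

def bound_to_json(value):
    """Map an infinite bound to None (JSON null) on export."""
    return None if isinf(value) else value

def bound_from_json(value, default):
    """Restore a missing bound to the conventional default on import."""
    return default if value is None else value

# The corner case (-INF, -INF) silently becomes (-INF, INF):
lb = bound_from_json(bound_to_json(float("-inf")), default=float("-inf"))
ub = bound_from_json(bound_to_json(float("-inf")), default=float("inf"))
print(lb, ub)  # -inf inf -- information was lost, hence the proposed warning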

Midnighter commented 5 years ago

Another option is to transform it to a string since Python is perfectly capable of parsing float("inf"), float("-inf"), or even float("nan").
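
For illustration, a plain-Python round trip with string bounds (this is just a sketch of the idea, not how cobrapy currently serializes):

import json

# Serialize infinite bounds as strings, then recover them with float().
payload = json.dumps({"lower_bound": "-inf", "upper_bound": "inf"})
data = json.loads(payload)
print(float(data["lower_bound"]), float(data["upper_bound"]))  # -inf inf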

matthiaskoenig commented 5 years ago

My hope was that there is something like a JSON schema for the cobrapy/bigg/escher JSON format which would allow validation against the schema (or at least I read rumors in some other issues here which discussed version 1 and version 2 of the JSON format). Most likely the bounds would be restricted to float/double in the schema, so schema validation would fail against the "inf" or "-inf" strings.

Midnighter commented 5 years ago

The rumors are true :laughing: https://github.com/opencobra/cobrapy/blob/374ebfed7345eef6b32d8b7e6311676cb6232b17/cobra/io/json.py#L141

The current JSON schema desperately needs an update anyway in order to accommodate groups and compartments. So it wouldn't be a problem to change that part of the bounds.

cdiener commented 5 years ago

There could still be some strange corner cases where we lose information, e.g. somebody decided to use bounds of (-INF, -INF) or (INF, INF) (not sure why, but it is a valid LP).

I don't think any of the solvers support that. The internal handling is that variables with a lower bound of -inf or an upper bound of inf simply don't get constraints. An upper bound of -inf, however, is not possible with GLPK, Gurobi, or CPLEX. So I would simply disallow that case in general.
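
A hedged sketch of such a check (hypothetical helper, not a cobrapy API):

from math import isinf

def check_bounds(lower_bound, upper_bound):
    """Reject bound pairs that GLPK, Gurobi and CPLEX cannot represent:
    an upper bound of -inf or a lower bound of +inf."""
    if (isinf(upper_bound) and upper_bound < 0) or (isinf(lower_bound) and lower_bound > 0):
        raise ValueError(
            "Unsupported bounds ({}, {})".format(lower_bound, upper_bound)
        )

check_bounds(float("-inf"), 1000.0)   # fine: the variable is simply unconstrained below
# check_bounds(0.0, float("-inf"))    # would raise ValueError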

Hemant27031999 commented 4 years ago

Hi all @Midnighter @draeger @matthiaskoenig @cdiener. I just went through the above issue and found that it still exists. I tried to convert the XML fbc_ex1 model from the cobrapy data directory to JSON and got the same error, which says:

ValueError: Out of range float values are not JSON compliant: -inf

Here is what I have figured out. While writing a model in JSON format, the value passed to the json.dump() method (lines 110 and 112, via dump_opts) for the argument allow_nan is False, which is why NaN, INF, and -INF values throw a ValueError (here). This allow_nan parameter would need to be set to True to support these values. Can I work on this issue? I can also write a test case to verify that it works.
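
For reference, the behaviour can be reproduced with the standard library alone (no cobrapy involved):

import json

# allow_nan=False (what save_json_model passes) raises the error from the traceback above.
try:
    json.dumps({"lower_bound": float("-inf")}, allow_nan=False)
except ValueError as err:
    print(err)  # Out of range float values are not JSON compliant: -inf

# allow_nan=True succeeds, but emits the non-standard token -Infinity,
# which strict parsers in other languages may reject.
print(json.dumps({"lower_bound": float("-inf")}, allow_nan=True))
# {"lower_bound": -Infinity}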

Midnighter commented 4 years ago

The reason I set allow_nan=False at the time is that it is not part of the official JSON spec. Since the JSON format is used by more tools than just cobrapy (and I guess non-Python tools, too), I'm not sure I feel comfortable allowing that now.

How is this actually handled within the SBML definition?

Another possible fix is to use the default=str argument so that, if there are any type errors, a conversion to string is attempted. I'm not sure about the side effects of that right now, though.

cdiener commented 4 years ago

I agree that this should only be done if the schema version changes as well.

Hemant27031999 commented 4 years ago

The JSON schema surely needs to be updated, and it is one of the tasks of my project. One thing I want to point out, and which has already been discussed in some other discussions on COBRApy as well, is extracting the JSON schema out of the cobra.io.json file and putting it in a place where others can easily find it and provide their suggestions, somewhere like here.

Regarding the above issue, I can see the problem more clearly now. It is not that JSON does not support these values; the problem is that we don't have the right way to express them in the schema itself, because the JSON type "number" doesn't support Inf, -Inf, and NaN values. So if our schema specifies this type to be a number, it is not right to allow such values, is it?

In SBML, the type of the value attribute of Parameter is double, and a double value is allowed to be INF, -INF, or NaN (section 3.1.5, page 11). What I think in this regard is: since INF, -INF, and NaN values are possible for a Parameter, as they are accepted officially, shouldn't it be the responsibility of the tool itself to parse them correctly (from JSON or SBML)?

Hemant27031999 commented 4 years ago

Also, regarding the "number" type in JSON, I just tested a few JSON instances with infinity values against a JSON schema that specifies the type as "number", and the validation did not produce any error. Here is one example.

Schema:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "lower_bound": {"type": "number"}
    }
}

JSON instance:

{
    "lower_bound": Infinity
}

I tested it using Python's validate method (and an online validator as well) and everything works fine. It is admittedly weird behaviour, because the "number" type explicitly says that these values are not permitted, but isn't it what we actually want? There is no explicitly defined type in JSON Schema that can support infinity values, but the "number" type does the job in this case.
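
Assuming the jsonschema package is the validator in question, the observation can be reproduced like this:

from jsonschema import validate  # pip install jsonschema

schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {"lower_bound": {"type": "number"}},
}

# jsonschema checks the Python type, and float("inf") is a float,
# so validation passes even though standard JSON cannot encode the value.
validate(instance={"lower_bound": float("inf")}, schema=schema)
print("valid")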

draeger commented 4 years ago

What I don't understand in this discussion is why the current JSON format prevents people from storing information just because solvers can't work with it. These are two different things. One is storing information, the other is solving a model. The solver will indicate to the user that it can't take a model with inappropriate bounds. But this does not necessarily render the data format invalid.

cdiener commented 4 years ago

I think this has been fixed by now; optlang now supports free constraints. For me the worry is more about breaking things for BIGG, since they depend heavily on the JSON schema.

Hemant27031999 commented 4 years ago

I wrote on the JSON Schema Slack channel to ask why the "number" type validates infinity and NaN values even though the RFC document says these values are not supported by the "number" type. They replied that some parsers allow these values, but it is not standard JSON. Standard JSON doesn't have any explicit type that can cover these values, so we have to handle them ourselves. The best option I can think of is, as proposed by @Midnighter, to use strings to represent infinity, -infinity, and NaN values, and the number type to represent other values:

{
    ...
    "lower_bound": {"type": ["number", "string"]},
    "upper_bound": {"type": ["number", "string"]},
    ...
}

And while parsing the model, we will check whether the value is a string or a number. If it is a string, we can handle it using float(string) (because Python supports float("inf"), float("-inf"), and float("nan")) inside COBRApy. A more advanced and verbose schema definition of this type could be:

{
    ...
    "lower_bound": {
        "anyOf": [
            {"type": "number"},
            {"type": "string", "enum": ["inf", "-inf", "nan"]}
        ]
    },
    "upper_bound": {
        "anyOf": [
            {"type": "number"},
            {"type": "string", "enum": ["inf", "-inf", "nan"]}
        ]
    },
    ...
}

This restricts strings to only the three allowed values, while everything else must be a number. Also, at parse time, we can use the string value directly without performing any additional checks on it.
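
A minimal sketch of the import-side handling under this schema (the helper name is hypothetical, not part of cobrapy):

ALLOWED_STRINGS = {"inf", "-inf", "nan"}

def parse_bound(value):
    """Turn a JSON bound into a float under the proposed schema:
    numbers pass through, whitelisted strings go through float()."""
    if isinstance(value, str):
        if value not in ALLOWED_STRINGS:
            raise ValueError("Unexpected bound string: {!r}".format(value))
        return float(value)
    return float(value)

print(parse_bound("-inf"), parse_bound(1000))  # -inf 1000.0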

draeger commented 4 years ago

The second alternative is safer and has a higher chance of being implemented correctly by the various tools that rely on the JSON format. If arbitrary strings can be set in principle, the "validation" becomes trickier because it has to be performed at the application level. Could NaN also be added to the enumeration?

Hemant27031999 commented 4 years ago

Yeah, sure. We should add NaN as well; the specification document allows NaN for the double type in the SBML format, so we should have it in JSON too. I have updated it.

matthiaskoenig commented 4 years ago

I think the main thing of importance is to version the JSON schema, i.e. the current version is version 1, which BiGG can work with. But version 1.0 is very limited and has many errors which make it unusable for other use cases, e.g. the qualifiers of annotations are dropped, resulting in completely incorrect annotations. The plan is to create a version 2.0 of the JSON schema which fixes all the shortcomings of the current version. By versioning the JSON parser and the JSON schema we will not break anything for BiGG and users of version 1.0 of the schema, but we can move forward and encode important additional information.

I think this will solve most of the problems people have with many of the changes.