Tellurium does not output valid omex file

bilalshaikh42 commented 3 years ago

I am using the demo code for the network modelling summer school here: https://colab.research.google.com/drive/1ZZISM2CjfoeXPRST5-d1CDUO5bwK8My7

I assume most of these errors are from Biomodels. Is there some way to ensure that tellurium outputs correct omex archives? Currently, these cannot be run on BioSimulations.

The combine archive that is output is not valid. Here are the errors:

The COMBINE/OMEX archive is invalid.
  - The SED-ML file at location `main.xml` is invalid.
    - Model `wolf` is invalid.
      - The model file `wolf_sbml.xml` is invalid.
        - `/tmp/tmphg53irqv/wolf_sbml.xml` is not a file.
    - Data generator `plot_0_0_0` is invalid.
      - Variable `time` is invalid.
        - Variable should not reference a model.
    - Data generator `plot_0_0_1` is invalid.
      - Variable `oxy` is invalid.
        - Variable should not reference a model.
        - One or more namespaces required for target `/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='oxy']` are not defined. No namespaces are defined for the target.

The validation is being done from this tool: https://run.biosimulations.dev/utils/validate

bilalshaikh42 commented 3 years ago

Here is the attempt to run the model: https://run.biosimulations.dev/simulations/60fadc0f75a71768c1b0ab55#tab=log

vporubsky commented 3 years ago

I will need to look into this particular example - it seems some of the models imported from BioModels may not be readily compatible with Tellurium's omex archive export.

However, I have an example from the summer school of a simple uni-uni reaction written in Antimony, which can be exported as a functional omex archive.

These two short Python Scripts can be used to generate the omex archive: https://drive.google.com/drive/folders/1gfFrxO-jXEAENlOWf6Dr7ShYJRmkgW5f?usp=sharing

luciansmith commented 3 years ago

The /tmp/tmphg53irqv/wolf_sbml.xml problem is indeed a bug in tellurium--thanks for pointing it out, and I'll look into fixing it.

However, variables can indeed reference both models and tasks. In fact, in the case where a repeated task references two models, you must reference both a model and a task so you know which model/task combination you're talking about.

Even in the case of single-model tasks, however, I don't think it is wrong to specify both, as long as the task in question does indeed involve the model.

To be fair, this is only something we recently worked out for SED-ML L1v4 (though it applies to earlier specs), so it's totally reasonable for you to have put that in as a validation rule based on earlier specs.

The namespaces issue also does need to be fixed in Tellurium, though BioSimulators was the first program to notice the issue, and other tools tend to ignore it. Could it be a warning instead of an error?

bilalshaikh42 commented 3 years ago

@jonrkarr Could we make these changes to the validation? From previous discussions I believe we decided to be strict about the namespace.

I am not fully sure what is happening in terms of the reference to the model. Is this something that we need to change?

However, variables can indeed reference both models and tasks. In fact, in the case where a repeated task references two models, you must reference both a model and a task so you know which model/task combination you're talking about.

Even in the case of single-model tasks, however, I don't think it is wrong to specify both, as long as the task in question does indeed involve the model.

To be fair, this is only something we recently worked out for SED-ML L1v4 (though it applies to earlier specs), so it's totally reasonable for you to have put that in as a validation rule based on earlier specs.

The namespaces issue also does need to be fixed in Tellurium, though BioSimulators was the first program to notice the issue, and other tools tend to ignore it. Could it be a warning instead of an error?

bilalshaikh42 commented 3 years ago

I will need to look into this particular example - it seems some of the models imported from BioModels may not be readily compatible with Tellurium's omex archive export.

However, I have an example from the summer school of a simple uni-uni reaction written in Antimony, which can be exported as a functional omex archive.

These two short Python Scripts can be used to generate the omex archive: https://drive.google.com/drive/folders/1gfFrxO-jXEAENlOWf6Dr7ShYJRmkgW5f?usp=sharing

Thank you! I'll use these as a demonstration, and then show how BioSimulations can be used to get around errors in combine archives due to outdated/incorrect databases for example.

jonrkarr commented 3 years ago

I'm not inclined to treat the namespace issue as a warning. In my opinion, these namespaces are quite important.

As @luciansmith mentions, the model issue is a very recent evolution for SED-ML L1V4. However, that change is really intended for a different purpose. Other tools have not behaved the way that tellurium does. For example, the SED-ML in BioModels from COPASI doesn't do this. Neither do the examples at http://sed-ml.io. My preference is for tellurium to conform to that convention.

I expect that some SED-ML files produced by tellurium will not work with other tools and vice-versa. See the several issues I've posted to this repository over the past year. While our test suite considers tellurium "valid", this is because we've engineered the test suite to interpret tellurium's idiosyncrasies as warnings rather than errors. Effectively, we've temporarily grandfathered in exceptions specifically motivated by tellurium.

As tellurium works through the GitHub issues, we can help steer it to be consistent with the tools derived from BioSimulators-utils and jLibSED-ML.

luciansmith commented 3 years ago

So hey, unless I'm wrong, the name error was in the workshop instructions, not in tellurium/phrasedml itself. Am I right? I assumed that since the message mentioned tmp/whatever/ that tellurium was adding that to the model location, but fortunately, it seems like this is not the case. (I do think the message is confusing, though--I'll see if I can file an issue tracker request to change it.)

I'm happy to change the namespace issue. It's Yet Another Reason I don't like the xpath solution...

For the model/task IDs, though, I still don't see any real problem. It's not invalid against the schema, and it's not invalid against the spec. Other tools may well choose to not record the information in the optional attribute, but I see no problem with defining it, just as with any other optional attribute. Nobody else has problems reading in the file, either (as far as I know) because nobody even checks the model attribute if they have the task and the task has one model. On the positive side, it makes the file slightly more readable, and it protects us against the task-with-two-models issue both now and in the future.

bilalshaikh42 commented 3 years ago

No, the name error is present in tellurium. The "/tmp/somePath" prefix is an artifact of the workspace, but the Sedml refers to the model as "wolf_sbml.xml" when the file is actually named "wolf.xml".

On Fri, Jul 23, 2021 at 9:13 PM Lucian Smith @.***> wrote:

So hey, unless I'm wrong, the name error was in the workshop instructions, not in tellurium/phrasedml itself. Am I right? I assumed that since the message mentioned tmp/whatever/ that tellurium was adding that to the model location, but fortunately, it seems like this is not the case. (I do think the message is confusing, though--I'll see if I can file an issue tracker request to change it.)

I'm happy to change the namespace issue. It's Yet Another Reason I don't like the xpath solution...

For the model/task IDs, though, I still don't see any real problem. It's not invalid against the schema, and it's not invalid against the spec. Other tools may well choose to not record the information in the optional attribute, but I see no problem with defining it, just as with any other optional attribute. Nobody else has problems reading in the file, either (as far as I know) because nobody even checks the model attribute if they have the task and the task has one model. On the positive side, it makes the file slightly more readable, and it protects us against the task-with-two-models issue both now and in the future.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/sys-bio/tellurium/issues/533#issuecomment-885980583, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHX4FICDGF2H355YCBON7XLTZIHVBANCNFSM5A4GQAXA .

luciansmith commented 3 years ago

That's because you told it the name was wolf_sbml.xml in the phrasedml string. I think.

luciansmith commented 3 years ago

I've updated https://colab.research.google.com/drive/1ZZISM2CjfoeXPRST5-d1CDUO5bwK8My7 so that it should create a valid link to a valid SBML file. I think the problem was that when exporting to an SBML file from Antimony, it automatically writes the filename as '[name].xml', and we gave it the model name 'wolf'.

luciansmith commented 3 years ago

Getting back to this...

There were three errors you originally found:

The model file wolf_sbml.xml is invalid.
Variable time is invalid.
- Variable should not reference a model.
Variable oxy is invalid.
- One or more namespaces required for target /sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='oxy'] are not defined. No namespaces are defined for the target.

The model name problem was a bug in the phrasedml exercise, not in Tellurium itself. (Basically, we told it that the filename was wolf_sbml.xml instead of wolf.xml.)

The 'Variable should not reference a model' problem is a problem in the validator: as we finally worked out in l1v4, there are cases with repeated tasks where a variable must reference both a task and a model. For other cases, it is allowed to define an implied task and/or model to be explicit (though if defined, it must be correct), This was clarified in L1v4, but applies to previous levels/versions of SED-ML as well, since this situation could arise in any document with a repeated task.

I'm working on a fix for the namespaces thing in phraSED-ML, or at least a usually-correct version of it.

Does all this seem correct?

jonrkarr commented 3 years ago

The requirement to not have a model stemmed from my interpretation of L1V3 and how we observed other tools to use SED-ML, including the SED-ML.org examples and BioModels. We have an open issue to implement the new features introduced in L1V4: biosimulators/Biosimulators_utils#22.

luciansmith commented 1 year ago

I believe all the issues brought up here have now been fixed. If this is not the case, let us know!

sys-bio / tellurium

Tellurium does not output valid omex file #533