sybila / biodivine-lib-param-bn

Rust library for working with parametrised Boolean networks.
MIT License
2 stars 2 forks source link

Further bugfixes to SBML parsing. #9

Closed daemontus closed 3 years ago

daemontus commented 3 years ago

This PR extends #7 and resolves several new problems discovered when using the SBML parser.

  1. In some models, there are several entities that share the same name, but have different id. After #7 was merged, these models could not be imported due to the name clash. Here, we fix this problem by detecting such duplicate names and prefixing such name with its id, which is always unique. Names that do not clash are unchanged.
  2. Models that work with DNF/CNF representations of boolean functions tend to also include and/or tags with only one argument. This previously caused an error, but now these useless tags are simply ignored.
  3. When reading lists of XML tags, any discovered syntax error would be silently dropped and replaced with an error about unexpected end of document. We try to mitigate this and return the syntax error instead where possible.
  4. Sadly, xml-rs contains a bug that considers strings with value /> to be invalid. Such strings appear in SBML models when HTML formatting is included in annotations/notes. Thankfully, a pull request for this issue exists, but is not released yet. So for the time being, we switch to the fork where the bug is already fixed. Once the issue is resolved, we can move back to depending on xml-rs.

Overall, this pull request closes all major issues with SBML parsing that I am aware of (So far, I haven't found any problematic SBML model in ginsim/cellcollective databases). It is therefore related to #1, however, I still don't believe the issue should be closed because: a) I am expecting further problems as more models are tested, the issue will track those. b) The SBML parser is still a poorly designed and poorly tested heap of garbage. We really need a better way to handle it. However, it will probably also require minor changes to the .aeon text format.

codecov[bot] commented 3 years ago

Codecov Report

Merging #9 (e47f246) into master (f11a06f) will decrease coverage by 0.29%. The diff coverage is 73.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #9      +/-   ##
==========================================
- Coverage   84.93%   84.63%   -0.30%     
==========================================
  Files          29       29              
  Lines        1859     1901      +42     
==========================================
+ Hits         1579     1609      +30     
- Misses        280      292      +12     
Impacted Files Coverage Δ
src/sbml/mod.rs 73.09% <73.91%> (-0.16%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f11a06f...e47f246. Read the comment docs.

daemontus commented 3 years ago

Good point, since this will close #1, I have created #10 as a reminder to refactor this. However, we should also seriously think about extending aeon model format with stuff like input/output/constant, better handling of explicit parameters, variable annotations, etc. because there are things we can't really import at the moment...