mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.
BSD 3-Clause "New" or "Revised" License

Throw telling exceptions when XML documents cannot be parsed #665

Closed: triska closed this issue 4 years ago

triska commented 4 years ago

I would like to use library(sgml) to parse XBRL documents and other XML files.

When I save the example file shown below to xbrl.xml, I get:

?- load_xml(file("xbrl.xml"), DOM, []).
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/machine/system_calls.rs:5923:39
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Instead of crashing, it would be highly preferable to throw an exception that clearly indicates the problem that arose when parsing the file. roxmltree should be asked for a description of the error, and that description should be included in the thrown exception.
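To illustrate the requested behaviour, here is a minimal Rust sketch of replacing an `unwrap()` with error propagation. It is not the actual code in src/machine/system_calls.rs: `ParseError`, `parse_xml`, `load_xml` and the `xml_parse_error` prefix are hypothetical stand-ins, and the sketch only assumes that the parser's error type implements `Display`, so that its description can be carried into the exception.

```rust
use std::fmt;

// Hypothetical stand-in for the XML parser's error type. The only
// assumption is that the real error type can be formatted via `Display`.
#[derive(Debug)]
struct ParseError {
    line: u32,
    message: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} at line {}", self.message, self.line)
    }
}

// Hypothetical stand-in parser: accepts only input that starts with '<'
// and returns the input length in place of a real document tree.
fn parse_xml(text: &str) -> Result<usize, ParseError> {
    if text.trim_start().starts_with('<') {
        Ok(text.len())
    } else {
        Err(ParseError {
            line: 1,
            message: String::from("expected '<'"),
        })
    }
}

// Instead of unwrapping (and panicking on failure), convert the parser's
// error into a description that the caller can wrap in a Prolog exception.
fn load_xml(text: &str) -> Result<usize, String> {
    parse_xml(text).map_err(|e| format!("xml_parse_error: {}", e))
}

fn main() {
    match load_xml("not xml") {
        Ok(n) => println!("parsed {} bytes", n),
        Err(description) => println!("would throw: {}", description),
    }
}
```

The design point is that the failure stays a value (`Result`) until the system call boundary, where it can be turned into a Prolog error term instead of aborting the whole process.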

This may be an interesting issue for new contributors who are interested in Rust and XML.

Sample file, taken from https://en.wikipedia.org/wiki/XBRL:

<xbrli:xbrl
xmlns:ifrs-gp="http://xbrl.iasb.org/int/fr/ifrs/gp/2005-05-15"
xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
xmlns:xbrli="http://www.xbrl.org/2003/instance"
xmlns:xbrll="http://www.xbrl.org/2003/linkbase"
xmlns:xlink="http://www.w3.org/1999/xlink">

    <xbrll:schemaRef xbrll:href="http://www.org.com/xbrl/taxonomy" xlink:type="simple"/>
    <ifrs-gp:OtherOperatingIncomeTotalFinancialInstitutions contextRef="J2004" 
        decimals="0" unitRef="EUR">38679000000</ifrs-gp:OtherOperatingIncomeTotalFinancialInstitutions>
    <ifrs-gp:OtherAdministrativeExpenses contextRef="J2004" 
        decimals="0" unitRef="EUR">35996000000</ifrs-gp:OtherAdministrativeExpenses>
    <ifrs-gp:OtherOperatingExpenses contextRef="J2004" 
        decimals="0" unitRef="EUR">870000000</ifrs-gp:OtherOperatingExpenses>
    ...
    <ifrs-gp:OtherOperatingIncomeTotalByNature contextRef="J2004" 
        decimals="0" unitRef="EUR">10430000000</ifrs-gp:OtherOperatingIncomeTotalByNature>
    <xbrli:context id="BJ2004">
        <xbrli:entity>
            <xbrli:identifier scheme="www.iqinfo.com/xbrl">ACME</xbrli:identifier>
        </xbrli:entity>
        <xbrli:period>
            <xbrli:instant>2004-01-01</xbrli:instant>
        </xbrli:period>
    </xbrli:context>
    <xbrli:context id="EJ2004">
        <xbrli:entity>
            <xbrli:identifier scheme="www.iqinfo.com/xbrl">ACME</xbrli:identifier>
        </xbrli:entity>
        <xbrli:period>
            <xbrli:instant>2004-12-31</xbrli:instant>
        </xbrli:period>
    </xbrli:context>
    <xbrli:context id="J2004">
        <xbrli:entity>
            <xbrli:identifier scheme="www.iqinfo.com/xbrl">ACME</xbrli:identifier>
        </xbrli:entity>
        <xbrli:period>
            <xbrli:startDate>2004-01-01</xbrli:startDate>
            <xbrli:endDate>2004-12-31</xbrli:endDate>
        </xbrli:period>
    </xbrli:context>
    <xbrli:unit id="EUR">
        <xbrli:measure>iso4217:EUR</xbrli:measure>
    </xbrli:unit>
</xbrli:xbrl>

triska commented 4 years ago

This is resolved with #668, so I'm closing this issue.

@CharlesHoffmanCPA: With #668 applied, we can now use Scryer Prolog to parse and conveniently reason about XBRL files. For example, we can load said xbrl.xml sample file from Wikipedia and query what we are interested in via:

?- use_module(library(sgml)).
   true.
?- use_module(library(xpath)).
   true.
?- load_xml(file("xbrl.xml"), DOM, []),
   xpath(DOM, //'OtherOperatingExpenses'(number), Expenses).

Yielding:

   DOM = [...], Expenses = 870000000

Note how the predicate xpath/3 from library(xpath) lets us conveniently query specific elements by specifying XPath selectors. XML files are naturally represented as Prolog terms, and so Prolog is an excellent fit for reasoning about XBRL files!