kibook / s1kd-tools

A set of small, free and open source software tools for manipulating S1000D data.
https://khzae.net/1/s1000d/s1kd-tools
GNU General Public License v3.0
38 stars 13 forks source link

s1kd-brexcheck ERROR: Invalid object path in BREX #5

Open MihailCosmin opened 4 years ago

MihailCosmin commented 4 years ago

Hi,

I have 2 huge BREX files, more than 8000 lines of code combined. And the s1kd-brexcheck fails with ERROR: Invalid object path in BREX Step by step I have commented the rules that were responsible for these errors. And found that there were only 4 rules that were causing such error.

Is there any option to run s1kd-brexcheck and ignore such errors? More exactly, just skip the BREX rules that are "broken" (although they are not really broken, maybe xmlXPathEvalExpression is not 100% correct).

The rules that do not work:

<structureObjectRule id="RULE-153"> -->
    <objectPath allowedObjectFlag="1">/dmodule/pdf:style</objectPath>
    <objectUse>Custom PDF styling rules. Not S1000D compliant.</objectUse>
</structureObjectRule>
<structureObjectRule id="RULE-321">
      <objectPath allowedObjectFlag="0">(//dmStatus/applic/displayText/simplePara[lower-case(.)[contains(.,'FFF')]])</objectPath>
</structureObjectRule>
<structureObjectRule id="RULE-555">
   <objectPath allowedObjectFlag="1">//dmCode[attribute::infoCode != "003" and matches(@assyCode, '.{2}')]</objectPath>
</structureObjectRule>
<structureObjectRule id="RULE-677">
    <objectPath allowedObjectFlag="1">/dmodule/identAndStatusSection/pmStatus/reasonForUpdate[not(@id= /dmodule/content//@reasonForUpdateRefIds/tokenize(normalize-space(), '\t'))]</objectPath>
</structureObjectRule>
kibook commented 4 years ago

Part of the issue is that libxml2 only supports XPath 1.0, so these functions from XPath 2.0 won't work:

I believe the issue with the first example is the use of a namespace. I think I need to register all namespaces declared in ancestor nodes to the xmlXPathContext used to resolve the XPath expression. Does your BREX DM have a declaration like this somewhere?

xmlns:pdf="http://someurl.com"

As long as it's on the <structureObjectRule> or any ancestor node (like the root <dmodule> node), that should be valid XPath 1.0.

There's no option currently to skip invalid rules, but I will definitely change it so that invalid rules are skipped (with a WARNING) instead of causing the whole check to be aborted. Thanks for the suggestion, @MihailCosmin !

kibook commented 4 years ago

Rules with "invalid" XPath will be skipped as of 37583f2

Now I just need to fix namespaces so that XPath using them aren't also considered invalid.

kibook commented 4 years ago

44ffdf5 should fix the use of XPath prefixes, as long as they have been properly declared in the BREX XML.

I mentioned declaring them on the <structureObjectRule> element before, but you can also declare them on the <objectPath> itself, or any ancestor node:

<structureObjectRule id="RULE-153">
    <objectPath xmlns:pdf="http://example.com" allowedObjectFlag="1">/dmodule/pdf:style</objectPath>
    <objectUse>Custom PDF styling rules. Not S1000D compliant.</objectUse>
</structureObjectRule>

Thanks again for pointing these out, the XPath namespace prefixes in particular is something I hadn't really considered.

kibook commented 4 years ago

Since S1000D (Issue 4.0+) does recommend XPath 2.0 in the specification, I'm looking in to extending libxml2's XPath support.

I'm not sure it can be made fully compliant to XPath 2.0 without modifying libxml2 itself, since XPath 2.0 isn't completely backwards compatible with XPath 1.0 (https://www.w3.org/TR/xpath20/#id-backwards-compatibility), but it's easy to add new functions, like matches. That may be "good enough".

I created a new project for "libxpath2": https://github.com/kibook/libxpath2. The idea is that once I get this somewhat stable, s1kd-brexcheck can use it to support XPath 2.0-like functions at the very least.

MihailCosmin commented 4 years ago

Hi, That would be great. Because now I see there were many more BREX rules that used Xpath 2.0. The reason they were not picked up before, was because I was checking only one DM. As soon as I started checking all DMs, more BREX rules were found to be wrong.

I see that Saxon offers support for Xpath 2.0

Saxon-HE (Home Edition) is an open source product available under the Mozilla Public License version 2.0. It provides implementations of XSLT (3.0), XQuery (3.1), and XPath (2.0 and 3.1) at the basic level of conformance defined by W3C, plus from Saxon 10 the optional features higher-order functions and dynamic evaluation. It is available for Java and .NET.

They have also some files for C: https://www.saxonica.com/download/c.xml

Not sure if you can use these, as I am a complete zero when talking about C.

kibook commented 4 years ago

Do you know if you use XPath 2.0-specific features besides functions, such as the if or for keywords? I'm not sure if those can be implemented as easily as adding extra functions.

Would you be able to provide a list of the XPath 2.0 functions used in your BREX? Eventually, I would like to implement all XPath 2.0 functions, but that's going to take some time. If you can give me a list of the ones you use, I could focus on those first.

I'll also look at Saxon to see if it would be possible to have an option to use its XPath engine to validate rules instead of libxml2 if the user chooses, in order to get complete XPath 2.0 support.

MihailCosmin commented 4 years ago

Do you know if you use XPath 2.0-specific features besides functions, such as the if or for keywords?

No cases where "if" or "for" are used. In fact all other rules that were wrong used only the three functions from below.

lower-case
matches
tokenize

But full support for Xpath 2.0 could still be needed.

kibook commented 4 years ago

Thanks for the list! However, after looking more into XPath 2.0, I don't think just adding XPath 2.0-like functions will be enough. For example, the way the tokenize function is used in your example is specific to XPath 2.0 syntax.

I'll probably focus more on getting the Saxon XPath engine working within s1kd-brexcheck.

kibook commented 4 years ago

I have added an experimental option to use Saxon to evaluate the objectPath instead of libxml2. In my tests, it worked successfully on all the example XPath 2.0-based rules you listed.

At least for now, it is a compile-time option, so that Saxon is not a requirement to build and run s1kd-brexcheck. To enable it:

  1. Download and install Saxon/C from http://www.saxonica.com/saxon-c/index.xml

  2. Build with:

    $ make BREXCHECK_XPATH_ENGINE=saxon

I've only tested it on Linux so far, so the Makefile might still need some adjustments to work out-of-the-box on Windows with MinGW/Cygwin.

I have included a slightly modified version of the Saxon/C API along with the s1kd-brexcheck source that addresses a bug in the latest released version (1.2.1): https://saxonica.plan.io/issues/4513. The XPathProcessor::declareNamespace function for declaring namespaces for use in XPath expressions (like pdf:style) was broken.

There are still some major issues left to work out, like what appear to be big memory leaks within the Saxon/C library itself, though I could also just be misunderstanding how to properly free everything. Either way, it doesn't seem suitable to use on large sets of DMs in one call right now, but a separate call for each DM works okay, though it is significantly slower.

MihailCosmin commented 4 years ago

Do I need to add an argument to the brexcheck command to do the check using saxon? I'm trying on Linux and I'm getting these errors:

xmlXPathCompOpEval: function **lower-case** not found
XPath error : Unregistered function
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...

xmlXPathCompOpEval: function **matches** not found
XPath error : Unregistered function
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...

XPath error : Invalid expression
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...
kibook commented 4 years ago

Did you build with:

$ make BREXCHECK_XPATH_ENGINE=saxon

From your errors, it is still using libxml2's XPath engine. You can check which engine it uses with:

$ s1kd-brexcheck --version

There is a new XPath engine: line.

I'm still working out what is needed for a binary package that would have Saxon enabled, so you will have to build from source to test.

MihailCosmin commented 4 years ago

Yup, that did it.

Now it seems to work. I also made an error in a DM according to a BREX rule that used matches, and it was detected.

Just for info. I tried to build on Windows and I got the below errors:

$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -O3 -c -Isaxon/Saxon.C.API -o saxon/saxon.o `pkg-config --cflags libxml-2.0` saxon/saxon.cpp
cc -O3 -c -o saxon/SaxonCGlue.o saxon/Saxon.C.API/SaxonCGlue.c
cc -O3 -c -o saxon/SaxonCXPath.o saxon/Saxon.C.API/SaxonCXPath.c
g++ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp
saxon/Saxon.C.API/SaxonProcessor.cpp: In member function ‘void SaxonProcessor::setResourcesDirectory(const char*)’:
saxon/Saxon.C.API/SaxonProcessor.cpp:516:2: error: ‘strncat_s’ was not declared in this scope; did you mean ‘strncat’?
  516 |  strncat_s(_getResourceDirectory(), destSize,dir, strlen(dir));
      |  ^~~~~~~~~
      |  strncat
make: *** [Makefile:55: saxon/SaxonProcessor.o] Error 1
kibook commented 4 years ago

Thanks for the info! I got the same error too when I gave it a quick try. The basic issue seems to be that Cygwin/MinGW are counted as Windows in the Saxon/C code's checks, but their environments are set up more like Linux. strncat_s is a Microsoft extension that is obviously not included in Cygwin or MinGW's standard C libraries.

Saxon/C also hardcodes the location of the shared library (libsaxonhec.dll) and Java runtime (rt) it uses, which will be an issue moving the binaries between environments (Cygwin/MinGW to normal Windows).

kibook commented 4 years ago

I updated the Makefile in a236fa1 to treat Cygwin and MinGW as Linux when compiling the Saxon/C API, except for SaxonCGlue.c, which contains the hardcoded path to the shared library. I was able to build it successfully in my MinGW environment after that.

MihailCosmin commented 4 years ago

I'm still getting the same error as before on cygwin.

kibook commented 4 years ago

In the Makefile output, do you see -D__linux__ in the gcc and g++ commands (except for the one compiling SaxonCGlue.c)? That is what should cause it to use the "linux" portions of the Saxon/C code, like plain strncat instead of strncat_s.

Although I was able to build s1kd-brexcheck, I haven't been able to run it yet despite tweaking the hardcoded shared library path in various ways. I am going to have to keep experimenting.

MihailCosmin commented 4 years ago

No,

This is the output, same as before:

$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -O3 -c -Isaxon/Saxon.C.API -o saxon/saxon.o `pkg-config --cflags libxml-2.0` saxon/saxon.cpp
cc -O3 -c -o saxon/SaxonCGlue.o saxon/Saxon.C.API/SaxonCGlue.c
cc -O3 -c -o saxon/SaxonCXPath.o saxon/Saxon.C.API/SaxonCXPath.c
g++ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp
saxon/Saxon.C.API/SaxonProcessor.cpp: In member function ‘void SaxonProcessor::setResourcesDirectory(const char*)’:
saxon/Saxon.C.API/SaxonProcessor.cpp:516:2: error: ‘strncat_s’ was not declared in this scope; did you mean ‘strncat’?
  516 |  strncat_s(_getResourceDirectory(), destSize,dir, strlen(dir));
      |  ^~~~~~~~~
      |  strncat
make: *** [Makefile:69: saxon/SaxonProcessor.o] Error 1
kibook commented 4 years ago

Try exporting the OSTYPE environment variable to make sure it is visible to make:

$ export OSTYPE
$ make BREXCHECK_XPATH_ENGINE=saxon

If that works, you can add export OSTYPE to ~/.profile so that it is always exported.

MihailCosmin commented 4 years ago

That possibly worked. The .exe was created. But I cannot use it in cmd.

On one system I just get an error message as a pop-up : "The application was unable to start correctly (0xc000007b)." And one another system I get a message from cmd:

Unable to load C:\Program Files\Saxonica\SaxonHEC1.2.1\libsaxonhec.dll
Error: : Invalid or incomplete multibyte or wide character

But libsaxonhec.dll is exactly at that location.

If needed, this was the output from make:

$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -D__linux__ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp

g++ -D__linux__ -O3 -c -o saxon/SchemaValidator.o saxon/Saxon.C.API/SchemaValidator.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmAtomicValue.o saxon/Saxon.C.API/XdmAtomicValue.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmItem.o saxon/Saxon.C.API/XdmItem.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmNode.o saxon/Saxon.C.API/XdmNode.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmValue.o saxon/Saxon.C.API/XdmValue.cpp
g++ -D__linux__ -O3 -c -o saxon/XPathProcessor.o saxon/Saxon.C.API/XPathProcessor.cpp
g++ -D__linux__ -O3 -c -o saxon/XQueryProcessor.o saxon/Saxon.C.API/XQueryProcessor.cpp
g++ -D__linux__ -O3 -c -o saxon/Xslt30Processor.o saxon/Saxon.C.API/Xslt30Processor.cpp
g++ -D__linux__ -O3 -c -o saxon/XsltProcessor.o saxon/Saxon.C.API/XsltProcessor.cpp
>brex.h; for f in brex/DMC-AE-A-04-10-0301-00A-022A-D_003-00.XML brex/DMC-S1000D-A-04-10-0301-00A-022A-D_004-00_EN-US.XML brex/DMC-S1000D-E-04-10-0301-00A-022A-D_009-00_EN-US.XML brex/DMC-S1000D-F-04-10-0301-00A-022A-D_001-00_EN-US.XML brex/DMC-S1000D-G-04-10-0301-00A-022A-D_001-00_EN-US.XML stats.xsl; do xxd -i "$f" >>brex.h; done
cc -Wall -Werror -pedantic-errors -I ../common `pkg-config --cflags libxml-2.0 libxslt libexslt` -O3 -DUSE_SAXON -o s1kd-brexcheck s1kd-brexcheck.c ../common/s1kd_tools.c saxon/saxon.o saxon/SaxonCGlue.o saxon/SaxonCXPath.o saxon/SaxonProcessor.o saxon/SchemaValidator.o saxon/XdmAtomicValue.o saxon/XdmItem.o saxon/XdmNode.o saxon/XdmValue.o saxon/XPathProcessor.o saxon/XQueryProcessor.o saxon/Xslt30Processor.o saxon/XsltProcessor.o `pkg-config --libs libxml-2.0 libxslt libexslt` -lstdc++ -ldl
kibook commented 4 years ago

I get the same errors. I'm going to keep looking in to them.

kibook commented 4 years ago

I tried a different approach by modifying the Saxon/C API to link with the libsaxonhec library at compile-time, instead of dynamically loading it at run-time from a hard-coded path (3b80c66). I am now able to successfully run it on Windows.

The pre-release 3.2.5 contains an extra s1kd-brexcheck_4.4.4_win64-saxon.zip package, which is a standalone distribution of s1kd-brexcheck with Saxon enabled.

kibook commented 4 years ago

I forgot to include the JET runtime files in the above package. Use s1kd-brexcheck_4.4.5_win64-saxon.zip instead.

kibook commented 4 years ago

Some updates:

I'm still investigating the memory leak issues with Saxon/C. I reported what I found on their development tracker, and they are looking into it as well: https://saxonica.plan.io/boards/4/topics/7982.

I found another open source XPath 2.0 implementation, XQilla. It seems to be faster than Saxon and with a smaller footprint, plus no memory leaks. Unfortunately, I haven't had much luck finding or building Windows binaries for it, so I've only gotten it working on Linux thus far.

BREXCHECK_XPATH_ENGINE can now have the following values:

kibook commented 4 years ago

I finally did manage to build XQilla on Windows with Cygwin, and I created a standalone Windows package for testing: s1kd-brexcheck_4.8.0_win64-xqilla.zip.

MihailCosmin commented 4 years ago

I finally did manage to build XQilla on Windows with Cygwin, and I created a standalone Windows package for testing: s1kd-brexcheck_4.8.0_win64-xqilla.zip.

Hi, the new s1kd-brexcheck.exe doesn't give any error messages, so it appears to work... But it also doesn't detect any mistakes in the datamodules.

For example I made some clear mistakes regarding the securityClassification, systemDiffCode and modelIdentCode. And none are detected with the new version of s1kd-brexcheck.exe. With the old version they are detected.

kibook commented 4 years ago

Good catch! This was an issue with how Xerces-C implements attribute nodes. DOMNode::getParentNode() doesn't return the element they are on, DOMAttr::getOwnerElement() must be used instead. This should be fixed in 022c40a, s1kd-brexcheck_4.8.2_win64-xqilla.zip.

MihailCosmin commented 4 years ago

Yes, I confirm now it works and it detects the errors.

kibook commented 4 years ago

Great! Thank you for testing it.

530216e is an overhaul on the XPath 2.0 functionality, so that it's not an either-or choice, but is based on what version of XPath was referenced by the corresponding issue of S1000D for each BREX. If the BREX is issue 3.0 and lower, it will use XPath 1.0 (with libxml), and if it is issue 4.0 and up, it will use XPath 2.0 (with either Saxon or XQilla). You can override this with --xpath-version to force it to use either version regardless of the issue of the BREX.

I changed the name of the Makefile variable to XPATH2_ENGINE, with options NONE, SAXON or XQILLA to reflect that you aren't replacing libxml anymore. The default will still be NONE while I keep testing/improving the interfaces for the other options.

Here are the latest Windows builds: