Open MihailCosmin opened 4 years ago
Part of the issue is that libxml2 only supports XPath 1.0, so these functions from XPath 2.0 won't work:
lower-case
matches
tokenize
I believe the issue with the first example is the use of a namespace. I think I need to register all namespaces declared in ancestor nodes to the xmlXPathContext used to resolve the XPath expression. Does your BREX DM have a declaration like this somewhere?
xmlns:pdf="http://someurl.com"
As long as it's on the <structureObjectRule>
or any ancestor node (like the root <dmodule>
node), that should be valid XPath 1.0.
There's no option currently to skip invalid rules, but I will definitely change it so that invalid rules are skipped (with a WARNING) instead of causing the whole check to be aborted. Thanks for the suggestion, @MihailCosmin !
Rules with "invalid" XPath will be skipped as of 37583f2
Now I just need to fix namespaces so that XPath using them aren't also considered invalid.
44ffdf5 should fix the use of XPath prefixes, as long as they have been properly declared in the BREX XML.
I mentioned declaring them on the <structureObjectRule>
element before, but you can also declare them on the <objectPath>
itself, or any ancestor node:
<structureObjectRule id="RULE-153">
<objectPath xmlns:pdf="http://example.com" allowedObjectFlag="1">/dmodule/pdf:style</objectPath>
<objectUse>Custom PDF styling rules. Not S1000D compliant.</objectUse>
</structureObjectRule>
Thanks again for pointing these out, the XPath namespace prefixes in particular is something I hadn't really considered.
Since S1000D (Issue 4.0+) does recommend XPath 2.0 in the specification, I'm looking in to extending libxml2's XPath support.
I'm not sure it can be made fully compliant to XPath 2.0 without modifying libxml2 itself, since XPath 2.0 isn't completely backwards compatible with XPath 1.0 (https://www.w3.org/TR/xpath20/#id-backwards-compatibility), but it's easy to add new functions, like matches
. That may be "good enough".
I created a new project for "libxpath2": https://github.com/kibook/libxpath2. The idea is that once I get this somewhat stable, s1kd-brexcheck can use it to support XPath 2.0-like functions at the very least.
Hi, That would be great. Because now I see there were many more BREX rules that used Xpath 2.0. The reason they were not picked up before, was because I was checking only one DM. As soon as I started checking all DMs, more BREX rules were found to be wrong.
I see that Saxon offers support for Xpath 2.0
Saxon-HE (Home Edition) is an open source product available under the Mozilla Public License version 2.0. It provides implementations of XSLT (3.0), XQuery (3.1), and XPath (2.0 and 3.1) at the basic level of conformance defined by W3C, plus from Saxon 10 the optional features higher-order functions and dynamic evaluation. It is available for Java and .NET.
They have also some files for C: https://www.saxonica.com/download/c.xml
Not sure if you can use these, as I am a complete zero when talking about C.
Do you know if you use XPath 2.0-specific features besides functions, such as the if
or for
keywords? I'm not sure if those can be implemented as easily as adding extra functions.
Would you be able to provide a list of the XPath 2.0 functions used in your BREX? Eventually, I would like to implement all XPath 2.0 functions, but that's going to take some time. If you can give me a list of the ones you use, I could focus on those first.
I'll also look at Saxon to see if it would be possible to have an option to use its XPath engine to validate rules instead of libxml2 if the user chooses, in order to get complete XPath 2.0 support.
Do you know if you use XPath 2.0-specific features besides functions, such as the if or for keywords?
No cases where "if" or "for" are used. In fact all other rules that were wrong used only the three functions from below.
lower-case
matches
tokenize
But full support for Xpath 2.0 could still be needed.
Thanks for the list! However, after looking more into XPath 2.0, I don't think just adding XPath 2.0-like functions will be enough. For example, the way the tokenize
function is used in your example is specific to XPath 2.0 syntax.
I'll probably focus more on getting the Saxon XPath engine working within s1kd-brexcheck.
I have added an experimental option to use Saxon to evaluate the objectPath instead of libxml2. In my tests, it worked successfully on all the example XPath 2.0-based rules you listed.
At least for now, it is a compile-time option, so that Saxon is not a requirement to build and run s1kd-brexcheck. To enable it:
Download and install Saxon/C from http://www.saxonica.com/saxon-c/index.xml
Build with:
$ make BREXCHECK_XPATH_ENGINE=saxon
I've only tested it on Linux so far, so the Makefile might still need some adjustments to work out-of-the-box on Windows with MinGW/Cygwin.
I have included a slightly modified version of the Saxon/C API along with the s1kd-brexcheck source that addresses a bug in the latest released version (1.2.1): https://saxonica.plan.io/issues/4513. The XPathProcessor::declareNamespace
function for declaring namespaces for use in XPath expressions (like pdf:style
) was broken.
There are still some major issues left to work out, like what appear to be big memory leaks within the Saxon/C library itself, though I could also just be misunderstanding how to properly free everything. Either way, it doesn't seem suitable to use on large sets of DMs in one call right now, but a separate call for each DM works okay, though it is significantly slower.
Do I need to add an argument to the brexcheck command to do the check using saxon? I'm trying on Linux and I'm getting these errors:
xmlXPathCompOpEval: function **lower-case** not found
XPath error : Unregistered function
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...
xmlXPathCompOpEval: function **matches** not found
XPath error : Unregistered function
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...
XPath error : Invalid expression
s1kd-brexcheck: WARNING: Ignoring invalid object path in ...
Did you build with:
$ make BREXCHECK_XPATH_ENGINE=saxon
From your errors, it is still using libxml2's XPath engine. You can check which engine it uses with:
$ s1kd-brexcheck --version
There is a new XPath engine:
line.
I'm still working out what is needed for a binary package that would have Saxon enabled, so you will have to build from source to test.
Yup, that did it.
Now it seems to work. I also made an error in a DM according to a BREX rule that used matches
, and it was detected.
Just for info. I tried to build on Windows and I got the below errors:
$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -O3 -c -Isaxon/Saxon.C.API -o saxon/saxon.o `pkg-config --cflags libxml-2.0` saxon/saxon.cpp
cc -O3 -c -o saxon/SaxonCGlue.o saxon/Saxon.C.API/SaxonCGlue.c
cc -O3 -c -o saxon/SaxonCXPath.o saxon/Saxon.C.API/SaxonCXPath.c
g++ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp
saxon/Saxon.C.API/SaxonProcessor.cpp: In member function ‘void SaxonProcessor::setResourcesDirectory(const char*)’:
saxon/Saxon.C.API/SaxonProcessor.cpp:516:2: error: ‘strncat_s’ was not declared in this scope; did you mean ‘strncat’?
516 | strncat_s(_getResourceDirectory(), destSize,dir, strlen(dir));
| ^~~~~~~~~
| strncat
make: *** [Makefile:55: saxon/SaxonProcessor.o] Error 1
Thanks for the info! I got the same error too when I gave it a quick try. The basic issue seems to be that Cygwin/MinGW are counted as Windows in the Saxon/C code's checks, but their environments are set up more like Linux. strncat_s is a Microsoft extension that is obviously not included in Cygwin or MinGW's standard C libraries.
Saxon/C also hardcodes the location of the shared library (libsaxonhec.dll) and Java runtime (rt) it uses, which will be an issue moving the binaries between environments (Cygwin/MinGW to normal Windows).
I updated the Makefile in a236fa1 to treat Cygwin and MinGW as Linux when compiling the Saxon/C API, except for SaxonCGlue.c, which contains the hardcoded path to the shared library. I was able to build it successfully in my MinGW environment after that.
I'm still getting the same error as before on cygwin.
In the Makefile output, do you see -D__linux__
in the gcc
and g++
commands (except for the one compiling SaxonCGlue.c
)? That is what should cause it to use the "linux" portions of the Saxon/C code, like plain strncat instead of strncat_s.
Although I was able to build s1kd-brexcheck, I haven't been able to run it yet despite tweaking the hardcoded shared library path in various ways. I am going to have to keep experimenting.
No,
This is the output, same as before:
$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -O3 -c -Isaxon/Saxon.C.API -o saxon/saxon.o `pkg-config --cflags libxml-2.0` saxon/saxon.cpp
cc -O3 -c -o saxon/SaxonCGlue.o saxon/Saxon.C.API/SaxonCGlue.c
cc -O3 -c -o saxon/SaxonCXPath.o saxon/Saxon.C.API/SaxonCXPath.c
g++ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp
saxon/Saxon.C.API/SaxonProcessor.cpp: In member function ‘void SaxonProcessor::setResourcesDirectory(const char*)’:
saxon/Saxon.C.API/SaxonProcessor.cpp:516:2: error: ‘strncat_s’ was not declared in this scope; did you mean ‘strncat’?
516 | strncat_s(_getResourceDirectory(), destSize,dir, strlen(dir));
| ^~~~~~~~~
| strncat
make: *** [Makefile:69: saxon/SaxonProcessor.o] Error 1
Try exporting the OSTYPE environment variable to make sure it is visible to make
:
$ export OSTYPE
$ make BREXCHECK_XPATH_ENGINE=saxon
If that works, you can add export OSTYPE
to ~/.profile
so that it is always exported.
That possibly worked. The .exe was created. But I cannot use it in cmd.
On one system I just get an error message as a pop-up : "The application was unable to start correctly (0xc000007b)." And one another system I get a message from cmd:
Unable to load C:\Program Files\Saxonica\SaxonHEC1.2.1\libsaxonhec.dll
Error: : Invalid or incomplete multibyte or wide character
But libsaxonhec.dll is exactly at that location.
If needed, this was the output from make:
$ make BREXCHECK_XPATH_ENGINE=saxon
g++ -D__linux__ -O3 -c -o saxon/SaxonProcessor.o saxon/Saxon.C.API/SaxonProcessor.cpp
g++ -D__linux__ -O3 -c -o saxon/SchemaValidator.o saxon/Saxon.C.API/SchemaValidator.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmAtomicValue.o saxon/Saxon.C.API/XdmAtomicValue.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmItem.o saxon/Saxon.C.API/XdmItem.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmNode.o saxon/Saxon.C.API/XdmNode.cpp
g++ -D__linux__ -O3 -c -o saxon/XdmValue.o saxon/Saxon.C.API/XdmValue.cpp
g++ -D__linux__ -O3 -c -o saxon/XPathProcessor.o saxon/Saxon.C.API/XPathProcessor.cpp
g++ -D__linux__ -O3 -c -o saxon/XQueryProcessor.o saxon/Saxon.C.API/XQueryProcessor.cpp
g++ -D__linux__ -O3 -c -o saxon/Xslt30Processor.o saxon/Saxon.C.API/Xslt30Processor.cpp
g++ -D__linux__ -O3 -c -o saxon/XsltProcessor.o saxon/Saxon.C.API/XsltProcessor.cpp
>brex.h; for f in brex/DMC-AE-A-04-10-0301-00A-022A-D_003-00.XML brex/DMC-S1000D-A-04-10-0301-00A-022A-D_004-00_EN-US.XML brex/DMC-S1000D-E-04-10-0301-00A-022A-D_009-00_EN-US.XML brex/DMC-S1000D-F-04-10-0301-00A-022A-D_001-00_EN-US.XML brex/DMC-S1000D-G-04-10-0301-00A-022A-D_001-00_EN-US.XML stats.xsl; do xxd -i "$f" >>brex.h; done
cc -Wall -Werror -pedantic-errors -I ../common `pkg-config --cflags libxml-2.0 libxslt libexslt` -O3 -DUSE_SAXON -o s1kd-brexcheck s1kd-brexcheck.c ../common/s1kd_tools.c saxon/saxon.o saxon/SaxonCGlue.o saxon/SaxonCXPath.o saxon/SaxonProcessor.o saxon/SchemaValidator.o saxon/XdmAtomicValue.o saxon/XdmItem.o saxon/XdmNode.o saxon/XdmValue.o saxon/XPathProcessor.o saxon/XQueryProcessor.o saxon/Xslt30Processor.o saxon/XsltProcessor.o `pkg-config --libs libxml-2.0 libxslt libexslt` -lstdc++ -ldl
I get the same errors. I'm going to keep looking in to them.
I tried a different approach by modifying the Saxon/C API to link with the libsaxonhec library at compile-time, instead of dynamically loading it at run-time from a hard-coded path (3b80c66). I am now able to successfully run it on Windows.
The pre-release 3.2.5 contains an extra s1kd-brexcheck_4.4.4_win64-saxon.zip package, which is a standalone distribution of s1kd-brexcheck with Saxon enabled.
I forgot to include the JET runtime files in the above package. Use s1kd-brexcheck_4.4.5_win64-saxon.zip instead.
Some updates:
I'm still investigating the memory leak issues with Saxon/C. I reported what I found on their development tracker, and they are looking into it as well: https://saxonica.plan.io/boards/4/topics/7982.
I found another open source XPath 2.0 implementation, XQilla. It seems to be faster than Saxon and with a smaller footprint, plus no memory leaks. Unfortunately, I haven't had much luck finding or building Windows binaries for it, so I've only gotten it working on Linux thus far.
BREXCHECK_XPATH_ENGINE
can now have the following values:
LIBXML
(default)SAXON
XQILLA
I finally did manage to build XQilla on Windows with Cygwin, and I created a standalone Windows package for testing: s1kd-brexcheck_4.8.0_win64-xqilla.zip.
I finally did manage to build XQilla on Windows with Cygwin, and I created a standalone Windows package for testing: s1kd-brexcheck_4.8.0_win64-xqilla.zip.
Hi, the new s1kd-brexcheck.exe doesn't give any error messages, so it appears to work... But it also doesn't detect any mistakes in the datamodules.
For example I made some clear mistakes regarding the securityClassification, systemDiffCode and modelIdentCode. And none are detected with the new version of s1kd-brexcheck.exe. With the old version they are detected.
Good catch! This was an issue with how Xerces-C implements attribute nodes. DOMNode::getParentNode() doesn't return the element they are on, DOMAttr::getOwnerElement() must be used instead. This should be fixed in 022c40a, s1kd-brexcheck_4.8.2_win64-xqilla.zip.
Yes, I confirm now it works and it detects the errors.
Great! Thank you for testing it.
530216e is an overhaul on the XPath 2.0 functionality, so that it's not an either-or choice, but is based on what version of XPath was referenced by the corresponding issue of S1000D for each BREX. If the BREX is issue 3.0 and lower, it will use XPath 1.0 (with libxml), and if it is issue 4.0 and up, it will use XPath 2.0 (with either Saxon or XQilla). You can override this with --xpath-version to force it to use either version regardless of the issue of the BREX.
I changed the name of the Makefile variable to XPATH2_ENGINE
, with options NONE
, SAXON
or XQILLA
to reflect that you aren't replacing libxml anymore. The default will still be NONE
while I keep testing/improving the interfaces for the other options.
Here are the latest Windows builds:
Hi,
I have 2 huge BREX files, more than 8000 lines of code combined. And the s1kd-brexcheck fails with
ERROR: Invalid object path in BREX
Step by step I have commented the rules that were responsible for these errors. And found that there were only 4 rules that were causing such error.Is there any option to run s1kd-brexcheck and ignore such errors? More exactly, just skip the BREX rules that are "broken" (although they are not really broken, maybe xmlXPathEvalExpression is not 100% correct).
The rules that do not work: