rbeyer / pds4validate

Python-based validator for PDS4 XML labels.
BSD 3-Clause "New" or "Revised" License

FYI for existing validation methods (not an issue) #1

Open thareUSGS opened 2 years ago

thareUSGS commented 2 years ago

I ran across this repo while looking at hiproc. In general, I simply recommend using the validate tool supported by PDS Engineering. Not only does it validate against the PDS4 schemas, it also supports Schematron v2 rules (rare to find outside of commercial applications like Oxygen). Lastly, it performs some extra PDS4-specific validation and can actually look into the data files for simple tests (images or tables). That content check can be turned off to speed things up.

Anyway, I have an old email (circa 2017) from Even R., from when he first implemented PDS4 in GDAL, which shows a trick for getting simple validation out of GDAL (not full Schematron v2, though). He also located a free method to help with Schematron v2, listed in the second section below. I have never tested that method, and it seems like using several schema files would be problematic with the bash script as currently written. Anyway, if you are running Java for saxonb-xslt, you might as well run the PDS4 validate app.

[Trent] We don't plan for GDAL to have any sort of schema validation method, but what application do you use?


[Even] You can validate any XML document that references its schemas, like a PDS4 .xml label, by using (or perhaps abusing!) the new GMLAS driver.

For example:

ogrinfo GMLAS:PDS4_label_test.xml -oo VALIDATE=YES >/dev/null

windoze:

ogrinfo GMLAS:PDS4_label_test.xml -oo VALIDATE=YES > NUL

The first run will take some time, since it downloads the schemas and caches them in ~/.gdal/gmlas_xsd_cache. If the above command doesn't output anything, then the document validates. Underneath, the Xerces-C validator is used.
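Since this repo is aiming at Python, here is a minimal sketch of wrapping the ogrinfo command above with subprocess. The function names are made up for illustration, and it assumes that validation messages arrive on stderr (the commands above throw stdout away, so anything that still shows up should be coming from there). It also assumes ogrinfo is on the PATH and that GDAL was built with the GMLAS driver.

```python
import subprocess


def build_ogrinfo_command(label_path):
    """Return the argument list for the GMLAS validation command shown above.

    Hypothetical helper: only the ogrinfo arguments come from the email.
    """
    return ["ogrinfo", f"GMLAS:{label_path}", "-oo", "VALIDATE=YES"]


def is_clean(stderr_text):
    """No output means the document validates (assumption: messages go to stderr)."""
    return stderr_text.strip() == ""


def validate_label(label_path):
    """Run ogrinfo on a label and return (ok, messages).

    stdout is discarded, mirroring the >/dev/null redirect in the email.
    """
    result = subprocess.run(
        build_ogrinfo_command(label_path),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=False,  # rely on the captured output, not the exit code
    )
    return is_clean(result.stderr), result.stderr
```

Calling validate_label("PDS4_label_test.xml") would then give a boolean plus whatever messages were reported. As with the command line version, this only covers schema validation, not Schematron.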


[Even] Note that Schematron validation is not done. As far as I could find, for open source software, there are only Java based solutions that can deal with Schematron v2.

On Ubuntu 16+,

sudo apt-get install libsaxonb-java

rbeyer commented 2 years ago

This is really just a doodle at this point. It was part of an experiment to see if I could produce and validate PDS XML without Java being in the toolchain (because I really just don't like Java). So far I can do everything but the validation pretty easily.

I found SaxonC-HE, written in C (but it might link to a packaged Java run-time, so maybe not as clean anyway), which has Python bindings and does perform Schematron v2 validation, but it is real janky to get set up, and the documentation isn't super. I had it briefly running last summer under a previous version, but the newest version changed the API and my little experiment is now broken. Until I have the time (ha!) to get it running again, there's no point in pushing the code up, and this repo will remain a stub.