readchina / ReadAct

Worlds of Reading during China's long 1970s
https://readchina.github.io/readact
Creative Commons Attribution Share Alike 4.0 International

Faulty validation error on CI #500

Closed by duncdrum 2 years ago

duncdrum commented 2 years ago

Ever since ec63800a414a72919dd96f874e8e9d85fd528afc we have been getting nonsensical validation errors on CI.

listOrg.xml:216: element org: Relax-NG validity error : Invalid attribute id for element org
listOrg.xml:7: element org: Relax-NG validity error : Did not expect element org there
listOrg.xml:7: element org: Relax-NG validity error : Element listOrg has extra content: org

(CI log: https://github.com/readchina/ReadAct/runs/6424321663?check_suite_focus=true#step:10:9)

I can't reproduce these validation errors locally. The files in question validate using

xmllint --version
xmllint: using libxml version 20904
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude ICU ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib

jing, saxon, and whatever else I can throw at it.

Manually inspecting the files, I see no reason why the 2.0 output should be invalid. Obviously listOrg can have direct org children, their xml:id values are unique and valid NCNames, and the org at line 7 of listOrg.xml has not changed between v1.0 and v2.0.

  <org xml:id="AG0628">
    <orgName xml:lang="en" type="main" from="1960" to="1970">Beatles</orgName>
    <placeName ref="#SP0134"/>
  </org>
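The manual check of the xml:id values can be sketched in a few lines of Python. This is only an illustration, not what CI runs: the `check_ids` helper is hypothetical, the NCName pattern is an ASCII-only approximation of the real production, and wrapping the snippet in a `listOrg` root in the TEI namespace is my assumption about the surrounding file.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the predeclared XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# ASSUMPTION: ASCII-only approximation of NCName (the real rule also
# allows many non-ASCII letters). Good enough for ids like "AG0628".
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def check_ids(xml_text):
    """Return (duplicate ids, ids that fail the approximate NCName test)."""
    root = ET.fromstring(xml_text)
    ids = [el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    invalid = sorted({i for i in ids if not NCNAME_ASCII.match(i)})
    return duplicates, invalid


# The org from line 7, wrapped in an assumed minimal listOrg root.
SAMPLE = """<listOrg xmlns="http://www.tei-c.org/ns/1.0">
  <org xml:id="AG0628">
    <orgName xml:lang="en" type="main" from="1960" to="1970">Beatles</orgName>
    <placeName ref="#SP0134"/>
  </org>
</listOrg>"""

print(check_ids(SAMPLE))  # ([], [])
```

Two empty lists mean no duplicate and no malformed ids, which matches what I see by eye.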

I've also inspected the 4.3.0 and 4.4.0 versions of the TEI schema files, and the rules in question are all as they should be.

CI is using libxml version 20910, while my local xmllint reports 20904, so this looks like a regression in libxml2 that produces the false validation errors.
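For readers unfamiliar with the numeric version strings that xmllint prints: as far as I know, libxml2 packs its version as major * 10000 + minor * 100 + micro. A small sketch under that assumption (the `decode_libxml_version` helper is made up for this comment) turns the two numbers into release names:

```python
def decode_libxml_version(number):
    """Decode a packed libxml2 version number.

    ASSUMPTION: the packing is major * 10000 + minor * 100 + micro.
    """
    major, rest = divmod(int(number), 10000)
    minor, micro = divmod(rest, 100)
    return f"{major}.{minor}.{micro}"


for label, raw in (("local", 20904), ("CI", 20910)):
    print(label, decode_libxml_version(raw))
# local 2.9.4
# CI 2.9.10
```

So the local runs use 2.9.4 and CI uses 2.9.10, which narrows down where a regression would have to sit.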

Sadly this remains a showstopper for a new release.

This libxml2 issue seems to describe similar problems: https://gitlab.gnome.org/GNOME/libxml2/-/issues/223

This needs further investigation and experimentation. We might have a corrupt cache, an actual error in the new file hidden behind misleading log output, or we might need to switch to a different validation method.
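One way to start experimenting would be to run the same file through more than one validator and flag disagreements. The sketch below is hypothetical and not part of the current CI setup: it assumes `xmllint` and a `jing` wrapper script are on the PATH, that both exit non-zero on invalid input, and the schema file name in the usage note is a placeholder.

```python
import shutil
import subprocess


def build_commands(schema, document):
    """Command lines for validating `document` against a RELAX NG schema."""
    return {
        "xmllint": ["xmllint", "--noout", "--relaxng", schema, document],
        # ASSUMPTION: a `jing` launcher exists on PATH (often it is run
        # via `java -jar jing.jar` instead).
        "jing": ["jing", schema, document],
    }


def run_validators(schema, document):
    """Map validator name to True (valid), False (invalid) or None (not installed)."""
    results = {}
    for name, cmd in build_commands(schema, document).items():
        if shutil.which(cmd[0]) is None:
            results[name] = None
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.returncode == 0
    return results


def validators_disagree(results):
    """True if the validators that actually ran returned different verdicts."""
    verdicts = {v for v in results.values() if v is not None}
    return len(verdicts) > 1
```

Something like `validators_disagree(run_validators("schema.rng", "listOrg.xml"))` on CI would tell us whether only xmllint rejects the file there, which would point at libxml2 rather than at our data.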