xsdata failing to generate models for XJustiz XSDs - "missing closing quote in string literal"

phdowling commented 1 month ago

Apologies in advance for exposing you to the convolution that is the German XJustiz data format. Nevertheless, I ran into this issue today: the model generation fails for all of the schemas here (select any version and click on XSD Schemata to download one, for example this one).

For example, when I run xsdata xjustiz_0400_register_3_3.xsd this happens:

========= xsdata v24.5 / Python 3.12.0 / Platform darwin =========

Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0400_register_3_3.xsd
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/din-norm-91379-datatypes.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/din-norm-91379-datatypes.xsd
Builder: 5 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0000_grunddatensatz_3_5.xsd
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0010_cl_allgemein_3_6.xsd
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xoev-code.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xoev-code.xsd
Builder: 1 main and 0 inner classes
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0010_cl_allgemein_3_6.xsd
Builder: 60 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0020_cl_gerichte_3_3.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0020_cl_gerichte_3_3.xsd
Builder: 1 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0030_cl_rechtsform_3_3.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0030_cl_rechtsform_3_3.xsd
Builder: 1 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0040_cl_rollenbezeichnung_3_5.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0040_cl_rollenbezeichnung_3_5.xsd
Builder: 1 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0050_cl_staaten_3_2.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0050_cl_staaten_3_2.xsd
Builder: 1 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0060_cl_telekommunikation_3_0.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0060_cl_telekommunikation_3_0.xsd
Builder: 4 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0070_cl_justizvollzugsanstalt_3_1.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0070_cl_justizvollzugsanstalt_3_1.xsd
Builder: 1 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0080_cl_register_3_2.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0080_cl_register_3_2.xsd
Builder: 3 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0095_cl_personalstatut_3_0.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0095_cl_personalstatut_3_0.xsd
Builder: 2 main and 0 inner classes
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0000_grunddatensatz_3_5.xsd
Builder: 45 main and 46 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0410_cl_register_3_1.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0410_cl_register_3_1.xsd
Builder: 5 main and 0 inner classes
Parsing schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0420_cl_vertretung_register_3_0.xsd
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0420_cl_vertretung_register_3_0.xsd
Builder: 5 main and 0 inner classes
Compiling schema file:///Users/dowling/Downloads/XJustiz_3.5.1%20XSD/xjustiz_0400_register_3_3.xsd
Builder: 1 main and 36 inner classes
Analyzer input: 136 main and 82 inner classes
Analyzer output: 128 main and 82 inner classes
Generating package: init
Generating package: generated.xoev_code
Generating package: generated.xjustiz_0010_cl_allgemein_3_6
Generating package: generated.xjustiz_0020_cl_gerichte_3_3
Generating package: generated.xjustiz_0030_cl_rechtsform_3_3
Generating package: generated.xjustiz_0040_cl_rollenbezeichnung_3_5
Generating package: generated.xjustiz_0050_cl_staaten_3_2
Generating package: generated.xjustiz_0060_cl_telekommunikation_3_0
Generating package: generated.xjustiz_0070_cl_justizvollzugsanstalt_3_1
Generating package: generated.xjustiz_0080_cl_register_3_2
Generating package: generated.xjustiz_0095_cl_personalstatut_3_0
=========
Error: Ruff failed
details: Failed to parse generated/xjustiz_0000_grunddatensatz_3_5.py:89:24: missing closing quote in string literal
source: 

           default=None,
           metadata={
               "type": "Element",
               "namespace": "http://www.xjustiz.de",
--->            "pattern": r"([ -
|[ -~]|[ -¬]|[®-ž]|[Ƈ-ƈ]|Ə|Ɨ|[Ơ-ơ]|[Ư-ư]|Ʒ|[Ǎ-ǜ]|[Ǟ-ǟ]|[Ǣ-ǰ]|[Ǵ-ǵ]|[Ǹ-ǿ]|[Ȓ-ȓ]|[Ș-ț]|[Ȟ-ȟ]|[ȧ-ȳ]|ə|ɨ|ʒ|[ʹ-ʺ]|[ʾ-ʿ]|ˈ|ˌ|[Ḃ-ḃ]|[Ḇ-ḇ]|[Ḋ-ḑ]|ḗ|[Ḝ-ḫ]|[ḯ-ḷ]|[Ḻ-ḻ]|[Ṁ-ṉ]|[Ṓ-ṛ]|[Ṟ-ṣ]|[Ṫ-ṯ]|[Ẁ-ẇ]|[Ẍ-ẗ]|ẞ|[Ạ-ỹ]|’|‡|€|A̋|C(̀|̄|̆|̈|̕|̣|̦|̨̆)|D̂|F(̀|̄)|G̀|H(̄|̦|̱)|J(́|̌)|K(̀|̂|̄|̇|̕|̛|̦|͟H|͟h)|L(̂|̥|̥̄|̦)|M(̀|̂|̆|̐)|N(̂|̄|̆|̦)|P(̀|̄|̕|̣)|R(̆|̥|̥̄)|S(̀|̄|̛̄|̱)|T(̀|̄|̈|̕|̛)|U̇|Z(̀|̄|̆|̈|̧)|a̋|c(̀|̄|̆|̈|̕|̣|̦|̨̆)|d̂|f(̀|̄)|g̀|h(̄|̦)|j́|k(̀|̂|̄|̇|̕|̛|̦|͟h)|l(̂|̥|̥̄|̦)|m(̀|̂|̆|̐)|n(̂|̄|̆|̦)|p(̀|̄|̕|̣)|r(̆|̥|̥̄)|s(̀|̄|̛̄|̱)|t(̀|̄|̕|̛)|u̇|z(̀|̄|̆|̈|̧)|Ç̆|Û̄|ç̆|û̄|ÿ́|Č(̕|̣)|č(̕|̣)|ē̍|Ī́|ī́|ō̍|Ž(̦|̧)|ž(̦|̧)|Ḳ̄|ḳ̄|Ṣ̄|ṣ̄|Ṭ̄|ṭ̄|Ạ̈|ạ̈|Ọ̈|ọ̈|Ụ(̄|̈)|ụ(̄|̈))*",
           }
       )

From a quick look around, I would guess the issue has to do with the long sequences of special characters in din-norm-91379-datatypes.xsd, e.g. starting on line 29:

<xs:restriction base="xs:string">
         <xs:pattern value="(&#x0020;|&#x0027;|[&#x002C;-\&#x002E;]|[&#x0041;-&#x005A;]|[&#x0060;-&#x007A;]|&#x007E;|&#x00A8;|&#x00B4;|&#x00B7;|[&#x00C0;-&#x00D6;]|[&#x00D8;-&#x00F6;]|[&#x00F8;-&#x017E;]|[&#x0187;-&#x0188;]|&#x018F;|&#x0197;|[&#x01A0;-&#x01A1;]|[&#x01AF;-&#x01B0;]|&#x01B7;|[&#x01CD;-&#x01DC;]|[&#x01DE;-&#x01DF;]|[&#x01E2;-&#x01F0;]|[&#x01F4;-&#x01F5;]|[&#x01F8;-&#x01FF;]|[&#x0212;-&#x0213;]|[&#x0218;-&#x021B;]|[&#x021E;-&#x021F;]|[&#x0227;-&#x0233;]|&#x0259;|&#x0268;|&#x0292;|[&#x02B9;-&#x02BA;]|[&#x02BE;-&#x02BF;]|&#x02C8;|&#x02CC;|[&#x1E02;-&#x1E03;]|[&#x1E06;-&#x1E07;]|[&#x1E0A;-&#x1E11;]|&#x1E17;|[&#x1E1C;-&#x1E2B;]|[&#x1E2F;-&#x1E37;]|[&#x1E3A;-&#x1E3B;]|[&#x1E40;-&#x1E49;]|[&#x1E52;-&#x1E5B;]|[&#x1E5E;-&#x1E63;]|[&#x1E6A;-&#x1E6F;]|[&#x1E80;-&#x1E87;]|[&#x1E8C;-&#x1E97;]|&#x1E9E;|[&#x1EA0;-&#x1EF9;]|&#x2019;|&#x2021;|&#x0041;&#x030B;|&#x0043;(&#x0300;|&#x0304;|&#x0306;|&#x0308;|&#x0315;|&#x0323;|&#x0326;|&#x0328;&#x0306;)|&#x0044;&#x0302;|&#x0046;(&#x0300;|&#x0304;)|&#x0047;&#x0300;|&#x0048;(&#x0304;|&#x0326;|&#x0331;)|&#x004A;(&#x0301;|&#x030C;)|&#x004B;(&#x0300;|&#x0302;|&#x0304;|&#x0307;|&#x0315;|&#x031B;|&#x0326;|&#x035F;&#x0048;|&#x035F;&#x0068;)|&#x004C;(&#x0302;|&#x0325;|&#x0325;&#x0304;|&#x0326;)|&#x004D;(&#x0300;|&#x0302;|&#x0306;|&#x0310;)|&#x004E;(&#x0302;|&#x0304;|&#x0306;|&#x0326;)|&#x0050;(&#x0300;|&#x0304;|&#x0315;|&#x0323;)|&#x0052;(&#x0306;|&#x0325;|&#x0325;&#x0304;)|&#x0053;(&#x0300;|&#x0304;|&#x031B;&#x0304;|&#x0331;)|&#x0054;(&#x0300;|&#x0304;|&#x0308;|&#x0315;|&#x031B;)|&#x0055;&#x0307;|&#x005A;(&#x0300;|&#x0304;|&#x0306;|&#x0308;|&#x0327;)|&#x0061;&#x030B;|&#x0063;(&#x0300;|&#x0304;|&#x0306;|&#x0308;|&#x0315;|&#x0323;|&#x0326;|&#x0328;&#x0306;)|&#x0064;&#x0302;|&#x0066;(&#x0300;|&#x0304;)|&#x0067;&#x0300;|&#x0068;(&#x0304;|&#x0326;)|&#x006A;&#x0301;|&#x006B;(&#x0300;|&#x0302;|&#x0304;|&#x0307;|&#x0315;|&#x031B;|&#x0326;|&#x035F;&#x0068;)|&#x006C;(&#x0302;|&#x0325;|&#x0325;&#x0304;|&#x0326;)|&#x006D;(&#x0300;|&#x0302;|&#x0306;|&#x0310;)|&#x006E;(&#x0302;|&#x0304;|&#x0306;|&#x0326;)|&#x0070;(&#x0300;|&#x0304;|&#x0315;|&#x0323;)|&#x0072;(&#x0306;|&#x0325;|&#x0325;&#x0304;)|&#x0073;(&#x0300;|&#x0304;|&#x031B;&#x0304;|&#x0331;)|&#x0074;(&#x0300;|&#x0304;|&#x0315;|&#x031B;)|&#x0075;&#x0307;|&#x007A;(&#x0300;|&#x0304;|&#x0306;|&#x0308;|&#x0327;)|&#x00C7;&#x0306;|&#x00DB;&#x0304;|&#x00E7;&#x0306;|&#x00FB;&#x0304;|&#x00FF;&#x0301;|&#x010C;(&#x0315;|&#x0323;)|&#x010D;(&#x0315;|&#x0323;)|&#x0113;&#x030D;|&#x012A;&#x0301;|&#x012B;&#x0301;|&#x014D;&#x030D;|&#x017D;(&#x0326;|&#x0327;)|&#x017E;(&#x0326;|&#x0327;)|&#x1E32;&#x0304;|&#x1E33;&#x0304;|&#x1E62;&#x0304;|&#x1E63;&#x0304;|&#x1E6C;&#x0304;|&#x1E6D;&#x0304;|&#x1EA0;&#x0308;|&#x1EA1;&#x0308;|&#x1ECC;&#x0308;|&#x1ECD;&#x0308;|&#x1EE4;(&#x0304;|&#x0308;)|&#x1EE5;(&#x0304;|&#x0308;))*"/>

Any help would be greatly appreciated.

rbruenig commented 1 month ago

It looks to me like the pattern in the generated code is wrong. Pasting the restriction source into an HTML entity decoder gives the following output:

<xs:restriction base="xs:string">
         <xs:pattern value="( |'|[,-\.]|[A-Z]|[`-z]|~|¨|´|·|[À-Ö]|[Ø-ö]|[ø-ž]|[Ƈ-ƈ]|Ə|Ɨ|[Ơ-ơ]|[Ư-ư]|Ʒ|[Ǎ-ǜ]|[Ǟ-ǟ]|[Ǣ-ǰ]|[Ǵ-ǵ]|[Ǹ-ǿ]|[Ȓ-ȓ]|[Ș-ț]|[Ȟ-ȟ]|[ȧ-ȳ]|ə|ɨ|ʒ|[ʹ-ʺ]|[ʾ-ʿ]|ˈ|ˌ|[Ḃ-ḃ]|[Ḇ-ḇ]|[Ḋ-ḑ]|ḗ|[Ḝ-ḫ]|[ḯ-ḷ]|[Ḻ-ḻ]|[Ṁ-ṉ]|[Ṓ-ṛ]|[Ṟ-ṣ]|[Ṫ-ṯ]|[Ẁ-ẇ]|[Ẍ-ẗ]|ẞ|[Ạ-ỹ]|’|‡|A̋|C(̀|̄|̆|̈|̕|̣|̦|̨̆)|D̂|F(̀|̄)|G̀|H(̄|̦|̱)|J(́|̌)|K(̀|̂|̄|̇|̕|̛|̦|͟H|͟h)|L(̂|̥|̥̄|̦)|M(̀|̂|̆|̐)|N(̂|̄|̆|̦)|P(̀|̄|̕|̣)|R(̆|̥|̥̄)|S(̀|̄|̛̄|̱)|T(̀|̄|̈|̕|̛)|U̇|Z(̀|̄|̆|̈|̧)|a̋|c(̀|̄|̆|̈|̕|̣|̦|̨̆)|d̂|f(̀|̄)|g̀|h(̄|̦)|j́|k(̀|̂|̄|̇|̕|̛|̦|͟h)|l(̂|̥|̥̄|̦)|m(̀|̂|̆|̐)|n(̂|̄|̆|̦)|p(̀|̄|̕|̣)|r(̆|̥|̥̄)|s(̀|̄|̛̄|̱)|t(̀|̄|̕|̛)|u̇|z(̀|̄|̆|̈|̧)|Ç̆|Û̄|ç̆|û̄|ÿ́|Č(̕|̣)|č(̕|̣)|ē̍|Ī́|ī́|ō̍|Ž(̦|̧)|ž(̦|̧)|Ḳ̄|ḳ̄|Ṣ̄|ṣ̄|Ṭ̄|ṭ̄|Ạ̈|ạ̈|Ọ̈|ọ̈|Ụ(̄|̈)|ụ(̄|̈))*"/>

It looks like something is converted incorrectly and produces a newline character, which causes Ruff to complain about the unterminated string literal.
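
The suspected failure mode can be reproduced in isolation. The sketch below is not xsdata's code: the pattern fragment is a made-up stand-in that contains a line-feed character reference (which would be consistent with the generated output breaking right after `[ -`), and `compiles` is a hypothetical helper. It shows that a decoded newline placed verbatim inside a single-line raw string literal is a syntax error, and that letting `repr()` escape the value is one possible way to emit a valid literal.

```python
import html

# Hypothetical XSD pattern fragment (not copied from the XJustiz schemas).
# It contains a character reference for a line feed (U+000A).
xsd_fragment = "[&#x0009;-&#x000A;]|[&#x0020;-&#x007E;]"

# Decoding the character references yields a real tab and a real newline.
decoded = html.unescape(xsd_fragment)
has_newline = "\n" in decoded


def compiles(source: str) -> bool:
    """Return True if the given Python source is syntactically valid."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


# Naive rendering: paste the decoded text between r"..." as-is.
# The embedded newline ends the line before the closing quote.
naive_source = 'pattern = r"' + decoded + '"'
naive_ok = compiles(naive_source)

# One possible alternative (an assumption, not necessarily the actual fix):
# let repr() escape control characters and quotes.
safe_source = "pattern = " + repr(decoded)
safe_ok = compiles(safe_source)

# The escaped literal evaluates back to the same string.
namespace = {}
exec(safe_source, namespace)
restored = namespace["pattern"]
```

Here `naive_ok` is false while `safe_ok` is true, and `restored` equals `decoded`, so the escaped form loses no information.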

tefra commented 1 month ago

Thanks for reporting, @rbruenig. The fix is on main.

The pattern is not used during parsing; it's rendered only as information for the developer. At some point I hope to translate XSD patterns to Python regex.

You can also disable rendering them completely; see --ignore-patterns.
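
To illustrate what "information for the developer" can mean in practice, here is a minimal sketch using a plain dataclass. It is not generated by xsdata: the `Person` class, the `matches` helper, and the shortened ASCII-only pattern are all assumptions made for the example. The dataclass accepts any value regardless of the pattern, and a developer who wants a check has to read the metadata and apply it manually. Because XSD patterns match the whole value implicitly, `re.fullmatch` is used; note that not every XSD regex feature (for example `\i`, `\c`, or character class subtraction) has a direct equivalent in Python's `re`.

```python
import re
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Person:  # hypothetical class, shaped like the generated field above
    name: Optional[str] = field(
        default=None,
        metadata={
            "type": "Element",
            "namespace": "http://www.xjustiz.de",
            # Simplified ASCII-only stand-in for the real DIN 91379 pattern.
            "pattern": r"( |'|[,-\.]|[A-Z]|[`-z]|~)*",
        },
    )


# Constructing the object does not enforce the pattern.
person = Person(name="Müller#1")

# The pattern can be read back from the field metadata.
pattern = fields(Person)[0].metadata["pattern"]


def matches(value: str) -> bool:
    """Check a value against the pattern, anchored at both ends."""
    return re.fullmatch(pattern, value) is not None


accepted = matches("O'Neil-Smith")
rejected = matches("Müller#1")
```

With this reduced pattern, `accepted` is true and `rejected` is false, while `person` was still created without complaint.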

tefra / xsdata

xsdata failing to generate models for XJustiz XSDs - "missing closing quote in string literal" #1054