ikorin24 / U8XmlParser

Extremely fast UTF-8 xml parser library
MIT License

SystemFormatException #17

Closed by SebastianStehle 2 years ago

SebastianStehle commented 2 years ago

Hi,

We are writing a C# version of the MJML rendering engine for HTML email. MJML is an XML-based language.

I am trying out your library, but when I open this file, I get a FormatException without any further details:

https://github.com/SebastianStehle/mjml-test/blob/main/TestRunner/ManyHeroes.mjml

SebastianStehle commented 2 years ago

I found the root cause:

It is this kind of attribute, where the value starts on a new line after the equals sign:

background-url=
          "https://cloud.githubusercontent.com/assets/1830348/15354890/1442159a-1cf0-11e6-92b1-b861dadf1750.jpg"

ikorin24 commented 2 years ago

Thanks for the report. The current implementation was not designed to handle spaces or line breaks between the attribute name, the equals sign, and the value.

In the following example, the parser reads 'node1' correctly, but the other nodes fail. 'node2' does not throw an exception, but its attribute name contains the trailing whitespace. (The attribute name is incorrectly "name ", NOT "name".)

<root>
  <node1 name="value" />

  <node2 name ="value" />
  <node3 name= "value" />
  <node4 name=
    "value" />
</root>

Is this whitespace allowed by the XML specification? Some other parsers read it correctly, but I don't know whether it should be allowed.
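
For reference, the XML 1.0 grammar defines an attribute as Name Eq AttValue, with Eq ::= S? '=' S?, so whitespace (spaces, tabs, line breaks) is permitted on either side of the equals sign. A minimal Python sketch of a scanner that tolerates it is shown here. This is only an illustration, not U8XmlParser's code: the function name parse_attribute is hypothetical, names are limited to ASCII, and entity references are not handled.

```python
import re

# Eq ::= S? '=' S?  where S is any run of space, tab, CR or LF (XML 1.0).
# Simplified on purpose: ASCII-only names, no entity or '<' checks in values.
_ATTRIBUTE = re.compile(
    r"""(?P<name>[A-Za-z_:][A-Za-z0-9_.:-]*)     # Name (ASCII subset)
        [ \t\r\n]*=[ \t\r\n]*                    # Eq with optional whitespace
        (?P<quote>["'])(?P<value>.*?)(?P=quote)  # quoted AttValue
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_attribute(text):
    """Return (name, value) for the first attribute found in text, or None."""
    match = _ATTRIBUTE.search(text)
    if match is None:
        return None
    # The name group stops before any whitespace, so it is never "name ".
    return match.group("name"), match.group("value")


if __name__ == "__main__":
    for sample in ('name="value"', 'name ="value"', 'name= "value"', 'name=\n    "value"'):
        print(parse_attribute(sample))
```

Because the whitespace is consumed as part of Eq rather than as part of the name, all four forms from the example yield the same name and value.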

SebastianStehle commented 2 years ago

I have run the example through a few online validators, and none of them reported any errors. So I guess whitespace is allowed there.

If I remember correctly, VS Code or Visual Studio formatted it like that to avoid long lines.
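
The same check can be reproduced locally without an online validator. The sketch below feeds the four-node example to Python's standard xml.etree.ElementTree parser and collects each node's attributes. The helper name attributes_by_node is made up for this illustration, and the result only shows how one conforming parser behaves, not how U8XmlParser should be implemented.

```python
import xml.etree.ElementTree as ET

# The example document from this thread, with whitespace around '='.
DOCUMENT = """<root>
  <node1 name="value" />

  <node2 name ="value" />
  <node3 name= "value" />
  <node4 name=
    "value" />
</root>"""


def attributes_by_node(xml_text):
    """Map each child tag of the root element to a dict of its attributes."""
    root = ET.fromstring(xml_text)
    return {child.tag: dict(child.attrib) for child in root}


if __name__ == "__main__":
    for tag, attrs in attributes_by_node(DOCUMENT).items():
        print(tag, attrs)
```

All four nodes parse without error, and each one reports a single attribute named "name" with the value "value".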

ikorin24 commented 2 years ago

OK, I will fix that.

SebastianStehle commented 2 years ago

Thank you