sharplispers / cxml

Closure XML - A Common Lisp XML Parser
http://common-lisp.net/project/cxml/
Other
12 stars 5 forks source link

cxml looping (includes minimal reproducible case) #9

Open svenemtell opened 4 years ago

svenemtell commented 4 years ago

I have noticed that cxml is looping for a certain combination of xml/dtd. It loops in the recurse function inside compile-content-model in xml-parse.lisp. I have put a lot of time on trying to understand how the code is supposed to work but without success. It would be great if someone knowing the code could have a look at this!

A minimal reproducible case is available here https://doremir.com/files/cxml/xml-test.zip.

The xml file from the case looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a PUBLIC "" "">
<a>
  <c/>
</a>

And the dtd file looks like this:

<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
<!ELEMENT a ((b?)*, c?)>

The real world case is from parsing MusicXML and the problematic xml excerpt looks like this:

<sound tempo="90">
  <offset sound="yes">8</offset>
</sound>

And the corresponding dtd excerpt looks like this: <!ELEMENT sound ((midi-device?, midi-instrument?, play?)*, offset?)>

ruricolist commented 4 years ago

I fixed this bug (I mean, this exact bug, down to the same minimal example) in FXML years ago:

https://github.com/ruricolist/FXML/commit/bcc3e0bda14e4e9b76dc9e9162dbd5964918fc1e

Unfortunately I had to rewrite compile-content-model in order to understand it, so the patch can't be applied directly, but the loop is due to left recursion. The fix is just a special variable to detect left recursion and break the loop.

svenemtell commented 4 years ago

Thanks @ruricolist! Good to hear that I'm not the only to find compile-content-model hard to understand :-). I made a pull request #10 which seems to fix the problem.