isaacs / sax-js

A sax style parser for JS

XML within XML being parsed #251

Closed: bolte-io closed this issue 2 years ago

bolte-io commented 2 years ago

Hey guys, I'm not sure whether this is the cause, but it might be. I am using the xml-js module, which I believe uses sax as its parser dependency. All the other parsers that use sax have the same issue, so I believe the problem lies with sax.

I guess I have a somewhat unique use case. Here is what I am trying to do:

  1. Parse the XML into a JS object (no issues here)
  2. Extract the unmodified XML from a string.

Currently, the parsed XML cannot be converted back to a string that matches the original. What seems to happen is that the XML embedded inside the sample file below is parsed as well, which is unintended in my use case. Here is a sample of the file I am parsing:

<script>
<scriptversion>
1.3
</scriptversion>

<data file="data\ships\shiparch.ini" method="append"> 
<source>
[Ship]
ids_name = 0 ;GENERATESTRRES("Heavy Lifter")
ids_info = 0 ;GENERATEXMLRES("Info1")
ids_info1 = 0 ;GENERATEXMLRES("Info2")
ids_info2 = 66608
ids_info3 = 0 ;GENERATEXMLRES("Info3")
ship_class = 2
nickname = ge_lifter2
</source>
</data>

<data file="data\ships\ships.ini" method="replace"> 
<source>
[Ship]
ids_name = 0 ;GENERATESTRRES("Heavy Lifter")
ids_info = 0 ;GENERATEXMLRES("<xml><RDL><PUSH/><TEXT> </TEXT><PARA/><TRA data="1" mask="1" def="-2"/><JUST loc="center"/><TEXT>Stats</TEXT><PARA/><TRA data="0" mask="1" def="-1"/><JUST loc="left"/><TEXT> </TEXT><PARA/><TEXT>Gun/Turret Mounts: 0/5</TEXT><PARA/><TEXT>Armor: 15000</TEXT><PARA/><TEXT>Cargo Space: 400</TEXT><PARA/><TEXT>Max Batteries/Repair Kits: 100/100</TEXT><PARA/><TEXT>Optimal Weapon Class: NA</TEXT><PARA/><TEXT>Max. Weapon Class: NA</TEXT><PARA/><TEXT>Add'l Equipment: NONE</TEXT><PARA/><PARA/><POP/></RDL></xml>")
ids_info1 = 0 ;GENERATESTRRES("Info2")
ids_info2 = 0 ;GENERATESTRRES("Info3")
ids_info3 = 0 ;GENERATEXMLRES("Info4")
ship_class = 2
nickname = ge_lifter2
</source>
</data>

</script>

When parsing, it also parses the XML contained within GENERATEXMLRES(""). I would like it to avoid doing this.

Where, and how, in the code could I achieve this? Would it mean declaring ; as a comment marker somewhere? I would really appreciate it if someone could help me here, if you have the time.

Thanks!
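Some background on why this happens: in XML, a raw < inside text content always starts markup. A conforming XML parser therefore treats <xml>, <RDL>, <TEXT> and so on as child elements of <source>. If you control whatever writes these files, one option is to entity-escape the embedded text. Here is a minimal sketch in plain JavaScript. escapeXmlText is a hypothetical helper and is not part of sax or xml-js:

```javascript
// Sketch only: escapeXmlText is a hypothetical helper, not a sax/xml-js API.
// Escapes the characters that XML would otherwise read as markup in text content.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const line = 'ids_info = 0 ;GENERATEXMLRES("<xml><TEXT>Stats</TEXT></xml>")';
console.log(escapeXmlText(line));
// ids_info = 0 ;GENERATEXMLRES("&lt;xml&gt;&lt;TEXT&gt;Stats&lt;/TEXT&gt;&lt;/xml&gt;")
```

A parser decodes &lt;, &gt; and &amp; back to the original characters. As a result, the text node it reports reads the same as the unescaped line.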

bolte-io commented 2 years ago

Closing; a bit of rubber duck debugging helped me solve my issue.

isaacs commented 2 years ago

For posterity, the pedantically correct way to do this is with a <![CDATA[ block. Something like:

<script>
<data file="filename">
<source><![CDATA[
anything at all
even containing <xml> stuff </xml>
]]></source>
</data>
</script>
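Adding the CDATA wrapping can be automated for files shaped like the sample in the issue. The following is a minimal sketch. toCdata and wrapSourceBodies are hypothetical helpers, not APIs of sax or xml-js. wrapSourceBodies also relies on several assumptions about each <source> element:

- it has no attributes,
- it is not nested,
- it is not already wrapped in CDATA, and
- its body never contains the literal text </source>.

```javascript
// Sketch only: toCdata and wrapSourceBodies are hypothetical helpers.

// Wrap arbitrary text in a CDATA section. A literal "]]>" would end the
// section early, so each occurrence is split across two adjacent sections.
function toCdata(text) {
  return "<![CDATA[" + text.replace(/\]\]>/g, "]]]]><![CDATA[>") + "]]>";
}

// Wrap the body of every <source>...</source> element in CDATA.
// Assumes <source> has no attributes, is not nested, is not already
// CDATA-wrapped, and its body never contains the literal "</source>".
function wrapSourceBodies(xml) {
  return xml.replace(/<source>([\s\S]*?)<\/source>/g,
    (_match, body) => "<source>" + toCdata(body) + "</source>");
}

const sample =
  '<data file="data\\ships\\ships.ini" method="replace">\n' +
  "<source>\n" +
  'ids_info = 0 ;GENERATEXMLRES("<xml><TEXT>Stats</TEXT></xml>")\n' +
  "</source>\n" +
  "</data>";

console.log(wrapSourceBodies(sample));
```

The sequence ]]> cannot appear inside a CDATA section, so toCdata splits any occurrence across two adjacent sections. The parser then joins them back into the original text. When sax reads a CDATA section, it reports the content through its cdata event rather than opening child elements. This means the GENERATEXMLRES payload survives as text.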