In #73, a change was made to recognize special hex code escape sequences of the form `_xHHHH_` (four hex digits), e.g. _x000D_ for carriage return characters.
However, the code that recognizes these character sequences uses the regular expression
HEX_ESCAPE_REGEXP = /_x[0-9A-Za-z]{4}_/
which matches more than just hex sequences: `[0-9A-Za-z]` accepts any letter, not only A-F.
This caused a problem for me with a spreadsheet that somewhere contained the (unescaped) string "_xhtml_". creek replaced it with a NUL byte (\0), which caused errors in my application.
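To illustrate, here is a minimal sketch in plain Ruby. The helper `unescape_loose` is hypothetical and not creek's actual code. It assumes the unescaping reads the four characters after `_x` with `String#hex` and converts the result with `Integer#chr`, which I have not verified against the source. Under that assumption, the NUL byte appears because `"html".hex` returns 0.

```ruby
# The pattern currently in use (copied from this report).
LOOSE_HEX_ESCAPE_REGEXP = /_x[0-9A-Za-z]{4}_/

# Hypothetical helper (an assumption, not creek's real implementation):
# read the four characters after "_x" as a hex number and turn it into a character.
def unescape_loose(str)
  str.gsub(LOOSE_HEX_ESCAPE_REGEXP) { |match| match[2, 4].hex.chr }
end

# A genuine escape sequence becomes a carriage return.
p unescape_loose("A_x000D_B")

# "_xhtml_" also matches the loose pattern. String#hex returns 0 for
# input that is not hexadecimal, so the replacement is a NUL byte.
p unescape_loose("A_xhtml_B")
```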
An example for a sharedStrings.xml (created manually with LibreOffice) to reproduce the issue:
`<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Test Case NameDescriptionStep InstructionsExpected ResultsAB
CA_xhtml_BDA_x005F_x000D_B`
I entered both "\_xhtml\_" and "\_x000D\_" manually in LibreOffice. For the second value, LibreOffice escaped the leading underscore (as "\_x005F\_"), but it left the "\_xhtml\_" string unescaped (because it is not a hex value).
I guess you could fix the issue by matching only hex digits, e.g. with
`/_x[0-9A-Fa-f]{4}_/`?
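A quick sketch of how the stricter pattern would behave on the values from my test file. The constant name `STRICT_HEX_ESCAPE_REGEXP` and the helper `unescape_strict` are made up for illustration and are not part of creek. The conversion via `String#hex` and `Integer#chr` is my assumption about how the unescaping could work.

```ruby
# Proposed pattern: only the hex digits 0-9, A-F and a-f are allowed.
STRICT_HEX_ESCAPE_REGEXP = /_x[0-9A-Fa-f]{4}_/

# Hypothetical helper (not creek's real implementation): replace each
# escape sequence with the character for its code point.
def unescape_strict(str)
  str.gsub(STRICT_HEX_ESCAPE_REGEXP) do |match|
    match[2, 4].hex.chr(Encoding::UTF_8)
  end
end

# "_xhtml_" no longer matches, so the string passes through unchanged.
p unescape_strict("A_xhtml_B")

# A real escape sequence is still converted to a carriage return.
p unescape_strict("A_x000D_B")

# The escaped underscore written by LibreOffice becomes a literal "_".
# gsub does not rescan its own output, so the following "x000D_" stays text.
p unescape_strict("A_x005F_x000D_B")
```

With this pattern, the "\_xhtml\_" cell from my example would no longer be touched.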
Thanks in advance!