uogbuji / amara3-xml

A data processing library built on Python 3 and MicroXML
Apache License 2.0

uxml.writer not escaping attribute cdata? #19

Closed by distobj 3 years ago

distobj commented 3 years ago

I wrote some code using amara3.uxml to modify MARCXML records. I thought I'd be able to use an xmlter.sender coroutine for input and write the result out losslessly with uxml.writer. That's not happening, though: when it encounters character references on input (specifically the &quot; reference in this case), the reference gets turned into a literal quote character in the attribute value, producing non-well-formed output.

Here's a script and some input data to reproduce
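Independent of that script, the failure mode can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: it does not use amara3's API, and `write_attr_naive`, `write_attr_escaped`, `is_well_formed`, and the `field`/`title` names are hypothetical, chosen just to show why a decoded quote has to be re-escaped when an attribute is written back out.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def write_attr_naive(name, value):
    # Hypothetical writer that emits the attribute value verbatim (the bug).
    return f'<field {name}="{value}"/>'


def write_attr_escaped(name, value):
    # escape() handles &, < and > first; the extra mapping covers the
    # double quote, which is what delimits the attribute value here.
    escaped = escape(value, {'"': '&quot;'})
    return f'<field {name}="{escaped}"/>'


def is_well_formed(xml_text):
    # Hypothetical helper: True if the text parses as XML.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A parser hands the application the *decoded* attribute value.
decoded = ET.fromstring('<field title="say &quot;hi&quot;"/>').get('title')

naive = write_attr_naive('title', decoded)
fixed = write_attr_escaped('title', decoded)

print(is_well_formed(naive))   # the bare quotes end the attribute early
print(is_well_formed(fixed))   # the re-escaped value parses again
```

In this sketch the naive output fails to parse, while the escaped output parses and yields the same attribute value that was read in.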

uogbuji commented 3 years ago

@distobj I heard via backchannel that you need an urgent fix? I had not been made aware of that. The above commit on this hotfix branch seems to work with your example, but is not fully tested. It will take a fair bit more effort before it is, in order to make sure we're finally developing in a mature manner. But if it's a blocker, you can try cherry-picking this patch to see if it gets you over the hump.

uogbuji commented 3 years ago

Quick update on this. While turning the sample files into a test case, I ran into a problem with ampersand (&) double-escaping. I'm hoping to have a hotfix as soon as I can track that down, then a release today, or at least before Monday.
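For readers unfamiliar with the term, double-escaping means applying the escaping step to text that has already been escaped. The sketch below shows the general effect using the standard library's `xml.sax.saxutils`; it is not the amara3 code path, and the sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

raw = 'AT&T'

once = escape(raw)     # 'AT&amp;T'      correct serialized form
twice = escape(once)   # 'AT&amp;amp;T'  the & of the entity was escaped again

# One round of unescaping (what a parser effectively does) recovers the
# original only if the text was escaped exactly once.
print(unescape(once) == raw)    # True
print(unescape(twice) == raw)   # False: a stray '&amp;' survives
```

A writer therefore has to know whether the text it receives is raw or already escaped, and escape it exactly once.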

uogbuji commented 3 years ago

OK, fixed, tested, and released; see PyPI.