zotero / citeproc-rs

CSL processor in Rust.
https://cormacrelf.github.io/citeproc-wasm-demo/

"<" is ignored in prefix #129

Closed tnajdek closed 3 years ago

tnajdek commented 3 years ago

Some styles use the "<" character in a prefix (encoded as &lt;). It seems to be ignored and doesn't render.

cormacrelf commented 3 years ago

Prefixes of what kinds of things? I can't reproduce this on a <text prefix="&lt;" variable="title"> or URL, but there could be another element that has this problem. (It occurs to me I should make it so the demo/playground can give you shareable links.)

tnajdek commented 3 years ago

Sorry, it was quite late here when I saw this, and I just wanted to note the issue down so I wouldn't forget.

I've seen this behaviour for DOI; it seems to happen regardless of the link_anchors value. Here is an example CSL (from MHRA):

<text variable="DOI" prefix=" &lt;https://doi.org/" suffix="&gt;"/>

And here is a screenshot from the playground:

[Screenshot 2021-10-30 at 11 06 16]

cormacrelf commented 3 years ago

Oh, I see it. It is over-parsing the affixes, because the superscript parser is only used inside the HTML parser. It needs to parse the hacky superscripts, but not actual HTML. The HTML5 parser used in citeproc-rs dutifully (and I presume correctly according to that spec) ignored an incomplete/invalid <https: tag and swallowed the rest of the input.

The affixes have their XML entities pre-processed, so that part is already done; the HTML parser doesn't need to be involved at all.
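
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not citeproc-rs code and does not use the real HTML5 parser: `decode_xml_entities`, `lossy_html_text`, and `render_text` are hypothetical names, and `lossy_html_text` is only a toy stand-in that mimics "drop an unterminated tag and everything after it". The DOI value used in the note below is made up.

```rust
/// Decode the basic XML entities, roughly what an XML parser has already
/// done by the time a CSL attribute value reaches the processor.
fn decode_xml_entities(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&") // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

/// Toy model (an assumption, not the real parser) of running text through
/// an HTML parser: complete `<...>` tags are stripped, and an unterminated
/// `<...` swallows the rest of the input.
fn lossy_html_text(s: &str) -> String {
    let mut out = String::new();
    let mut rest = s;
    while let Some(open) = rest.find('<') {
        out.push_str(&rest[..open]);
        match rest[open..].find('>') {
            // Skip past the closing '>' of this tag.
            Some(close) => rest = &rest[open + close + 1..],
            // No closing '>': everything from '<' onward is lost.
            None => return out,
        }
    }
    out.push_str(rest);
    out
}

/// Render prefix + value + suffix. When `parse_affixes_as_html` is true,
/// the already-decoded affixes are additionally fed to the toy HTML pass,
/// which is the over-parsing described above.
fn render_text(prefix: &str, value: &str, suffix: &str, parse_affixes_as_html: bool) -> String {
    let mut p = decode_xml_entities(prefix);
    let mut s = decode_xml_entities(suffix);
    if parse_affixes_as_html {
        p = lossy_html_text(&p);
        s = lossy_html_text(&s);
    }
    format!("{}{}{}", p, value, s)
}
```

With the MHRA affixes from this thread and a made-up DOI, `render_text(" &lt;https://doi.org/", "10.1000/xyz", "&gt;", false)` returns ` <https://doi.org/10.1000/xyz>`, while passing `true` returns ` 10.1000/xyz>` in this toy model: the decoded prefix looks like the start of a tag, so it is dropped. Treating decoded affixes as plain text avoids the problem.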

cormacrelf commented 3 years ago

This is a good catch, thanks.