Closed ronaldtse closed 2 years ago
I had different issues with Nokogiri 1.13 (it does not handle idiosyncratic Word HTML), which I resolved through postprocessing: https://github.com/metanorma/html2doc/issues/69
I think you're going to need a preprocessing step to escape those characters, if standalone, before passing them to the to_xml constructor in every field. (And xml.text(...) sounds like such a preprocessor.)
The to_xml method in some Relaton flavor gems seems to be losing the & character in text (according to @andrew2net).
It's fixed in relaton-bib v1.11.5.
To escape these special symbols, use the xml.text(...) method.
RFC allows HTML tags inside an abstract element. To parse that element, use xml.at('abstract').inner_html.
In Nokogiri v1.13 the treatment of the &, < and > signs seems to have changed (correctly). These are invalid XML symbols as content. This is described in: https://github.com/sparklemotion/nokogiri/issues/2483
Right now any text with &, <, >, etc., will be rendered without the symbols. It is said that the text method should be used to encode text with these invalid XML characters.
The to_xml method in some Relaton flavor gems seems to be losing the & character in text (according to @andrew2net).
Other than this location: https://github.com/relaton/relaton-bib/blob/cae6e9fd7598f560d14ee94623e7f210e2ab7ac1/lib/relaton_bib/bibliographic_item.rb#L325-L333
there are not many places that define def to_xml (108): https://github.com/search?l=Ruby&q=org%3Arelaton+%22def+to_xml%22&type=Code
Even fewer with "abstract" (42): https://github.com/search?l=Ruby&q=org%3Arelaton+%22abstract%22&type=Code
@andrew2net is currently investigating which flavor and which document this problem originated from.
This task is to add a test for that and fix the behavior.