Open flavorjones opened 3 years ago
Also see xmlCtxtSetErrorHandler
available since libxml2 2.13.0.
Thanks for the pointer, @nwellnhof, that's great, and I'll make sure the error functions take the additional parser context argument so that I can polyfill.
Also note, the function names were changed in 04001e7c to be noko__structured_error_func_save_and_set
, etc.
In that commit I also did a bit of the described wrapping.
I don't think the polyfill approach will work here. Save-and-restore is the only option for older libxml2.
@nwellnhof Sorry, you're right, I used "polyfill" when I meant "shim", something like
void
noko__structured_error_func_save_and_set(
xmlParserCtxtPtr ctxt,
libxmlStructuredErrorHandlerState *handler_state,
void *user_data,
xmlStructuredErrorFunc handler
)
{
#ifdef HAVE_XMLCTXTSETERRORHANDLER
xmlCtxtSetErrorHandler(ctxt, handler, user_data);
#else
noko__structured_error_func_save(handler_state);
xmlSetStructuredErrorFunc(user_data, handler);
#endif
}
though now I see that there are flavors of this for XPath, XInclude, TestReader, and schemas, too ... in any case, we'll migrate towards these APIs over time for sure.
That would work but the main benefit is that xmlCtxtSetErrorHandler
only has to be called once on a parser context, preferably right after creating it.
See #2168 and #2169 for details, but the short version is that we should be more rigorous about saving-and-restoring error handlers and error handler metadata around libxml2 calls, in case any are being made recursively within Nokogiri.
For example, these lines in
Nokogiri::HTML::Document::EncodingReader
are calling HTML::SAX::PushParser to parse a chunk from the IO read callback of a regular document parse:https://github.com/sparklemotion/nokogiri/blob/7be6f04aa2700e818f8a3bfe82801b5bd6e8c4f4/lib/nokogiri/html/document.rb#L275-L277
To allow users to do similarly complex things, we should always save-and-restore the error callbacks (which are the only global state I can think of that we regularly manipulate).
We're doing this in the HTML::SAX::PushParser class to cover ourselves in the aforementioned case:
https://github.com/sparklemotion/nokogiri/blob/7be6f04aa2700e818f8a3bfe82801b5bd6e8c4f4/ext/nokogiri/html_sax_push_parser.c#L24-L28
https://github.com/sparklemotion/nokogiri/blob/7be6f04aa2700e818f8a3bfe82801b5bd6e8c4f4/ext/nokogiri/xml_syntax_error.h#L6-L17
https://github.com/sparklemotion/nokogiri/blob/7be6f04aa2700e818f8a3bfe82801b5bd6e8c4f4/ext/nokogiri/xml_syntax_error.c#L3-L24
This issue is opened to make sure we remember to do this everywhere.
It's somewhat related to wrapping we need to do around any libxml2 callbacks which re-enter the Ruby interpreter and how we handle those exceptions, all of which are detailed at #1610.