Closed gjactat closed 1 year ago
Thx for this. We are using xmlParseMemory in most placesm so I think it's better to fix the decoder methods. All SRU/SRU/SOlr decoder methods should do be able to,
In fact, xmlParseMemory is deprecated in favor of xmlReadMemory. I think I'll keep this change on my Yaz cloned repo. I'm not confident I could spot each and every line where the decoding could fail because of blanks nodes.
In fact, xmlParseMemory is deprecated in favor of xmlReadMemory. I think I'll keep this change on my Yaz cloned repo. I'm not confident I could spot each and every line where the decoding could fail because of blanks nodes.
But you are using the method with options (which is not enabled by default) ... In particular the original "document" from Solr would have whitespace removed. In other words, to fix a problem with decoding of facets you ALSO remove whitespace from the document.
Closing as decode methods are fixed in #93
In fact, xmlParseMemory is deprecated in favor of xmlReadMemory. I think I'll keep this change on my Yaz cloned repo. I'm not confident I could spot each and every line where the decoding could fail because of blanks nodes.
But you are using the method with options (which is not enabled by default) ... In particular the original "document" from Solr would have whitespace removed. In other words, to fix a problem with decoding of facets you ALSO remove whitespace from the document.
To be clear, "XML_PARSE_NOBLANKS" does not remove all spaces from the document (which would be quite useless). It just ignores the empty text nodes induced by line feeds between xml element nodes. The C# XDocument parser ignores them by default.
Thank you for your changes #93. I have tested your changes and they fix the issue. Guillaume
In fact, xmlParseMemory is deprecated in favor of xmlReadMemory. I think I'll keep this change on my Yaz cloned repo. I'm not confident I could spot each and every line where the decoding could fail because of blanks nodes.
But you are using the method with options (which is not enabled by default) ... In particular the original "document" from Solr would have whitespace removed. In other words, to fix a problem with decoding of facets you ALSO remove whitespace from the document.
To be clear, "XML_PARSE_NOBLANKS" does not remove all spaces from the document (which would be quite useless). It just ignores the empty text nodes induced by line feeds between xml element nodes. The C# XDocument parser ignores them by default.
Yep. But it's desirable to keep the whitespaces for readability.. the Solr did exactly that.
Thank you for your changes #93. I have tested your changes and they fix the issue. Guillaume
Welcome.
FWIW, even with XML_PARSE_NOBLANKS in use one could easily construct a "hostile" response , for example by populating foreign elements not known by the Solr decoder.
Eg
<lst name="date">
<int name="1978">5</int>
</lst>\n"
<foreign>x</foreign>
This is not happening for a well-behaved Solr server of course. But it's foreign input and so must be considered.
We encountered errors while querying a Solr server through Pazpar2. The problem was linked to empty nodes not being handled in the "yaz_solr_decode_facet_field" method.
Instead of applying the same recipe as found in other methods (testing node->type == XML_ELEMENT_NODE), I've tried to load the document with the XML_PARSE_NOBLANKS option).
This option removes the empty text nodes that happens when the Solr XML response is indented (line feeds between elements).
Of course, it's up to you to decide what is safer (testing nodes types or loading without blanks). But anyway, we had errors that crashed Pazpar2 with the following Solr server : https://api.archives-ouvertes.fr/search/
An example of Solr response that caused issues can be found here.