Closed robert-dodier closed 1 year ago
I could look into this. Would you mind sharing a document that has the characteristics you refer to, and a corresponding call to xmls:parse
.
I don't have a ton of time for XMLS RN, so setting me up to fix this quickly would be the difference between me doing it and postponing indefinitely...
Thanks for your response. I looked at it a little more and I have a proposed patch below.
Examples of unexpected behavior:
(xmls:parse (make-string-input-stream ""))
or
(xmls:parse (make-string-input-stream " "))
both provoke TYPE-ERROR, "The value NIL is not of type CHARACTER".
A stack trace shows the error is in MATCH (called from DOCUMENT). On looking at MATCH, maybe this patch is enough to enable the desired behavior?
$ diff -u xmls.lisp-original xmls.lisp
--- xmls.lisp-original 2023-02-19 12:47:23.315242150 -0800
+++ xmls.lisp 2023-02-19 12:47:52.239385577 -0800
@@ -367,7 +367,7 @@
(defmacro match (&rest matchers)
"Attempts to match the next input character with one of the supplied matchers."
`(let ((c (peek-stream (state-stream s))))
- (and
+ (and c
(or ,@(loop for m in matchers
collect (etypecase m
(standard-char `(char= ,m c))
I find that with that patch, the two preceding examples both return NIL as expected. Furthermore this document:
<a>1</a>
<b>2</b>
<c>3</c>
<c>4</c>
<c>5</c>
<c>6</c>
(let's call it /tmp/foo.xml
) gives the following output:
* (with-open-file (s "/tmp/foo.xml") (let (x) (loop while (setq x (xmls:parse s)) do (print x))))
#S(XMLS:NODE :NAME "a" :NS NIL :ATTRS NIL :CHILDREN ("1"))
#S(XMLS:NODE :NAME "b" :NS NIL :ATTRS NIL :CHILDREN ("2"))
#S(XMLS:NODE :NAME "c" :NS NIL :ATTRS NIL :CHILDREN ("3"))
#S(XMLS:NODE :NAME "c" :NS NIL :ATTRS NIL :CHILDREN ("4"))
#S(XMLS:NODE :NAME "c" :NS NIL :ATTRS NIL :CHILDREN ("5"))
#S(XMLS:NODE :NAME "c" :NS NIL :ATTRS NIL :CHILDREN ("6"))
NIL
again as expected.
Finally I find that perl run-tests.pl --quicklisp
reports
Did 11 checks.
Pass: 11 (100%)
Skip: 0 ( 0%)
Fail: 0 ( 0%)
What do you think of the patch? Thanks for your consideration.
The biggest question about this patch in my mind is what should happen when you ask XMLS to parse an empty document. Should it be an error or should it simply return nil?
If the answer is the former, then we should modify your patch so that the programmer must ask for the non-standard behavior. If it's the latter, then we can accept your patch as is.
I'm inclined to suggest we make the programmer request non-standard behavior explicitly so that we don't give odd behavior to people who are trying to parse only standard XML documents.
(Honestly, they should probably be using a better XML parser than this one!)
I understand the reticence to implement nonstandard behavior, and I usually try to conform to standards myself, if there is one. However, I'm going to argue for a variance this one time.
I know these are not really convincing arguments, but I hope, anyway, it's enough to tilt the table.
Your arguments are good ones: XMLS is quite rudimentary and does not claim to do much, if any, checking.
OK, I will see about merging the patch. Will you please try to attach a standard git patch (git format-patch
) instead of diffing two separate files?
I'm working with documents which contain multiple root elements. I recognize that such documents are not valid XML, since a valid document contains exactly one root element.
However, I find by experiment that XMLS is happy with reading from a stream containing such documents, and one element is returned for each call to XMLS:PARSE. So far, so good.
However, at the end, when there are no more elements, and the stream contains at most white space, or possibly nothing, XMLS:PARSE runs into an error (from the Lisp implementation, SBCL at the moment), namely:
The value NIL is not of type CHARACTER
I wonder if there is a way to convince XMLS:PARSE to just return NIL in that case. The readme says XMLS:PARSE "returns nil if parsing fails" so I wonder if the end of file handling is not working as expected.