rpgoldman / xmls

Simple, lightweight XML library for Common Lisp
Other
20 stars 5 forks source link

Working with empty document #14

Closed robert-dodier closed 1 year ago

robert-dodier commented 1 year ago

I'm working with documents which contain multiple root elements. I recognize that such documents are not valid XML, since a valid document contains exactly one root element.

However, I find by experiment that XMLS is happy with reading from a stream containing such documents, and one element is returned for each call to XMLS:PARSE. So far, so good.

However, at the end, when there are no more elements, and the stream contains at most white space, or possibly nothing, XMLS:PARSE runs into an error (from the Lisp implementation, SBCL at the moment), namely: The value NIL is not of type CHARACTER

I wonder if there is a way to convince XMLS:PARSE to just return NIL in that case. The readme says XMLS:PARSE "returns nil if parsing fails" so I wonder if the end of file handling is not working as expected.

rpgoldman commented 1 year ago

I could look into this. Would you mind sharing a document that has the characteristics you refer to, and a corresponding call to xmls:parse.

I don't have a ton of time for XMLS RN, so setting me up to fix this quickly would be the difference between me doing it and postponing indefinitely...

robert-dodier commented 1 year ago

Thanks for your response. I looked at it a little more and I have a proposed patch below.

Examples of unexpected behavior:

(xmls:parse (make-string-input-stream ""))

or

(xmls:parse (make-string-input-stream "   "))

both provoke TYPE-ERROR, "The value NIL is not of type CHARACTER".

A stack trace shows the error is in MATCH (called from DOCUMENT). On looking at MATCH, maybe this patch is enough to enable the desired behavior?

$ diff -u xmls.lisp-original xmls.lisp
--- xmls.lisp-original  2023-02-19 12:47:23.315242150 -0800
+++ xmls.lisp   2023-02-19 12:47:52.239385577 -0800
@@ -367,7 +367,7 @@
 (defmacro match (&rest matchers)
   "Attempts to match the next input character with one of the supplied matchers."
   `(let ((c (peek-stream (state-stream s))))
-    (and
+    (and c
      (or ,@(loop for m in matchers
                  collect (etypecase m
                            (standard-char `(char= ,m c))

I find that with that patch, the two preceding examples both return NIL as expected. Furthermore this document:

<a>1</a>
<b>2</b>
<c>3</c>
<c>4</c>
<c>5</c>
<c>6</c>

(let's call it /tmp/foo.xml) gives the following output:

* (with-open-file (s "/tmp/foo.xml") (let (x) (loop while (setq x (xmls:parse s)) do (print x))))

#S(XMLS:NODE :NAME "a" :NS NIL :ATTRS NIL :CHILDREN ("1")) 
#S(XMLS:NODE :NAME "b" :NS NIL :ATTRS NIL :CHILDREN ("2")) 
#S(XMLS:NODE :NAME "c" :NS NIL :ATTRS NIL :CHILDREN ("3")) 
#S(XMLS:NODE :NAME "c" :NS NIL :ATTRS NIL :CHILDREN ("4")) 
#S(XMLS:NODE :NAME "c" :NS NIL :ATTRS NIL :CHILDREN ("5")) 
#S(XMLS:NODE :NAME "c" :NS NIL :ATTRS NIL :CHILDREN ("6")) 
NIL

again as expected.

Finally I find that perl run-tests.pl --quicklisp reports

 Did 11 checks.
    Pass: 11 (100%)
    Skip: 0 ( 0%)
    Fail: 0 ( 0%)

What do you think of the patch? Thanks for your consideration.

rpgoldman commented 1 year ago

The biggest question about this patch in my mind is what should happen when you ask XMLS to parse an empty document. Should it be an error or should it simply return nil?

If the answer is the former, then we should modify your patch so that the programmer must ask for the non-standard behavior. If it's the latter, then we can accept your patch as is.

I'm inclined to suggest we make the programmer request non-standard behavior explicitly so that we don't give odd behavior to people who are trying to parse only standard XML documents.

(Honestly, they should probably be using a better XML parser than this one!)

robert-dodier commented 1 year ago

I understand the reticence to implement nonstandard behavior, and I usually try to conform to standards myself, if there is one. However, I'm going to argue for a variance this one time.

I know these are not really convincing arguments, but I hope, anyway, it's enough to tilt the table.

rpgoldman commented 1 year ago

Your arguments are good ones: XMLS is quite rudimentary and does not claim to do much, if any, checking.

OK, I will see about merging the patch. Will you please try to attach a standard git patch (git format-patch) instead of diffing two separate files?