Closed matthijskooijman closed 1 month ago
Option 4: do not add a call to next(), but just call nextTag() as now and catch the exception. Unfortunately nextTag() can also raise other exceptions that are indistinguishable except for the message, but we can check the current type in the catch handler, and if it is END_DOCUMENT (or hasNext() returns false), ignore the exception and return.
This seems to work, see #252 for a PR implementing it.
When I create a new procedure through the CSAPI and use SensorML XML as the body, I get:
To reproduce:
The procedure is actually created, but it seems that the code tries to read another procedure XML from the body and then gets confused about whether more data is available. I have not found any way to circumvent this issue from the client.
With #247 applied, I obtained the following backtrace:
This points me to this code:
https://github.com/opensensorhub/osh-core/blob/d111a6972126dc238af7ac2c0e26adf88db8b1be/sensorhub-service-consys/src/main/java/org/sensorhub/impl/service/consys/resource/BaseResourceHandler.java#L321
Which tries to deserialize objects until there are no more.
This calls the following code:
https://github.com/opensensorhub/osh-core/blob/d111a6972126dc238af7ac2c0e26adf88db8b1be/sensorhub-service-consys/src/main/java/org/sensorhub/impl/service/consys/sensorml/SmlProcessBindingSmlXml.java#L70-L77
Which seems correct - it checks if there is more to deserialize, and if so reads the next tag.
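The check-then-read pattern can be exercised outside OSH. The following is only a minimal sketch, not the OSH code: it assumes the default StAX implementation returned by XMLInputFactory.newInstance() (normally the JDK's built-in parser rather than woodstox), and the class and method names are hypothetical. It reads one top-level element to its end and then records the current event type, what hasNext() reports, and the event type the following next() delivers.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical probe class for illustration; not part of OSH or woodstox.
final class HasNextProbe {

    /** Reads one element, from its START_ELEMENT up to the matching END_ELEMENT. */
    static void readOneElement(XMLStreamReader reader) throws XMLStreamException {
        reader.nextTag(); // assumed to land on the START_ELEMENT of the top-level element
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    /**
     * Returns three values: the event type after one element was read,
     * hasNext() at that point (1 for true, 0 for false), and the event
     * type returned by one further next() call.
     */
    static int[] probe(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            readOneElement(reader);
            int current = reader.getEventType();
            boolean more = reader.hasNext();
            int following = reader.next();
            return new int[] { current, more ? 1 : 0, following };
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = probe("<procedure><name>x</name></procedure>");
        System.out.println("current=" + r[0] + " hasNext=" + r[1] + " next=" + r[2]);
    }
}
```

For comparison with the printed numbers: in javax.xml.stream.XMLStreamConstants, END_ELEMENT is 2 and END_DOCUMENT is 8.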
Debug results
However, in practice hasNext() returns true, but then nextTag() concludes that there is nothing left to parse, and raises.
hasNext() looks like this:
Debug printing (using getEventType()) shows that mCurrToken == 2 (END_ELEMENT, according to this source where I think this value comes from), so hasNext() returns true.
However, when next() is then called (as done by nextTag()), mCurrToken changes to 8 (END_DOCUMENT), which causes nextTag() to raise.
Multi-document mode
The woodstox code has some reference to a "multidoc" mode, but from what I've seen I do not think this is actually enabled (but I can't recall details - I looked at this last week).
Since the OSH code is actually trying to deserialize multiple documents, I wondered about this. I also tried actually POSTing multiple documents (to see if this would work, and to ensure that any fix I would apply would not break this), but I was unsure how this should be formatted (a few naive attempts failed). I looked around the unit tests, but did not immediately see a test for this either.
Upgrading woodstox
I already tried upgrading woodstox to the latest 6.6.0 version (by modifying osh-node-dev-template/include/osh-core/lib-ogc/swe-common-core/build.gradle), but the problem persists.
How to fix
I am not quite familiar with how woodstox is supposed to work in this regard, so I'm not quite sure whether this is a bug in woodstox or whether OSH is using the woodstox API incorrectly. I tried figuring it out, but the XML parsing code is so complex that I could not really tell.
I've also checked the woodstox API docs, but for hasNext() and nextTag() that just refers to the Java built-in XMLStreamReader interface.
IIUC, the reader has a concept of a current event (type returned by getEventType()), and next() reads the next event and makes it current. After all events there is one final END_DOCUMENT event. Once that event is the current one, hasNext() returns false and next() should not be called again.
In this case, once the first deserialize is completed, the current event is still the END_ELEMENT of the previous top-level document. This means that hasNext() will always return true.
Ideally, we would have a peek() method, but that does not exist (there is a peekNext() method, but it is from the StreamScanner superclass and seems to return individual characters, not parsed events).
To ensure that hasNext() returns the proper result, you could just call next() before calling hasNext(), which would work if there is no more data. However, if there is more data, this means that the current event is now (probably) a START_ELEMENT, ready to be processed. But since nextTag() starts by calling next(), this START_ELEMENT is then skipped. So we could:
1. Not call nextTag() at all, but then this would break if there is whitespace or comments between the documents (which is what nextTag() skips, and smlBindings.readDescribedObject() will choke on that).
2. Call nextTag() only when the current type is something nextTag() would have skipped as well, but that duplicates a bunch of logic from nextTag(). I noticed there is an isWhiteSpace() method, which avoids having to duplicate the fairly complex whitespace detection, but you would still end up duplicating the list of things to skip (whitespace, comments).
3. Call nextTag() only if the current type is not START_ELEMENT or END_ELEMENT (which is what nextTag() skips towards), but then you might end up unintentionally skipping something that is not whitespace or comments.
4. Not add a call to next(), but just call nextTag() as now and catch the exception. Unfortunately nextTag() can also raise other exceptions that are indistinguishable except for the message, but we can check the current type in the catch handler, and if it is END_DOCUMENT (or hasNext() returns false), ignore the exception and return.
After writing this, things became a little clearer. I have the idea that option 4 would be the most robust working solution. I'll see if I can get this to work and maybe create a PR for this.
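Option 4 could look roughly like the sketch below. This is an illustration under stated assumptions, not the change made in the PR: the class and method names are hypothetical, it uses only the standard javax.xml.stream API with whatever implementation XMLInputFactory.newInstance() returns, and skipping an element stands in for the real deserialization step.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; names are not taken from OSH.
final class NextTagOrEnd {

    /**
     * Calls nextTag() and reports whether a tag was reached.
     * Returns false when the call failed only because the reader arrived
     * at END_DOCUMENT; any other failure is rethrown unchanged.
     */
    static boolean advanceToNextTag(XMLStreamReader reader) throws XMLStreamException {
        try {
            reader.nextTag();
            return true;
        } catch (XMLStreamException e) {
            if (reader.getEventType() == XMLStreamConstants.END_DOCUMENT) {
                return false; // nothing left to read, so the error is expected
            }
            throw e; // a genuine parse problem
        }
    }

    /** Stand-in for deserialization: skips from START_ELEMENT to its matching END_ELEMENT. */
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    /** Counts top-level elements by looping until advanceToNextTag() reports the end. */
    static int countTopLevelElements(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (advanceToNextTag(reader)) {
                skipElement(reader);
                count++;
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countTopLevelElements("<procedure><name>x</name></procedure>"));
    }
}
```

Checking the event type instead of the exception message keeps the loop independent of the wording a particular parser uses, and trailing comments or whitespace are still skipped by nextTag() itself.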
Other affected code
From looking through the code, it seems the only other XML parsing code that works like this (found using git grep -A 5 hasNext | grep -B 5 nextTag) is the SmlFeatureBindingSmlXml class, which contains the same construct and which I expect is broken in the same way.
There is also SmlFeatureBindingSmlJson, which has very similar code, but that actually uses peek() (which the JSON parser does provide), so I expect this problem does not exist there (except maybe when there is trailing whitespace, which might not trigger hasNext(), but might cause nextTag() to raise - I have not investigated).