hbz / mabxml-elasticsearch

Raw hbz union catalog data exposed via a web API
http://lobid.org/hbz01

Error while parsing XML quits app #1

Closed dr0i closed 9 years ago

dr0i commented 10 years ago

Parsing HT018329706 results in "SAXParseException; lineNumber: 1; columnNumber: 2895; XML document structures must start and end within the same entity." and quits the app. The character in question is Unicode decimal 34242. The best solution would be to make parsing robust enough to handle even that document and, above all, not to quit the app.

dr0i commented 10 years ago

If a parsing error occurs, the values won't be passed on. So only the errored doc is ignored, not the rest of the documents. See https://github.com/hbz/metafacture-core/commit/88f523514ff09353e9183a53d62c968f00b1186c#diff-a9bb73c7cdcb317760d3b9188642cd57R63 [EDIT:] actually, see https://github.com/hbz/mabxml-elasticsearch/commit/c75d5e50e341053c9e25746e385cd53aa5e3a631#commitcomment-8915633
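
For illustration only, the skip-on-error behaviour described above could look roughly like the following. This is a minimal sketch using the JDK's SAX parser, not the code from the linked commit; the class name `SkipBrokenRecords`, the method `parseAll`, and the assumption that each record arrives as its own XML string are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: parses records one by one and skips the broken ones. */
final class SkipBrokenRecords {

    /** Returns the indexes of the records that parsed without error. */
    static List<Integer> parseAll(List<String> xmlRecords)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        List<Integer> parsed = new ArrayList<>();
        for (int i = 0; i < xmlRecords.size(); i++) {
            InputSource source = new InputSource(new StringReader(xmlRecords.get(i)));
            try {
                // A fresh parser per record, so a failure cannot leak state
                // into the next record. DefaultHandler rethrows fatal errors.
                factory.newSAXParser().parse(source, new DefaultHandler());
                parsed.add(i);
            } catch (SAXException e) {
                // Assumption: log and continue instead of quitting the app.
                System.err.println("Skipping record " + i + ": " + e.getMessage());
            }
        }
        return parsed;
    }
}
```

Calling `parseAll` with a well-formed record, a truncated one, and another well-formed one would report only the first and third as parsed, while the truncated one is logged and dropped.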

fsteeg commented 10 years ago

@dr0i We need a pull request against culturegraph/metafacture-core for the change in hbz/metafacture-core. We should not add stuff to our fork without tracking these changes as pull requests.

dr0i commented 9 years ago

As the guidelines for merging into culturegraph/metafacture-core are not finished yet, it might be better to wait with this and reconcile what could be merged into core and what should be merged into a "plugin-repository".

fsteeg commented 9 years ago

I only chimed in because you linked to a metafacture-core commit, but the code causing this issue and the fix are not actually in metafacture-core but in this repo (mabxml-elasticsearch), so my comment is moot. Since I didn't encounter the issue, and didn't review any fix, I'm unassigning instead of closing.

dr0i commented 9 years ago

Ah, right. I think it was at some point intended to go into hbz/metafacture-core. So the commands will stay in this repo, all right.