thomasopsomer closed this issue 7 years ago
@thomasopsomer could you please post the ./mercedes.xml file?
This file: https://github.com/idio/json-wikipedia/blob/development/src/test/resources/en/mercedes.xml
Looking again, I see that the tests use mercedes.txt. So do I need to give the parser the text field of the XML?
Hi @thomasopsomer, you answered your own question. Parsing a single article means parsing the value of the text field of the page node in the XML. So if you pass that value to the parse method of the parser, you should get decent results. Let us know if you don't.
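To make the step above concrete, here is a minimal sketch of pulling the text field out of a single page's XML using only the JDK's DOM parser. The class name `PageTextExtractor` and the method `extractText` are hypothetical helpers made up for this illustration; they are not part of json-wikipedia. The sketch assumes the wikitext sits in the first `text` element of the document, as in a standard Wikipedia export.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of json-wikipedia.
final class PageTextExtractor {

    // Returns the content of the first <text> element in the given XML,
    // or null when the document has no such element.
    static String extractText(String pageXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(pageXml)));

            // The factory is not namespace-aware by default, so the element
            // is matched by its plain tag name.
            NodeList nodes = doc.getElementsByTagName("text");
            if (nodes.getLength() == 0) {
                return null;
            }
            return nodes.item(0).getTextContent();
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalArgumentException("Could not read page XML", e);
        }
    }
}
```

The returned string is the raw wikitext, which is what you would then hand to the parser's parse method. To read from a file such as ./mercedes.xml instead of a string, `DocumentBuilder.parse` also accepts a `java.io.File`.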
+1
Cool, it's working now! At first sight I didn't understand that all the information was in the text field. I've been using json-wikipedia for a while, but always as a black box! Anyway, thanks for the help, and thanks for implementing the "links with span offsets" feature :)
Closing this issue then.
Hi,
I'm trying to parse a single Wikipedia XML file, like the mercedes.xml in the tests of this repo. Following the code in the test section, I tried something like:
But the result is strange. Many properties are blank, like title, wikiTitle, ... and the paragraphs / clean text are also wrongly parsed. I guess I'm doing something wrong ^^ If you could show some usage of the API for processing a single article in XML, that would be great :)
Thanks, Thomas