tc closed this 12 years ago
Hello!
Your document contains literal "\n" sequences (a backslash followed by "n") instead of real newline characters, so it couldn't be parsed even with the libxml version of Nokogiri. So I first replaced every literal "\n" with a real newline and then tried parsing again. After that, I didn't get the NullPointerException at all. However, the XML document has several other errors besides the "\n" issue and appears to be invalid. Because of this, the Java version could not parse the whole document and returned very little text. The libxml version, on the other hand, is not strict about validity, so I got a lot of text from the line "doc.text".
The difference in parser behavior is very hard to overcome. Unless the document is parsed successfully, Nokogiri's methods can't do anything. So, would you review the document?
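For anyone hitting the same thing, the substitution step described above could look roughly like the sketch below. The helper names `unescape_newlines` and `parse_report` are made up for illustration and are not part of Nokogiri; the sketch assumes the input holds literal backslash-n sequences, and only uses `Nokogiri::XML`, `doc.errors` and `doc.text`.

```ruby
# Hypothetical helper: replace literal backslash-n sequences (two characters)
# with real newline characters. In Ruby, '\n' in single quotes is a backslash
# followed by "n", while "\n" in double quotes is a newline.
def unescape_newlines(text)
  text.gsub('\n', "\n")
end

# Hypothetical helper: parse the cleaned-up string and collect whatever the
# parser complained about, so the remaining problems in the document are visible.
def parse_report(xml)
  require 'nokogiri' # loaded here so the first helper works without the gem
  doc = Nokogiri::XML(unescape_newlines(xml))
  { errors: doc.errors.map(&:message), text: doc.text }
end
```

Calling `parse_report(your_string)[:errors]` on both the libxml and the Java backend should make it easier to see which errors are still left in the document once the "\n" problem is out of the way.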
The document came from the Freebase WEX format. It's an attempt at transforming MediaWiki markup into XML, but I guess it doesn't produce semantically correct XML.
I get a null pointer error when parsing the string below:
Works in Ruby 1.9.2.