There seems to be a bug with htmlTreeParse() in XML 3.98-1.4 on R 3.3.2. Here's a minimal example:
library(XML)
link = "http://anson.ucdavis.edu/~mueller/cveng13.html"
tree = htmlTreeParse(link)
tree_body = tree$children$html[[2]]
tree_div = getNodeSet(tree_body, path="//div")
The error message is:
Failed to parse QName 'padding-left:'
Failed to parse QName 'padding-bottom:'
Failed to parse QName 'padding-top:'
Comment must not contain '--' (double-hyphen)
Comment must not contain '--' (double-hyphen)
Comment must not contain '--' (double-hyphen)
Error: 1: Failed to parse QName 'padding-left:'
2: Failed to parse QName 'padding-bottom:'
3: Failed to parse QName 'padding-top:'
4: Comment must not contain '--' (double-hyphen)
5: Comment must not contain '--' (double-hyphen)
6: Comment must not contain '--' (double-hyphen)
By default, htmlTreeParse() returns an R-level representation of the document, and that cannot be used with getNodeSet(). getNodeSet() needs internal nodes, which you get from htmlParse() or htmlTreeParse( , useInternalNodes = TRUE).
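Here is a minimal sketch of the working approach, assuming the XML package is installed. The inline HTML string is a made-up stand-in for the linked page (not its real content), so the example runs without network access; the variable names are illustrative only.

```r
library(XML)

# Hypothetical inline HTML standing in for the page in the question
html <- '<html><body><div style="padding-left: 4px">first</div><div>second</div></body></html>'

# htmlParse() builds internal nodes, which the XPath functions require.
# Same effect: htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)
doc <- htmlParse(html, asText = TRUE)

# XPath query over the internal document
divs <- getNodeSet(doc, "//div")

# Extract the text content of each matched <div>
div_text <- sapply(divs, xmlValue)
print(div_text)
```

With the URL from the question, the same pattern should apply by passing the link to htmlParse() instead of the inline string and dropping asText = TRUE.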
This error does not occur with htmlParse().