r-lib / xml2

Bindings to libxml2
https://xml2.r-lib.org/

read_xml_num: Lacks Example and Detailed Explanation #415

Open discoleo opened 10 months ago


Function xml_find_num

Documentation

The function has no example and no detailed explanation, so I am unsure what it actually does or how to use it.

Every call I tried (see the example below) fails with this error:

Error: result of type: ‘list’, not numeric
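The error probably comes from how xml_find_num() is designed. It does not convert the nodes it finds into numbers. Instead, it expects the XPath expression itself to evaluate to a number. A location path such as `.//c` evaluates to a node-set, which xml2 returns as a `list`, and that triggers the type check. The minimal sketch below uses a small hypothetical document to contrast a location path with XPath expressions that do return numbers (`count()`, `sum()`, `number()`):

```r
library(xml2)

# Hypothetical, trimmed-down document used only for illustration.
x <- read_xml("<ArticleSet><a><c>2023</c></a><a><c>2022</c></a></ArticleSet>")

# A plain location path evaluates to a node-set, which xml_find_num()
# rejects with the "result of type ... not numeric" error quoted above.
msg <- tryCatch(
  xml_find_num(x, "//c"),
  error = function(e) conditionMessage(e)
)

# XPath expressions that themselves evaluate to a number are accepted.
n_records <- xml_find_num(x, "count(/ArticleSet/a)")  # 2
total     <- xml_find_num(x, "sum(//c)")              # 2023 + 2022 = 4045
first     <- xml_find_num(x, "number(//c)")           # 2023: value of the first <c> in document order
```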

Other Issues

This issue is also related to https://github.com/r-lib/xml2/issues/356.

Example

I am trying to extract the year from the corresponding node. Some records may be missing the year, so it makes sense to process the set sequentially and get an NA for those records. Skipping such nodes would cause frame shifts and scramble the data, because I am actually interested in the whole record (in this example, the Title and the Year). xml_find_num looked like the straightforward solution, but it does not seem to do this.
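One way to get exactly one value per record is to wrap the relative path in XPath's `number()` function. This sketch assumes that xml_find_num() accepts a nodeset and evaluates the expression once per node. The calls below support that assumption: they reach the per-node type check rather than failing with "no applicable method". The document is a hypothetical variant of the example data, with one empty `<c>` and one absent `<c>`:

```r
library(xml2)

# Hypothetical variant of the example data: record 2 has an empty <c>,
# record 3 has no <c> at all.
x <- read_xml("<ArticleSet>
<a><b>Title 1</b><c>2023</c></a>
<a><b>Title 2</b><c></c></a>
<a><b>Title 3</b></a>
</ArticleSet>")
records <- xml_find_all(x, "/ArticleSet/a")

# On a nodeset, xml_find_num() evaluates the XPath once per record and
# returns one number for each. number() turns an empty or absent <c> into
# NaN instead of dropping the record, so the result stays aligned.
years <- xml_find_num(records, "number(c)")
years[is.nan(years)] <- NA

# string() likewise returns "" when <b> is absent, keeping titles aligned.
titles <- xml_find_chr(records, "string(b)")

articles <- data.frame(title = titles, year = years)
articles
```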

library("xml2")

### All Years Present

sx = "<?xml version=\"1.0\" ?>
<ArticleSet>
<a><b>Title 1</b>
    <c>2023</c>
</a>
<a><b>Title 2</b>
    <c>2022</c>
</a>
<a><b>Title 3</b>
    <c>2023</c>
</a>
</ArticleSet>"

x = read_xml(sx)

ns = xml_find_all(x, "/ArticleSet/a")
ns

xml_find_all(ns, ".//c/text()")
xml_find_num(ns, ".//c/text()")
# Error: result of type: ‘list’, not numeric
xml_find_num(ns, ".//c")
# Error: result of type: ‘list’, not numeric

####################
### One Year Missing

sx = "<?xml version=\"1.0\" ?>
<ArticleSet>
<a><b>Title 1</b>
    <c>2023</c>
</a>
<a><b>Title 2</b>
    <c></c>
</a>
<a><b>Title 3</b>
    <c>2023</c>
</a>
</ArticleSet>"

x = read_xml(sx)

ns = xml_find_all(x, "/ArticleSet/a")
ns

# one node is missing the year
# !! find_all silently skips this node: the empty <c> has no text() child !!
xml_find_all(ns, ".//c/text()")
# nodeset with only 2023 & 2023
xml_text(xml_find_all(ns, ".//c"))
# "2023" ""     "2023"
xml_find_num(ns, ".//c/text()")
# Error: result of type: ‘list’, not numeric
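Another option avoids XPath functions altogether. xml_find_first() returns exactly one result per input node and puts an xml_missing placeholder wherever nothing matches, so converting the text afterwards also keeps the records aligned. Here is a minimal sketch on the same kind of hypothetical data (one empty `<c>`, one absent `<c>`):

```r
library(xml2)

# Hypothetical data: record 2 has an empty <c>, record 3 has none.
x <- read_xml("<ArticleSet>
<a><b>Title 1</b><c>2023</c></a>
<a><b>Title 2</b><c></c></a>
<a><b>Title 3</b></a>
</ArticleSet>")
records <- xml_find_all(x, "/ArticleSet/a")

# xml_find_first() keeps one slot per record. Records without a <c>
# get an xml_missing placeholder instead of being dropped.
year_nodes <- xml_find_first(records, "c")

# xml_text() gives "" for the empty <c> and NA for the missing one;
# as.numeric() maps both to NA.
years <- as.numeric(xml_text(year_nodes))
years
```

Both approaches rely on the same idea. Select the records first, then query each record relative to itself. Searching `.//c/text()` across the whole nodeset only returns the matches that exist.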