I am trying to extract the year from the corresponding node of each record. Some records may be missing the year, but it still makes sense to process the set sequentially and get an NA for those records. Skipping such nodes would shift the remaining values and scramble the data (I am interested in the whole record, which in this example consists of the Title and the Year). The straightforward solution would be xml_find_num, but it does not seem to do this.
library("xml2")
### All Years Present
sx = "<?xml version=\"1.0\" ?>
<ArticleSet>
<a><b>Title 1</b>
<c>2023</c>
</a>
<a><b>Title 2</b>
<c>2022</c>
</a>
<a><b>Title 3</b>
<c>2023</c>
</a>
</ArticleSet>"
x = read_xml(sx)
ns = xml_find_all(x, "/ArticleSet/a")
ns
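# nodeset with the text node of each of the 3 years
xml_find_all(ns, ".//c/text()")
# both calls generate an error instead of returning the years
xml_find_num(ns, ".//c/text()")
xml_find_num(ns, ".//c")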
####################
### One Year Missing
sx = "<?xml version=\"1.0\" ?>
<ArticleSet>
<a><b>Title 1</b>
<c>2023</c>
</a>
<a><b>Title 2</b>
<c></c>
</a>
<a><b>Title 3</b>
<c>2023</c>
</a>
</ArticleSet>"
x = read_xml(sx)
ns = xml_find_all(x, "/ArticleSet/a")
ns
# one Node is missing the Year
# !! find_all actually skips this node !!
xml_find_all(ns, ".//c/text()")
# nodeset with only 2023 & 2023;
xml_text(xml_find_all(ns, ".//c"))
# "2023" "" "2023"
xml_find_num(ns, ".//c/text()")
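A possible workaround, sketched here as an assumption rather than as the intended use of xml_find_num: xml_find_first returns one result per input node, so titles and years stay aligned even when a year is empty or absent. The names `title`, `year` and `records` are only illustrative. It may also be that xml_find_num expects an XPath expression that itself evaluates to a number (such as a `count(...)` expression), which would explain the errors, but I have not confirmed this.

```r
library(xml2)

# Same shape as the "One Year Missing" document above
sx <- "<ArticleSet>
<a><b>Title 1</b><c>2023</c></a>
<a><b>Title 2</b><c></c></a>
<a><b>Title 3</b><c>2023</c></a>
</ArticleSet>"
x  <- read_xml(sx)
ns <- xml_find_all(x, "/ArticleSet/a")

# xml_find_first() yields exactly one result per record in ns,
# so the two vectors below always have the same length and order
title <- xml_text(xml_find_first(ns, "./b"))
# an empty or missing <c> gives "" or NA, which as.numeric() turns into NA
year  <- as.numeric(xml_text(xml_find_first(ns, "./c")))

records <- data.frame(title = title, year = year)
records
```

With this input, `records` has three rows and the year of the second record is NA, so no frame-shift occurs.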
Function xml_find_num
Documentation
The function lacks an example and a detailed explanation, so I am unsure what it actually does or how to use it.
All of the xml_find_num calls in the example generate an error.
Other Issues
This is also related to issue https://github.com/r-lib/xml2/issues/356