r-lib / xml2

Bindings to libxml2
https://xml2.r-lib.org/

Weird behavior of as_list() output #354

Open Dekermanjian opened 3 years ago

Dekermanjian commented 3 years ago

When an XML document read into R with read_xml() is converted to an R list with as_list(), the list structure details are correct, but each list item appears to be the same and does not match what the details suggest. I am inserting a picture to help explain:

[Screenshot: Screen Shot 2021-11-12 at 8 14 44 AM]

The "list of length " entries displayed under the value column are correct; however, when you expand an item as shown in the image, its length is not what R says it should be. You can see this in the second and fifth "sec" items that I expanded in the attached screenshot. Moreover, all the list items look exactly the same as the first list item. This is not the case in the actual XML, nor is it the case before calling as_list().

Any help would be greatly appreciated. Thank you.

Dekermanjian commented 3 years ago

Sorry, I forgot to include the sessionInfo() output. Here it is:

[Screenshot: Screen Shot 2021-11-12 at 8 22 40 AM]

hadley commented 2 years ago

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

ASLehnert-p commented 2 years ago

I actually have the same problem. The minimal example in the attached zip returns the following list for me:

test
  method
    Analyte
      CasNumber [[1]] "110-86-1"
      TrivialName [[1]] "pyridine"
    Analyte
      CasNumber [[1]] "110-86-1"
      TrivialName [[1]] "pyridine"
    Analyte
      CasNumber [[1]] "110-86-1"
      TrivialName [[1]] "pyridine"

min example.zip

hadley commented 2 years ago

@ASLehnert-p that's not a minimal example as I have no way to run your code. Please see some advice at https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html

ASLehnert-p commented 2 years ago

Sorry, this is my first time trying to post XML files. It's in the attached zip.

ASLehnert-p commented 2 years ago

Sorry for the multiple edits; here it is as a minimal example:

library(xml2)
testFile <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
<method>
<Analyte> <CasNumber>110-86-1</CasNumber> <TrivialName>pyridine</TrivialName> </Analyte>
<Analyte> <CasNumber>67-56-1</CasNumber> <TrivialName>methanol</TrivialName> </Analyte>
<Analyte> <CasNumber>68-12-2</CasNumber> <TrivialName>N,N-dimethylformamide</TrivialName> </Analyte>
</method>')

test <- as_list(testFile)

Interestingly, the function itself (as_list(testFile)) seems to return the desired result, but when I try saving it to "test", it repeats the first "Analyte" instance.

hadley commented 2 years ago

Here is an actual reprex.

library(xml2)
testFile <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
<method>
<Analyte>
<CasNumber>110-86-1</CasNumber>
<TrivialName>pyridine</TrivialName>
</Analyte>
<Analyte>
<CasNumber>67-56-1</CasNumber>
<TrivialName>methanol</TrivialName>
</Analyte>
<Analyte>
<CasNumber>68-12-2</CasNumber>
<TrivialName>N,N-dimethylformamide</TrivialName>
</Analyte>
</method>')

str(as_list(testFile))
#> List of 1
#>  $ method:List of 3
#>   ..$ Analyte:List of 2
#>   .. ..$ CasNumber  :List of 1
#>   .. .. ..$ : chr "110-86-1"
#>   .. ..$ TrivialName:List of 1
#>   .. .. ..$ : chr "pyridine"
#>   ..$ Analyte:List of 2
#>   .. ..$ CasNumber  :List of 1
#>   .. .. ..$ : chr "67-56-1"
#>   .. ..$ TrivialName:List of 1
#>   .. .. ..$ : chr "methanol"
#>   ..$ Analyte:List of 2
#>   .. ..$ CasNumber  :List of 1
#>   .. .. ..$ : chr "68-12-2"
#>   .. ..$ TrivialName:List of 1
#>   .. .. ..$ : chr "N,N-dimethylformamide"

Created on 2022-10-24 with reprex v2.0.2

What surprises you about this output?

ASLehnert-p commented 2 years ago

library(xml2)
testFile <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
<method>
<Analyte>
<CasNumber>110-86-1</CasNumber>
<TrivialName>pyridine</TrivialName>
</Analyte>
<Analyte>
<CasNumber>67-56-1</CasNumber>
<TrivialName>methanol</TrivialName>
</Analyte>
<Analyte>
<CasNumber>68-12-2</CasNumber>
<TrivialName>N,N-dimethylformamide</TrivialName>
</Analyte>
</method>')

test <- as_list(testFile)

View(test)

What surprises me is that when I do this, the viewer shows the first Analyte node repeated in place of the latter two Analyte nodes: [screenshot]

alearrigo commented 1 year ago

I'm having the same issue. I think the bug is not in the read_xml() or as_list() functions but is something affecting the viewer. As @hadley shows, the structure of the list is fine, and if you extract the elements of the list one by one like this:

# A direct call avoids needing magrittr for %>%
testFile_list <- as_list(testFile)

you'll see:

testFile_list[[1]][[1]][1]

giving


$CasNumber
$CasNumber[[1]]
[1] "110-86-1"

and

testFile_list[[1]][[3]][1]

giving

$CasNumber
$CasNumber[[1]]
[1] "68-12-2" 

but if you try to inspect the structure in the viewer panels, it looks as if all the elements of the list are the same!

What could cause this behavior?

[Screenshot: Screenshot 2023-06-19 at 16 37 40]
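
The element-by-element check above can be done for the whole list without relying on View() or any viewer panel. The following is a minimal sketch, assuming the same three-Analyte XML as in the reprex; the names `doc`, `stored`, `analytes`, and `cas` are illustrative only and are not part of xml2.

```r
library(xml2)

# Same data as the reprex in this thread, rebuilt so the block is self-contained
doc <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
<method>
<Analyte><CasNumber>110-86-1</CasNumber><TrivialName>pyridine</TrivialName></Analyte>
<Analyte><CasNumber>67-56-1</CasNumber><TrivialName>methanol</TrivialName></Analyte>
<Analyte><CasNumber>68-12-2</CasNumber><TrivialName>N,N-dimethylformamide</TrivialName></Analyte>
</method>')

# Store the converted list, as in the comments above
stored <- as_list(doc)

# Does the stored object match a fresh conversion?
identical(stored, as_list(doc))

# Pull the CAS number out of every Analyte element
analytes <- stored$method
cas <- vapply(analytes, function(a) a$CasNumber[[1]], character(1))
unname(cas)

# Compare the first Analyte with the other two
identical(analytes[[1]], analytes[[2]])
identical(analytes[[1]], analytes[[3]])
```

If the three CAS numbers differ and the pairwise comparisons return FALSE, the stored list itself is intact, which would suggest that the repetition comes from how the list is displayed rather than from as_list().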