fconstancias closed 3 years ago
Hi @fconstancias ,
I think I have worked something out for this, let me know if it works as expected.
library(XML)
library(rentrez)
library(tidyverse)
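# Fetch one BioSample record and return its attributes in long format.
summarise_biosample <- function(biosample_id){
  parsed <- entrez_fetch(db="biosample", id=biosample_id, rettype="xml", parsed=TRUE)
  # Text content of each <Attribute> node, i.e. the attribute's value
  attr_values <- xpathSApply(parsed, "//Attributes/Attribute", xmlValue)
  # XML attributes of each node (attribute_name, harmonized_name, display_name)
  attr_names <- xpathApply(parsed, "//Attributes/Attribute", xmlAttrs)
  # One row per (node, XML attribute) pair; each value is repeated to match
  sample_df <- data.frame(attribute_type = unlist(lapply(attr_names, names)),
                          attribute = unlist(attr_names),
                          value = rep(attr_values, lengths(attr_names)))
  sample_df$biosample <- biosample_id
  sample_df
}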
biosamples <- c("SAMN12414413","SAMN08472433")
res <- bind_rows(lapply(biosamples, summarise_biosample))
head(res)
   attribute_type attribute        value    biosample
1  attribute_name    strain         638R SAMN12414413
2 harmonized_name    strain         638R SAMN12414413
3    display_name    strain         638R SAMN12414413
4  attribute_name      host Homo sapiens SAMN12414413
5 harmonized_name      host Homo sapiens SAMN12414413
6    display_name      host Homo sapiens SAMN12414413
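If the goal is one row per BioSample, the long output above can probably be reshaped with tidyr. The sketch below is only an illustration: `res_example` is a hand-typed copy of the six rows printed above (not a live query), and keeping the `harmonized_name` rows is an assumption about which naming scheme you want.

```r
library(dplyr)
library(tidyr)

# Hand-typed stand-in for the first six rows of `res` shown above
# (assumption: the real `res` has the same four columns).
res_example <- data.frame(
  attribute_type = rep(c("attribute_name", "harmonized_name", "display_name"), times = 2),
  attribute      = rep(c("strain", "host"), each = 3),
  value          = rep(c("638R", "Homo sapiens"), each = 3),
  biosample      = "SAMN12414413",
  stringsAsFactors = FALSE
)

# Keep a single naming scheme, then spread the attributes into columns.
wide <- res_example %>%
  as_tibble() %>%
  filter(attribute_type == "harmonized_name") %>%
  select(biosample, attribute, value) %>%
  pivot_wider(names_from = attribute, values_from = value)

wide
```

On real data, some attributes may lack a `harmonized_name`; filtering on `attribute_name` instead might keep more of them. If a sample repeats an attribute, `pivot_wider()` will warn and return list-columns, so that case would need extra handling.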
Hi All,
I am struggling to extract Attributes from several BioSamples into a nice tibble.
Since I did not manage to extract display_name or attribute_name from the XML, I can't really make it work.
Any help will be very much appreciated.
Thanks a ton