muschellij2 / rscopus

Scopus Database API Interface to R

Is there any existing function to retrieve the tables embedded in the open access articles provided by Scopus? #40

Closed by vipulg13 3 years ago

muschellij2 commented 3 years ago

Which XML tables are you referring to?

If you have a question, please provide an MCVE: https://stackoverflow.com/help/mcve. I recommend building the reproducible example with the reprex package (https://github.com/tidyverse/reprex). Also, please include the output of sessioninfo::session_info().

vipulg13 commented 3 years ago

I am sorry. I should have posted this issue to the fulltext team, since their package downloads ScienceDirect articles in XML format. In any case, I am using both the "fulltext" and "rscopus" packages for my research project. I am looking for a way to get the data objects for the tables embedded in the open access articles that Scopus provides. As far as I know, the existing function "download_object()" only supports extracting images and other GIF objects, not table objects.

muschellij2 commented 3 years ago

Sorry - without a reproducible example, I can't really help.

vipulg13 commented 3 years ago

Here is a reproducible example:

library(rscopus)

doi <- "10.1016/j.cja.2020.04.031"
art <- article_retrieval(id = doi, identifier = "doi", view = "FULL", verbose = FALSE)
# Backticks are required because the element name contains hyphens
lstArt <- art$content$`full-text-retrieval-response`

Please note that lstArt contains a list element "objects", which in turn holds the paths for downloading the embedded images. Similar functionality is missing for embedded tables. For example, this research article has two embedded tables. The data from these tables is flattened into raw text and mixed in with the body text under lstArt$originalText. Retrieving and restructuring the table data from that raw text appears to be very complicated. Is there any way to retrieve these tables at the API level and provide them as an accessible R object?

Let me know if you require further information from my side.

muschellij2 commented 3 years ago

Does the documentation at https://dev.elsevier.com/documentation/ArticleRetrievalAPI.wadl indicate that these are available?

muschellij2 commented 3 years ago

I don't think it embeds tables in there:

obj <- object_retrieval(id = "10.1016/j.cja.2020.04.031",
                        identifier = "doi")

# Parse the JSON listing of the article's media objects
df <- jsonlite::fromJSON(
  httr::content(obj$get_statement, as = "text"),
  flatten = TRUE)

vipulg13 commented 3 years ago

That's true. These are media objects containing figures, images, formulas, etc. I will raise this with the Elsevier developers. If they add this feature to their API, I will get in touch with you to get it integrated into the rscopus package. Thanks for your support so far!
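
Until such an API feature exists, one possible workaround is to parse the tables out of the full-text XML mentioned earlier in the thread. The following is only a minimal sketch, not something rscopus or the Elsevier API provides. It assumes the tables are encoded with CALS-style `row` and `entry` elements inside a `table` element, that the first row is the header, and that no cells span rows or columns. The XML string is a toy stand-in with a made-up namespace, and `table_to_df` is a hypothetical helper name.

```r
library(xml2)

# Toy stand-in for a full-text XML fragment. The element names and the
# namespace URI are assumptions for illustration, not taken from a real article.
xml_txt <- '<doc xmlns:ce="http://example.org/ce">
  <ce:table>
    <tgroup cols="2">
      <thead><row><entry>Parameter</entry><entry>Value</entry></row></thead>
      <tbody>
        <row><entry>a</entry><entry>1</entry></row>
        <row><entry>b</entry><entry>2</entry></row>
      </tbody>
    </tgroup>
  </ce:table>
</doc>'

doc <- read_xml(xml_txt)

# Hypothetical helper: convert one table node into a data frame.
# Matching on local-name() sidesteps namespace prefixes.
table_to_df <- function(tbl) {
  rows <- xml_find_all(tbl, ".//*[local-name() = 'row']")
  cells <- lapply(rows, function(r) {
    xml_text(xml_find_all(r, "./*[local-name() = 'entry']"), trim = TRUE)
  })
  # Assumes the first row is the header and every row has the same cell count
  body <- do.call(rbind, cells[-1])
  df <- as.data.frame(body, stringsAsFactors = FALSE)
  names(df) <- cells[[1]]
  df
}

# One data frame per table found in the document
tables <- lapply(xml_find_all(doc, "//*[local-name() = 'table']"), table_to_df)
```

All values come back as character columns, so numeric columns would still need converting, and real articles with merged cells or nested markup would need more careful handling than this sketch gives them.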