njahn82 / semscholar

R interface to Semantic Scholar API

s2_papers fails when papers have inconsistent type for year #5

Open rdmorin opened 1 year ago

rdmorin commented 1 year ago
incompatible_papers = s2_papers(c("a661420975b9201e5f67b4c24527376f0a646a87", "3a5b354d117eeb86b62c8fc91296ce55464e9e71"))
Error in `dplyr::bind_rows()`:
! Can't combine `..1$year` <character> and `..2$year` <integer>.
Run `rlang::last_trace()` to see where the error occurred.

Judging by the error, the year class is 'character' for the first paper (`..1`) and integer for the second; it's numeric for most papers. Is this a known issue? Maybe it only affects certain (newer) versions of dplyr?
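
The mismatch can probably be reproduced without calling the API. This is a minimal sketch, assuming the cause is a paper whose `year` is missing, so that the `is.null` replacement in `parse_s2_paper_md()` turns it into `NA_character_`. The two tibbles and their `paper_id` values are hypothetical stand-ins for parsed responses, not real API output.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for two parsed papers (assumption: the first one
# had no year, so it ended up as NA_character_; the second has an integer year).
paper_without_year <- tibble(paper_id = "a", year = NA_character_)
paper_with_year <- tibble(paper_id = "b", year = 2019L)

# bind_rows() refuses to combine a character column with an integer column.
# Capture the error message instead of stopping the script.
result <- tryCatch(
  bind_rows(paper_without_year, paper_with_year),
  error = function(e) conditionMessage(e)
)
print(result)
```

If that assumption holds, the problem is not specific to these two paper IDs: any batch that mixes a paper without a year and a paper with one would hit the same error.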

packageVersion("dplyr")
[1] ‘1.1.2’
rdmorin commented 1 year ago

Looks like an easy fix. I modified one line in this function to force the year to be cast as numeric.

parse_s2_paper_md <- function(out) {
  out <- purrr::map_if(out, is.null, ~ NA_character_)
  tibble::tibble(
    abstract = out[["abstract"]],
    arxiv_id = out[["arxivId"]],
    authors = list(parse_s2_authors(out[["authors"]])),
    citation_velocity = out[["citationVelocity"]],
    citations = list(parse_s2_citations(out, "citations")),
    doi = out[["doi"]],
    influential_citation_count = out[["influentialCitationCount"]],
    paper_id = out[["paperId"]],
    references = list(parse_s2_citations(out, "references")),
    title = out[["title"]],
    topics = list(parse_s2_topics(out[["topics"]])),
    url = out[["url"]],
    venue = out[["venue"]],
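    # Cast explicitly: a NULL year was replaced by NA_character_ above and
    # would otherwise clash with integer years in dplyr::bind_rows().
    year = as.numeric(out[["year"]])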
  )
}
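
The effect of the cast can be checked offline. This is a rough sketch, not the package's code: `parse_year_only()` is a hypothetical helper that copies only the `year` handling from the function above, and the two input lists are made-up responses (one with a `NULL` year, one with an integer year).

```r
library(dplyr)
library(tibble)

# Hypothetical helper: mirrors only the NULL replacement and the year cast
# from parse_s2_paper_md(), leaving out every other field.
parse_year_only <- function(out) {
  out <- purrr::map_if(out, is.null, ~ NA_character_)
  tibble(
    paper_id = out[["paperId"]],
    year = as.numeric(out[["year"]])
  )
}

# Made-up responses (assumption: a missing year arrives as NULL).
responses <- list(
  list(paperId = "a", year = NULL),
  list(paperId = "b", year = 2019L)
)

# With the cast, both year columns are double, so the rows combine cleanly.
combined <- bind_rows(lapply(responses, parse_year_only))
print(combined)
```

`as.integer()` would presumably work as well and would keep the column as integer; either cast gives every parsed paper the same column type before the rows are combined.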