ropensci / openalexR

Getting bibliographic records from OpenAlex

Number of edges from `oa_snowball` doesn't match `cited_by_count` #178

Open TimothyElder opened 9 months ago

TimothyElder commented 9 months ago

When `oa_snowball` returns all the works that cite, and are cited by, a focal article, the number of edges in the returned edges data frame that point to the focal article should match the focal article's `cited_by_count`, but it seems that they usually do not.

I am trying to figure out whether this is an artifact in the data or whether I have misunderstood precisely what oa_snowball returns.

Here is an example of where I think the edges should match but they don't:


library(openalexR)
library(dplyr)

focal_article <- oa_fetch(
  entity = "works",
  doi = c("10.1056/nejmoa1000678"),
  verbose = TRUE
)

snowball_docs <- oa_snowball(
  identifier = focal_article$id,
  verbose = TRUE
)

edges <- snowball_docs$edges

# Strip the URL prefix so the ID matches the short IDs used in `edges`.
# The original pattern was lost from this post; this regex (an assumption)
# removes everything up to and including the last slash.
id <- stringr::str_replace(focal_article$id, "^.*/", "")

# keep only edges pointing to the focal work (drops all works it cites)
edges <- edges |>
  filter(to == id)

# Raise error if edges don't match focal_article citation count
tryCatch({
  if (nrow(edges) != focal_article$cited_by_count) {
    stop("Number of edges doesn't match cited by count of focal article!")
  }
}, error = function(e) {
  cat("An error occurred: ", e$message, "\n")
})

yjunechoe commented 9 months ago

Thanks for the report! Definitely not ideal, but it's likely the same situation also reported in

For what it's worth, in my experience with snowball searching it's pretty common to have mismatches between the cited-by number in a paper's records vs. its discoverable connections (even within the same database). You can just think of the number of articles returned by backward-searching in oa_snowball() as the absolute lower bound estimate of cited-by (which doesn't account for older papers, retracted papers, inaccessible papers, etc.).
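If the snowball result is treated as a lower bound, one option is to report the gap rather than stop on a mismatch. Here is a minimal sketch of that idea in base R. It uses a made-up edges data frame in place of `snowball_docs$edges`, and the IDs and the `cited_by_count` value are hypothetical, so it only illustrates the comparison and is not real OpenAlex output:

```r
# Toy stand-in for snowball_docs$edges; all IDs here are invented.
edges <- data.frame(
  from = c("W1", "W2", "W3", "W0"),
  to   = c("W0", "W0", "W0", "W9"),
  stringsAsFactors = FALSE
)

focal_id <- "W0"
cited_by_count <- 5  # hypothetical value of focal_article$cited_by_count

# Edges pointing at the focal work are the citing works that were found.
n_found <- sum(edges$to == focal_id)

# Report the shortfall instead of raising an error.
gap <- cited_by_count - n_found
message(sprintf(
  "Found %d of %d citing works (gap: %d)",
  n_found, cited_by_count, gap
))
```

With real data, `edges`, `focal_id`, and `cited_by_count` would come from the `oa_snowball()` and `oa_fetch()` results in the original report.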

TimothyElder commented 9 months ago

Fortunately, the discrepancy doesn't seem to be very sizable: fewer than 10 or so per article, by my estimate. Thanks!