ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Differences in citations "cited_by_count" and "oa_fetch" #115

Open JeffreySmithA opened 1 year ago

JeffreySmithA commented 1 year ago

Hi,

I'm struggling to reconcile two facts.

First, I fetch the works of a specific author with the following code:

test_oa <- oa_fetch(author.id = "A2054529157", entity = "works", verbose = TRUE)

When I look at the first row of this data frame (paper ID W2004018659) and open its counts_by_year, I get the following:

year  cited_by_count
2023  5
2022  6
2021  10
2020  15
2019  11
2018  11
2017  5
2016  12

Second, I try to calculate the citations to that same paper for each year by counting the works that cite it:

dat2 <- oa_fetch("works", cites = "W2004018659") |>
  dplyr::count(publication_year)
dat2

The citation counts no longer match. This time I get:

year  citations
2023  3
2022  7
2021  13
2020  13
2019  11
2018  11
2017  5
2016  13

Would anyone be able to explain why these numbers differ for the same work?

Thanks again in advance!

trangdata commented 1 year ago

Hi @JeffreySmithA, good question! The OpenAlex docs explain that counts_by_year only goes back 10 years, which accounts for some missing years, but I'm not sure why we see 12 here (perhaps they recently increased it to 12?).

List: Works.cited_by_count for each of the last ten years, binned by year. To put it another way: each year, you can see how many times this work was cited. Any citations older than ten years old aren't included. Years with zero citations have been removed so you will need to add those in if you need them.
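The note about dropped zero-citation years can be handled with a join against the full range of years. The sketch below uses base R and a toy table; the values and the 2018 to 2023 window are illustrative assumptions, not data from OpenAlex.

```r
# Toy counts_by_year table with 2021 and 2019 missing (illustrative values only).
counts <- data.frame(
  year = c(2023, 2022, 2020, 2018),
  cited_by_count = c(4, 2, 7, 1)
)

# Every year in the window we want to cover (assumed window for this example).
all_years <- data.frame(year = 2018:2023)

# Left join onto the full range, then replace the resulting NAs with 0.
filled <- merge(all_years, counts, by = "year", all.x = TRUE)
filled$cited_by_count[is.na(filled$cited_by_count)] <- 0
print(filled)
```

The same pattern should apply to a real counts_by_year table, provided its columns are named year and cited_by_count as in the output quoted in this thread.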

I'm not sure exactly how cited_by_count is calculated, but I imagine that differences in when a citation is counted (perhaps not by the citing work's publication_year) account for some of the other discrepancies.
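One rough way to probe that idea is to put the two sets of per-year numbers reported in the first comment side by side and compare both the yearly gaps and the totals. This is a sketch in base R using only the figures posted above; the variable names are made up for illustration.

```r
# Per-year counts copied from the first comment in this thread.
years <- 2023:2016
from_counts_by_year <- data.frame(
  year = years,
  counts_by_year = c(5, 6, 10, 15, 11, 11, 5, 12)
)
from_citing_works <- data.frame(
  year = years,
  citing_works = c(3, 7, 13, 13, 11, 11, 5, 13)
)

# Join on year and compute the per-year gap between the two sources.
cmp <- merge(from_counts_by_year, from_citing_works, by = "year")
cmp$diff <- cmp$counts_by_year - cmp$citing_works
print(cmp)

# Compare the totals over the same 2016 to 2023 window.
total_counts_by_year <- sum(cmp$counts_by_year)
total_citing_works <- sum(cmp$citing_works)
```

With these numbers the totals come to 75 and 76, while five of the eight years differ individually. That is consistent with citations being assigned to different years by the two methods, although it does not prove it.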

If resolving this issue is important to you, I recommend reaching out to the OpenAlex team.

JeffreySmithA commented 1 year ago

Thank you! I've reached out to them and will report back here when I get a response. If I don't get a response in the near future, I will close the thread.

amacanovic commented 7 months ago

@JeffreySmithA did you ever receive a response from the OA team? Could you please share the response?

I am seeing discrepancies even in an ordinary oa_fetch call, where the "cited_by_total" counts per year, added together, exceed the "cited_by_count" value within the author entity. I cannot deduce from the docs why this would be the case.

E.g. here:

test <- oa_fetch(entity = "authors", openalex_id = "https://openalex.org/A5002522655")

# cited_by_count returns 19k citations
test$cited_by_count
[1] 19254

# and if we add up together year-by-year citations since 2012 **only**, we get more!
sum(test[[8]][[1]]$cited_by_count)
[1] 29367
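Indexing by position, as in test[[8]], depends on the column order of the returned table. A sketch of the same comparison by column name follows. It runs on a mock one-row data frame so that it works offline; the ID and all values are made up, and it assumes the per-year list-column is called counts_by_year, as it is in the works output described earlier in this thread.

```r
# Stand-in for one row of an oa_fetch(entity = "authors", ...) result.
# The ID and all numbers are hypothetical; only the shape matters here.
author <- data.frame(id = "A0000000000", cited_by_count = 30)
author$counts_by_year <- list(
  data.frame(year = c(2023, 2022, 2021), cited_by_count = c(5, 8, 4))
)

# Address the list-column by name instead of by position.
by_year <- author$counts_by_year[[1]]
yearly_total <- sum(by_year$cited_by_count)

# Ratio of the summed yearly counts to the all-time count.
ratio <- yearly_total / author$cited_by_count
print(ratio)
```

A ratio above 1 on real data would reproduce the discrepancy described in this comment.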