Closed JeffreySmithA closed 1 year ago
Hi @JeffreySmithA, one way to do this is simply to use oa_fetch to find all works by your group of authors, then summarise the cited_by_count column.
For example, say I want to find the citation counts of Emmanuelle Charpentier and Jennifer Doudna, grouped by the publication year of their works. I can do:
library(openalexR)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
dat <- oa_fetch("works", author.id = c("A4358797308", "A2163159272")) |>
  unnest(author) |>
  filter(grepl("A4358797308|A2163159272", au_id)) |>
  group_by(au_id, publication_year) |>
  summarise(cited_by_count = sum(cited_by_count), .groups = "drop")
dat
#> # A tibble: 59 × 3
#> au_id publication_year cited_by_count
#> <chr> <int> <int>
#> 1 https://openalex.org/A2163159272 1993 92
#> 2 https://openalex.org/A2163159272 1995 145
#> 3 https://openalex.org/A2163159272 1998 111
#> 4 https://openalex.org/A2163159272 1999 420
#> 5 https://openalex.org/A2163159272 2000 364
#> 6 https://openalex.org/A2163159272 2002 2
#> 7 https://openalex.org/A2163159272 2003 101
#> 8 https://openalex.org/A2163159272 2004 438
#> 9 https://openalex.org/A2163159272 2007 12
#> 10 https://openalex.org/A2163159272 2008 296
#> # … with 49 more rows
Created on 2023-05-09 with reprex v2.0.2
This is amazing!! Thank you so much, this saved me a lot of time.
Final question. Do you know how to do this at the paper level? So, I have a specific work with its work id, e.g. W2010555999 (random number). I'm still struggling to adapt your solution to get all of the citations to the paper.
Thanks in advance!
@trangdata's proposed solution provides, for each publication year, the total citations received to date by the papers published in that year. The counts_by_year field stored by OpenAlex, on the other hand, counts how many citations an author has received in each year. The two measures are very different. For example, suppose author X publishes two articles (Y and Z):
work Y, publication year 2020, counts_by_year:
2020: 2 citations
2021: 20 citations
2022: 35 citations
The cited_by_count field will be equal to the sum, 57.
work Z, publication year 2021, counts_by_year:
2021: 3 citations
2022: 8 citations
The cited_by_count field will be equal to the sum, 11.
Now if we calculate the total counts_by_year of author X (considering both works), the author will have been cited:
2 times in 2020
23 times in 2021
43 times in 2022
This is the calculation that OpenAlex performs when it calculates the counts_by_year field.
If we use Trang's solution instead, we get the following for author X:
2020: 57
2021: 11
Unfortunately, a counts_by_year vector covering more than ten years cannot be easily obtained at present. The only possible, but extremely expensive, solution is to perform an author snowball search and count the works citing author X year by year.
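To make the difference concrete, here is a minimal base R sketch that reproduces both measures from the toy numbers for works Y and Z in the example above. The data frame and its column names (work, publication_year, citation_year, n) are made up for illustration and are not OpenAlex fields.

```r
# Toy data from the example: citations received by works Y and Z in each year.
# All column names here are illustrative assumptions, not OpenAlex field names.
citations <- data.frame(
  work = c("Y", "Y", "Y", "Z", "Z"),
  publication_year = c(2020, 2020, 2020, 2021, 2021),
  citation_year = c(2020, 2021, 2022, 2021, 2022),
  n = c(2, 20, 35, 3, 8)
)

# Measure 1: citations received in each calendar year
# (the kind of total that counts_by_year reports).
by_citation_year <- aggregate(n ~ citation_year, data = citations, FUN = sum)

# Measure 2: total citations to date, grouped by the publication year
# of the cited work (what summing cited_by_count per publication year gives).
by_publication_year <- aggregate(n ~ publication_year, data = citations, FUN = sum)

print(by_citation_year)
print(by_publication_year)
```

The first table gives 2, 23 and 43 citations for 2020 to 2022, while the second gives 57 and 11 for publication years 2020 and 2021, so the two summaries answer different questions even though they are built from the same citations.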
Hi @JeffreySmithA, please see @massimoaria's answer above for the difference between what OpenAlex reports as counts_by_year (the number of citations an author receives in a given year) vs. how my earlier code calculated the number of citations an author gets from publications of a certain year.
That said, if you want to find all works that cite a specific paper, say, W2160237763, you can use the cites filter, then count the number of citations by year:
library(openalexR)
dat2 <- oa_fetch("works", cites = "W2160237763") |>
  dplyr::count(publication_year)
dat2
#> # A tibble: 18 × 2
#> publication_year n
#> <int> <int>
#> 1 2005 5
#> 2 2006 5
#> 3 2007 11
#> 4 2008 9
#> 5 2009 6
#> 6 2010 13
#> 7 2011 8
#> 8 2012 12
#> 9 2013 15
#> 10 2014 13
#> 11 2015 11
#> 12 2016 5
#> 13 2017 6
#> 14 2018 8
#> 15 2019 4
#> 16 2020 9
#> 17 2021 6
#> 18 2022 5
Created on 2023-05-09 with reprex v2.0.2
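The same cites filter could, in principle, be applied to each of an author's works in turn to approximate the year-by-year count that @massimoaria describes, without the ten-year limit. The sketch below is only one possible way to do it, under a few assumptions: citations_by_year and its fetch_citing argument are hypothetical names introduced here (not part of openalexR), the year of a citation is taken to be the publication year of the citing work, and one request is made per work, so it can be slow for prolific authors.

```r
# Sketch: citations received per year for a set of works, using the
# publication year of each citing work as the year of the citation.
# `citations_by_year` and `fetch_citing` are hypothetical helpers; the default
# fetcher wraps the oa_fetch(..., cites = ) call shown above.
citations_by_year <- function(work_ids,
                              fetch_citing = function(id) {
                                openalexR::oa_fetch(entity = "works", cites = id)
                              }) {
  # Collect the publication years of all works citing each input work.
  years <- unlist(lapply(work_ids, function(id) {
    citing <- fetch_citing(id)
    # Assumption: a work with no citing works yields NULL, so skip it.
    if (is.null(citing)) integer(0) else citing$publication_year
  }))
  # Tabulate citations per year; a work citing two input works counts twice.
  counts <- table(years)
  data.frame(
    year = as.integer(names(counts)),
    n_citations = as.integer(counts)
  )
}

# Offline check with made-up citing works, so no API call is made here.
fake <- list(
  W1 = data.frame(publication_year = c(2020L, 2021L, 2021L)),
  W2 = data.frame(publication_year = c(2021L, 2022L))
)
res <- citations_by_year(c("W1", "W2"), fetch_citing = function(id) fake[[id]])
print(res)
```

For a real author, work_ids would be the id column from an oa_fetch call on that author's works, and the default fetch_citing would query OpenAlex once per work.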
Thank you both very much. Incredibly helpful!
I'm trying to count citations by year for authors and publications over more than 10 years. The counts_by_year field only goes back 10 years. Is there an easy way to do this? Currently, I'm thinking the easiest way may be the snowball option, but I wanted to ask here first.