ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Matching authors based on a set of matching criteria #144

Closed: lorenzopalladini closed this issue 1 year ago

lorenzopalladini commented 1 year ago

I have a group of scholars who received a treatment, each at a different point in time, and I am trying to find a control group of scholars with similar characteristics (number of publications, citations, affiliation, and so on) at the time each treated scholar received the treatment. For example, John Smith received the treatment in 2018. At that time he had 50 publications, 400 citations, and 60 unique co-authors, was affiliated with NYU, published on medicine-related topics, and had his first publication in 2003. I would like to find a group of at least 10 scholars who, back in 2018, had the same or similar characteristics, ideally within a maximum allowed difference on each one. Is there a way to do this?

trangdata commented 1 year ago

Hi @lorenzopalladini, I would first draw a sample of, say, 500 scholars with similar publication and citation counts, like this:

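```r
library(openalexR)

# Random sample of 500 authors whose current totals fall inside a window
# around the treated scholar's profile (all bounds are exclusive).
scholars <- oa_fetch(
  "authors",
  works_count = ">40",       # more than 40 works ...
  works_count = "<60",       # ... and fewer than 60
  cited_by_count = ">300",   # more than 300 citations ...
  cited_by_count = "<500",   # ... and fewer than 500
  options = list(sample = 500, seed = 1)  # seed makes the sample reproducible
)
```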
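Because every treated scholar has a different profile, you may not want to hard-code these bounds. Here is a minimal sketch with a hypothetical helper `filter_bounds()` (not part of openalexR) that turns a target value and a relative margin into the same kind of exclusive bound strings used in the query above:

```r
# Hypothetical helper (not part of openalexR): build exclusive ">lower" and
# "<upper" filter strings around a target value, +/- a relative margin.
filter_bounds <- function(target, margin = 0.2) {
  lower <- as.integer(round(target * (1 - margin)))
  upper <- as.integer(round(target * (1 + margin)))
  # sprintf("%d") avoids scientific notation such as "1e+05" for large counts
  c(sprintf(">%d", lower), sprintf("<%d", upper))
}

# John Smith's 2018 profile from the question
works_bounds <- filter_bounds(50, margin = 0.2)   # ">40"  "<60"
cites_bounds <- filter_bounds(400, margin = 0.25) # ">300" "<500"

# These could then be passed to the query in the same way, e.g.:
# oa_fetch("authors",
#          works_count = works_bounds[1], works_count = works_bounds[2],
#          cited_by_count = cites_bounds[1], cited_by_count = cites_bounds[2],
#          options = list(sample = 500, seed = 1))
```

The margins here are arbitrary. Making them a bit wider than your final matching tolerance leaves room for the later filtering step.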

Then, among these, you can pick the 10 scholars whose features are closest to the treated scholar's, such as an institution similar to NYU, similar concept scores (`x_concepts`), a similar number of co-authors, and so on.
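One way to do this, while also enforcing the maximum margin of difference you asked about, is a simple nearest-neighbour match with a caliper. This sketch is not an openalexR function. `match_controls()` is a hypothetical helper, and the candidate table contains made-up toy values standing in for features you would compute from the `oa_fetch()` output, ideally as of the treatment year. It drops every candidate whose relative difference from the treated scholar exceeds `max_rel_diff` on any feature. It then ranks the remaining candidates by Euclidean distance over those relative differences and keeps the closest `k`.

```r
# Hypothetical helper (not part of openalexR): nearest-neighbour matching
# on relative differences, with a per-feature caliper of max_rel_diff.
match_controls <- function(treated, candidates, features,
                           k = 10, max_rel_diff = 0.2) {
  target <- unlist(treated[features])     # treated scholar's values
  x <- as.matrix(candidates[features])    # one row per candidate
  # Relative difference from the treated scholar (assumes no target is 0)
  rel <- abs(sweep(x, 2, target, "-"))
  rel <- sweep(rel, 2, abs(target), "/")
  # Caliper: keep candidates within the margin on every feature
  keep <- rowSums(rel > max_rel_diff) == 0
  out <- candidates[keep, , drop = FALSE]
  # Rank the survivors by Euclidean distance over the relative differences
  out$distance <- sqrt(rowSums(rel[keep, , drop = FALSE]^2))
  out <- out[order(out$distance), , drop = FALSE]
  head(out, k)
}

# John Smith in 2018; career_years = 2018 - 2003 (first publication)
treated <- list(works = 50, cites = 400, coauthors = 60, career_years = 15)

# Made-up toy candidates, for illustration only
candidates <- data.frame(
  id           = paste0("A", 1:6),
  works        = c(48, 52, 30, 55, 49, 70),
  cites        = c(390, 420, 150, 380, 600, 410),
  coauthors    = c(58, 65, 20, 61, 59, 90),
  career_years = c(16, 14, 8, 15, 15, 23),
  stringsAsFactors = FALSE
)

features <- c("works", "cites", "coauthors", "career_years")
matches <- match_controls(treated, candidates, features)
matches$id  # "A1" "A4" "A2"
```

Categorical criteria, such as the institution or topic overlap from `x_concepts`, are probably easiest to handle by narrowing the candidate pool before this step. The 20% caliper and the equal weighting of features are arbitrary choices you would likely want to tune.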

I know this doesn't take the 2018 time point into account. You could adjust for it by raising the `works_count` and `cited_by_count` bounds in the query a little, since scholars have accumulated more works and citations since then. Once you have the output data frame, you could then use the `counts_by_year` column to work out each scholar's totals as of 2018.
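Here is a minimal sketch of that as-of-2018 step. It assumes each author's `counts_by_year` entry is a data frame of yearly counts with `year`, `works_count`, and `cited_by_count` columns. Rather than summing counts up to 2018, it subtracts the counts for years after the treatment year from the current totals, because OpenAlex reports `counts_by_year` only for roughly the last ten years. `counts_as_of()` is a hypothetical helper, and the numbers are toy values.

```r
# Hypothetical helper (not part of openalexR): works and citations as of the
# end of `year`, given current totals and a table of yearly counts.
counts_as_of <- function(total_works, total_cites, by_year, year) {
  later <- by_year[by_year$year > year, , drop = FALSE]
  c(works = total_works - sum(later$works_count),
    cites = total_cites - sum(later$cited_by_count))
}

# Toy values shaped like one author's counts_by_year entry (assumed columns)
by_year <- data.frame(
  year           = 2024:2015,
  works_count    = c(2, 5, 4, 6, 5, 3, 4, 6, 5, 4),
  cited_by_count = c(60, 110, 100, 90, 80, 70, 60, 50, 40, 30)
)

as_of_2018 <- counts_as_of(total_works = 80, total_cites = 900,
                           by_year = by_year, year = 2018)
as_of_2018  # works = 55, cites = 390

# On real oa_fetch() output this could be applied row by row, e.g.:
# mapply(counts_as_of, scholars$works_count, scholars$cited_by_count,
#        scholars$counts_by_year, MoreArgs = list(year = 2018))
```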

Related: https://github.com/ropensci/openalexR/issues/116#issuecomment-1593421423