ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

openAlex new author and work data accuracy. #146

Open yhan818 opened 1 year ago

yhan818 commented 1 year ago

Hi, All,

The new OpenAlex author data was released on July 25 and officially announced on Aug 11, 2023. I ran some tests after July 25 and again after Aug 11, and noticed some intermediate changes in the author data.

It seems to me that it uses cosine similarity (`cosine_similarity`) to measure similarity, and XGBoost to match an author with his/her name. See the code: https://github.com/ourresearch/openalex-name-disambiguation/tree/main/V3/002_Data_Processing_Modeling_Clustering
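To make the cosine similarity idea concrete, here is a minimal sketch in plain Python. It is not the OpenAlex implementation: the character-bigram features, the function names, and the example names are all assumptions chosen for illustration, and the real pipeline in the linked repository may use quite different features. A score like this could be one input to a classifier such as XGBoost, which then decides whether two records belong to the same author.

```python
from collections import Counter
from math import sqrt


def char_ngrams(name: str, n: int = 2) -> Counter:
    """Count character n-grams of a lowercased, whitespace-normalised name.

    Hypothetical feature choice for illustration only.
    """
    text = " ".join(name.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def name_similarity(x: str, y: str, n: int = 2) -> float:
    """Similarity of two name strings based on shared character n-grams."""
    return cosine_similarity(char_ngrams(x, n), char_ngrams(y, n))


if __name__ == "__main__":
    # Made-up example names, not taken from OpenAlex data.
    print(name_similarity("Jane Doe", "J. Doe"))
    print(name_similarity("Jane Doe", "John Smith"))
```

With these made-up names, the abbreviated variant "J. Doe" scores higher against "Jane Doe" than the unrelated "John Smith" does, which is the kind of signal a matching model can build on.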

In general, the new author data is much more accurate than the previous version. However, there are still issues with both author and work data. I have tested some cases; see https://github.com/yhan818/openalexR-test/issues/7

So what are your views on the latest updates?