In 2022, ROR API usage grew to roughly 12 million requests per month, approximately ⅔ of which go to the affiliation matching service (?affiliation). The design of this service prevents it from giving the best results when matching full affiliation strings. It relies only on data within ROR records. It has no data about what constitutes a “correct” match for a given string, or about which institutions are most likely to be matches.
Users have noted its shortcomings. Some adjustments made in Fall 2022 improved the precision of this service, but it still falls short in some cases. In the long term, a complete redesign around a machine-learning-based approach would provide better functionality, and its results could improve over time as more data is added to the model.
A pilot project will be needed to investigate options for re-architecting the affiliation matching service and to determine the costs of running such a service (which could be high). Additional resourcing beyond ROR’s core funding may be needed to develop a production-ready service.