plazi / O3RT

Repository for the Open Refine-Refindit Reconciliation Tool

stats: estimate the accuracy of batch accepting best match #1

Open mguidoti opened 5 years ago

mguidoti commented 5 years ago

@tcatapano suggested (2019-09-17, Paris) an accuracy test based on manually checking a given number of randomly selected entries (10-20% of the Taxodros dataset), repeated multiple times (5-10x).

This should be part of the O3RT paper, and cited in the Taxodros paper. The list of the subsets should also be available somewhere for publication purposes.
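A minimal sketch of how the repeated random subsets could be drawn and scored, assuming each Taxodros entry has some identifier. The function names `draw_subsets` and `accuracy`, the fixed seed, and the stand-in ids are all hypothetical and not part of O3RT. Using a fixed seed means the same subsets can be regenerated later, which would help with making the subset lists available for publication.

```python
import random


def draw_subsets(entry_ids, fraction=0.1, rounds=5, seed=2019):
    """Draw `rounds` independent random subsets, each holding `fraction` of the entries.

    All names and defaults here are assumptions for illustration only.
    """
    rng = random.Random(seed)  # fixed seed: the same subsets can be regenerated
    size = max(1, round(len(entry_ids) * fraction))
    # Sampling is without replacement inside a subset; subsets may overlap each other.
    return [sorted(rng.sample(entry_ids, size)) for _ in range(rounds)]


def accuracy(checked):
    """`checked` maps entry id -> True if the manually verified best match was correct."""
    return sum(checked.values()) / len(checked)


# Hypothetical stand-in ids; the real Taxodros identifiers would go here.
entries = list(range(1, 1001))
subsets = draw_subsets(entries, fraction=0.1, rounds=5)

for i, subset in enumerate(subsets, start=1):
    print(f"subset {i}: {len(subset)} entries to check manually")
```

Each subset would then be checked by hand, and `accuracy` applied to the resulting true/false verdicts gives one estimate per round.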

mguidoti commented 4 years ago

Testing Protocol

Datasets

Dataset | # of Papers
Poa@Plazi Members' Publications Dataset | 25 papers, 20 with known DOIs
Grazia Dataset | 51 publications, 35 with known DOIs
Covid-19 Task Force Database | 25 publications, 25 with known DOIs

Test 001

Refindit was not hitting the DataCite API for some reason. Done on August 20th, 2020.

Matching Success Rate

Dataset | Raw | %
Poa@Plazi | 17(+1)/20(+1) | 85%
Grazia | 32(+2 from Zenodo) | 91.42%
Covid-19 | 20/25 | 80%

Accuracy

Dataset | Raw | % | Details
Poa@Plazi | 18/18 | 100% | 17 Matched & Equal DOIs; 1 new DOI found!; 3 Not Matched
Grazia | 32/32 | 100% | 32 Matched & Equal DOIs; 1 Not Matched; 2 Not Matched (Zenodo)
Covid-19 | 19/20 | 95% | 16 Matched & Equal DOIs; 3 Matched, different DOIs due to DOI duplication; 1 Matched, publons DOI for some reason; 5 Not Matched

Summary

Dataset | # of Papers | Matching Rate | Accuracy Rate
Poa@Plazi | 25 | 85% | 100%
Grazia | 51 | 91.42% | 100%
Covid-19 | 25 | 80% | 95%
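For reference, the two rates can be recomputed from the raw counts with a short sketch like the one below. It assumes the matching rate is matched entries over entries with known DOIs, and the accuracy rate is correct matches over matched entries. The Grazia denominator of 35 is an assumption taken from the "35 with known DOIs" figure, since the raw column does not state it, and the "(+1)" and "(+2 from Zenodo)" extras are left out. The `rate` helper and the dictionary layout are illustrative only.

```python
def rate(numerator, denominator):
    """Return a percentage; a hypothetical helper for this sketch."""
    return 100 * numerator / denominator


# Counts copied from the Test 001 tables. "with_doi" for Grazia (35) is an
# assumption based on the Datasets table, not on the raw column.
results = {
    "Poa@Plazi": {"matched": 17, "with_doi": 20, "correct": 18, "checked": 18},
    "Grazia": {"matched": 32, "with_doi": 35, "correct": 32, "checked": 32},
    "Covid-19": {"matched": 20, "with_doi": 25, "correct": 19, "checked": 20},
}

summary = {
    name: (rate(r["matched"], r["with_doi"]), rate(r["correct"], r["checked"]))
    for name, r in results.items()
}

for name, (matching, accuracy) in summary.items():
    print(f"{name}: matching {matching:.2f}%, accuracy {accuracy:.2f}%")
```

Under these assumptions 32/35 comes out as 91.43% when rounded to two decimals; the 91.42% in the tables corresponds to the same value truncated.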