plazi / O3RT

Repository for the Open Refine-Refindit Reconciliation Tool


stats: estimate the accuracy of batch accepting best match #1

Open mguidoti opened 5 years ago

mguidoti commented 5 years ago

@tcatapano suggested (2019-09-17, Paris) an accuracy test based on manually checking a randomly selected subset of entries (10-20% of the Taxodros dataset), repeated multiple times (5-10x).

This should be part of the O3RT paper, and cited in the Taxodros paper. The list of the subsets should also be available somewhere for publication purposes.
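A minimal sketch of how the random subsets could be drawn reproducibly, so that the exact lists can be published later. The function names, the fixed seed, and the placeholder entry IDs are assumptions for illustration only; none of this is existing O3RT code.

```python
import random
import statistics


def draw_subsets(entry_ids, fraction=0.1, repeats=5, seed=2019):
    """Draw `repeats` random subsets, each holding `fraction` of the entries."""
    rng = random.Random(seed)  # fixed seed so the published subsets can be regenerated
    size = max(1, round(len(entry_ids) * fraction))
    # Sampling is without replacement inside a subset; different subsets may overlap.
    return [sorted(rng.sample(entry_ids, size)) for _ in range(repeats)]


def summarize(accuracies):
    """Mean and sample standard deviation of the per-subset accuracies."""
    mean = statistics.mean(accuracies)
    spread = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, spread


# Hypothetical stand-in for the Taxodros entry identifiers.
entries = list(range(1, 1001))
subsets = draw_subsets(entries, fraction=0.1, repeats=5)
```

Each subset would then be checked by hand, and the per-subset accuracies (correct matches divided by entries checked) passed to `summarize` to get an overall estimate and its spread.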

mguidoti commented 4 years ago

Testing Protocol

Datasets

Dataset	# of Papers
Poa@Plazi Members' Publications Dataset	25 papers, 20 with known DOIs
Grazia Dataset	51 publications, 35 with known DOIs
Covid-19 Task Force Database	25 publications, 25 with known DOIs

Test 001

Refindit was not hitting the DataCite API for some reason. Done on August 20th, 2020.

Matching Success Rate

Dataset	Raw	%
Poa@Plazi	17(+1)/20(+1)	85%
Grazia	32(+2 from Zenodo)/35	91.42%
Covid-19	20/25	80%

Accuracy

Dataset	Raw	%	Details
Poa@Plazi	18/18	100%	17 Matched & Equal DOIs, 1 new DOI found!, 3 Not Matched
Grazia	32/32	100%	32 Matched & Equal DOIs, 1 Not Matched, 2 Not Matched (Zenodo)
Covid-19	19/20	95%	16 Matched & Equal DOIs; 3 Matched, different DOIs due to DOI duplication; 1 Matched, Publons DOI for some reason; 5 Not Matched

Summary

Dataset	# of Papers	Matching Rate	Accuracy Rate
Poa@Plazi	25	85%	100%
Grazia	51	91.42%	100%
Covid-19	25	80%	95%
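For reference, the two rates can be recomputed from the raw counts in the Test 001 tables. This is only a sketch: the `rate` helper and the `test_001` dictionary are made up for illustration, the Grazia denominator is taken to be the 35 papers with known DOIs, and the Poa@Plazi matching rate uses 17/20 without the extra (+1) DOI.

```python
def rate(numerator, denominator):
    """Percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


# Counts copied from the Test 001 tables:
# (matched, papers with known DOIs, accurate matches, matches checked)
test_001 = {
    "Poa@Plazi": (17, 20, 18, 18),
    "Grazia": (32, 35, 32, 32),
    "Covid-19": (20, 25, 19, 20),
}

for name, (matched, known, accurate, checked) in test_001.items():
    matching = rate(matched, known)
    accuracy = rate(accurate, checked)
    print(f"{name}: matching {matching}%, accuracy {accuracy}%")
```

Rounding to two decimals gives 91.43 for Grazia; the 91.42% in the tables appears to be the same ratio (32/35) truncated rather than rounded.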