dhimmel closed this 6 years ago
Note that the overall inaccuracy of PennText calls is only 12.4%. This is because accuracy when PennText was true is 94%, and most DOIs have PennText == true.
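The weighting argument above can be sketched as follows. This is only an illustration, not the analysis code from this PR: the function name `overall_error_rate` is made up, and apart from the 94% figure, the inputs (share of DOIs with PennText == true, and accuracy when PennText is false) are hypothetical placeholders, so the printed number is not expected to match 12.4%.

```python
def overall_error_rate(share_true, acc_when_true, acc_when_false):
    """Population-weighted error rate across the two PennText strata."""
    if not 0.0 <= share_true <= 1.0:
        raise ValueError("share_true must be between 0 and 1")
    error_true = 1.0 - acc_when_true
    error_false = 1.0 - acc_when_false
    # Each stratum contributes its error rate in proportion to its share of DOIs.
    return share_true * error_true + (1.0 - share_true) * error_false


# 0.94 is the accuracy quoted in the comment; the other two values are
# hypothetical and only show how the weighting works.
hypothetical_share_true = 0.90
hypothetical_acc_false = 0.50
rate = overall_error_rate(hypothetical_share_true, 0.94, hypothetical_acc_false)
print(f"{rate:.1%}")
```

Because the PennText == true stratum dominates, its low error rate pulls the overall figure down even if the PennText == false stratum is much less accurate.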
The idea here is that curation would continue in manual-doi-checks-500.tsv. Currently, this file doesn't have the date queried columns and has different column names than before. Let me know if that's a problem. You could always edit the column names or add new ones manually if you wanted.
Pinging @publicus
I've reviewed your sample, and it looks good to me. I've updated the facilitation script to use manual-doi-checks-500.tsv as well, and have gotten started. The facilitation script does add the date columns back automatically; I do prefer keeping them if the data are going to be public, since they can help if there's a question later about journal subscription timelines. My understanding from your comment above is that you're fine with those date columns being retained; is that correct?
Yep.
So this PR is ready to merge? If everything looks good to you, "approve" it under "files changed" > "review changes".
A quick logistics question: Is your idea that the edits to the facilitation script, and the results from the 500, be in their own PR? If so, yes, this is ready, and I'll mark it as approved.
Yes
Refs https://github.com/greenelab/library-access/issues/15.
Analyzes accuracy on 200 DOIs (100 where PennText was true, 100 where PennText was false).
Selects 500 DOIs for an expanded manual assessment, stratified on PennText to match the proportion in the entire DOI set. Reuses as many DOIs that already have calls as possible.
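A minimal sketch of this kind of selection, assuming the full DOI set is available as a mapping from DOI to its PennText flag. This is not the code from this PR: `select_stratified`, its parameters, and the example DOIs are all hypothetical.

```python
import random


def select_stratified(dois, already_checked, total, seed=0):
    """Pick `total` DOIs, stratified on the PennText flag.

    dois: dict mapping DOI -> PennText flag (True/False) for the full set.
    already_checked: set of DOIs that already have a manual call.
    """
    rng = random.Random(seed)
    share_true = sum(dois.values()) / len(dois)
    # Round the true-stratum quota; the false stratum gets the remainder.
    quota = {True: round(total * share_true)}
    quota[False] = total - quota[True]

    selected = []
    for flag, n in quota.items():
        stratum = sorted(doi for doi, value in dois.items() if value == flag)
        if n > len(stratum):
            raise ValueError(f"not enough DOIs with PennText == {flag}")
        reused = [doi for doi in stratum if doi in already_checked]
        fresh = [doi for doi in stratum if doi not in already_checked]
        rng.shuffle(reused)
        rng.shuffle(fresh)
        # Previously checked DOIs come first, so they fill the quota
        # before any new DOI is drawn.
        selected.extend((reused + fresh)[:n])
    return selected


# Toy data: 100 made-up DOIs, 80% with PennText == true, 10 already checked.
dois = {f"10.1000/example.{i}": i < 80 for i in range(100)}
already_checked = {f"10.1000/example.{i}" for i in (0, 1, 2, 3, 4, 80, 81, 82, 83, 84)}
picked = select_stratified(dois, already_checked, total=10)
```

With the toy data, the 10 picks split 8 to 2 to mirror the 80% share, and already checked DOIs are taken before any new ones within each stratum.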
Todo: