greenelab / library-access

Collecting data on whether library access to scholarly literature

Analyze PennText accuracy on 200 DOIs, Select 500 for additional calls #19

Closed dhimmel closed 6 years ago

dhimmel commented 6 years ago

Refs https://github.com/greenelab/library-access/issues/15.

Analyzes accuracy on 200 DOIs (100 where PennText was true, 100 where PennText was false).

Selects 500 DOIs for an expanded manual assessment, stratified on PennText to match the proportion in the entire DOI set. Reuses as many DOIs that already have calls as possible.
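A minimal sketch of how such a selection could work, assuming each DOI is a record with a PennText flag and a marker for whether it already has a manual call. The field names (`penntext`, `has_call`), the function name, and the synthetic data are hypothetical and are not taken from the PR's code.

```python
import random


def select_stratified(dois, n_total, seed=0):
    """Pick n_total DOIs, stratified on PennText, reusing called DOIs first.

    dois: non-empty list of dicts with hypothetical keys
          'doi' (str), 'penntext' (bool), 'has_call' (bool).
    """
    rng = random.Random(seed)
    n_true_all = sum(1 for d in dois if d["penntext"])
    # Target number of PennText == true rows, matching the full-set proportion.
    n_true = round(n_total * n_true_all / len(dois))
    targets = {True: n_true, False: n_total - n_true}

    selected = []
    for value, target in targets.items():
        stratum = [d for d in dois if d["penntext"] == value]
        called = [d for d in stratum if d["has_call"]]
        uncalled = [d for d in stratum if not d["has_call"]]
        rng.shuffle(called)
        rng.shuffle(uncalled)
        # DOIs with existing calls come first, then randomly chosen new ones.
        selected.extend((called + uncalled)[:target])
    return selected


# Synthetic example: 1000 DOIs, 800 with PennText == true,
# and 100 already-called DOIs in each stratum.
demo = [
    {"doi": f"10.0000/example.{i}", "penntext": i < 800,
     "has_call": i < 100 or i >= 900}
    for i in range(1000)
]
picked = select_stratified(demo, n_total=500)
```

In this synthetic case the 500 picks split 400/100 between the two strata, and all 200 previously called DOIs are kept. A stratum smaller than its target simply contributes fewer rows.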

Todo:

dhimmel commented 6 years ago

Note that the overall error rate of PennText calls is only 12.4%. This is because accuracy when PennText was true is 94%, and most DOIs have PennText == true.
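The reasoning here is a weighted average: because the 200 checked DOIs were split evenly between the two PennText values, each stratum's error rate has to be weighted by its share of the full DOI set. A rough sketch follows. Only the 94% figure comes from this thread; the share of PennText == true DOIs and the accuracy for PennText == false are made-up placeholders, so the result is illustrative and is not meant to reproduce 12.4%.

```python
def overall_error_rate(share_true, acc_true, acc_false):
    """Error rate over all DOIs as a weighted average of the two strata."""
    return share_true * (1 - acc_true) + (1 - share_true) * (1 - acc_false)


# acc_true = 0.94 is stated in the thread.
# share_true and acc_false are HYPOTHETICAL values chosen for illustration.
rate = overall_error_rate(share_true=0.9, acc_true=0.94, acc_false=0.50)
print(f"illustrative overall error rate: {rate:.1%}")
```

When most DOIs fall in the high-accuracy stratum, the overall error rate stays low even if the smaller stratum is much less accurate.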

dhimmel commented 6 years ago

The idea here is that curation would continue in manual-doi-checks-500.tsv. Currently, this file doesn't have the date queried columns and has different column names than before. Let me know if that's a problem. You could always edit the column names or add new ones manually if you wanted.

dhimmel commented 6 years ago

Pinging @publicus

jglev commented 6 years ago

I've reviewed your sample, and it looks good to me. I've updated the facilitation script to use manual-doi-checks-500.tsv, as well, and have gotten started. The facilitation script does add the date columns back automatically; I do prefer keeping them if the data are going to be public, since they can help if there's a question later about journal subscription timelines. My understanding from your comment above is that you're fine with those date columns being retained; is that correct?

dhimmel commented 6 years ago

> My understanding from your comment above is that you're fine with those date columns being retained; is that correct?

Yep.

> I've updated the facilitation script to use manual-doi-checks-500.tsv

So this PR is ready to merge? If everything looks good to you, "approve" it under "files changed" > "review changes".

jglev commented 6 years ago

A quick logistics question: Is your idea that the edits to the facilitation script, and the results from the 500, be in their own PR? If so, yes, this is ready, and I'll mark it as approved.

dhimmel commented 6 years ago

> A quick logistics question: Is your idea that the edits to the facilitation script, and the results from the 500, be in their own PR? If so, yes, this is ready, and I'll mark it as approved.

Yes