Open bnewbold opened 4 years ago
I would like to work on this. Could you provide some more details? What kind of mechanism can be used to fetch the data from their database? Their user guide clearly states that scraping the website is prohibited (https://retractionwatch.com/retraction-watch-database-user-guide/).
Ah, I didn't notice that. The services are on different domains so I didn't realize they were the same project, but now I see the "User Guide" link. I guess the next step would be to find alternative sources of retraction metadata with persistent identifiers (e.g., DOI or PubMed identifier). Some sources I can think of are:
publication_stage does not match or has changed to "retracted")
Here is an open corpus of ~100k retractions: http://openretractions.com/
we only know about retractions and other updates that publishers have properly reported to CrossRef or PubMed. That's currently 114596 papers.
I see only a couple thousand retracted "releases" in fatcat today. We do import from crossref and pubmed, so in theory we should have comparable numbers, but we don't run updates automatically yet, so if most of these are from the past couple years we are probably missing them. Also there might be bugs in our crossref and pubmed importers. I don't think we have tests for that code path, so a good first contribution would be adding tests for both crossref and pubmed retractions.
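As a starting point for such a test, here is a minimal sketch of the check an importer might make on a Crossref record. It assumes that Crossref work records list updates under an `update-to` key whose entries carry `DOI` and `type` fields, which is how I understand the Crossref REST API to report them. The function name `retracted_dois` and the sample record are hypothetical and are not taken from the fatcat codebase.

```python
def retracted_dois(work):
    """Return the DOIs that a Crossref work record marks as retracted.

    Assumption: updates are listed under "update-to", each entry holding
    the DOI of the updated work and an update "type" such as "retraction".
    """
    dois = []
    for update in work.get("update-to") or []:
        update_type = (update.get("type") or "").lower()
        doi = update.get("DOI")
        if update_type == "retraction" and doi:
            # DOIs are case-insensitive, so normalize before comparing or storing
            dois.append(doi.lower())
    return dois


# Illustrative record only, not real Crossref data
sample_notice = {
    "DOI": "10.1234/example.notice",
    "update-to": [
        {"DOI": "10.1234/Example.Original", "type": "retraction"},
        {"DOI": "10.1234/example.other", "type": "correction"},
    ],
}
```

A test could then assert that only the retraction entry is picked up, and that records without any `update-to` field yield an empty list.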
There is a database of retracted papers at: http://retractiondatabase.org/RetractionSearch.aspx?&AspxAutoDetectCookieSupport=1
It would be good to have a bot which periodically fetches updates, and then updates article metadata in fatcat appropriately.
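One way a bot like this could fetch updates without scraping is to poll an open API. The sketch below only builds the query URL and assumes the Crossref REST API's `update-type` and `from-update-date` filters and its cursor-based paging. The helper name `retraction_query_url` is hypothetical, and the step that writes the result into fatcat is deliberately left out.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def retraction_query_url(since, rows=100, cursor="*"):
    """Build a Crossref query URL for retraction notices updated since a date.

    `since` is a date string such as "2020-01-01". The filter names are
    assumed from the Crossref REST API documentation.
    """
    params = {
        "filter": f"update-type:retraction,from-update-date:{since}",
        "rows": rows,
        "cursor": cursor,  # "*" starts a new cursor-paged result set
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


url = retraction_query_url("2020-01-01")
```

A periodic job could remember the date of its last successful run, pass it as `since`, and follow the cursor returned with each response until no more items come back.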