hirmeos / altmetrics

Implementation of HIRMEOS WP6
MIT License

Check RawEvents for potential integrity errors before creating new entries #27

Closed: rowan08 closed this issue 5 years ago

rowan08 commented 5 years ago

Background: The Crossref Event Data API filters events by date rather than by exact time, so a new event shows up twice if it happens before 12:00 (when metrics are pulled). For example, if an event occurs at 09:00 on the 16th of Feb and the last scrape was on the 15th of Feb, the scrape on the 16th records this new event. The next day, the scrape asks for events that happened since the 16th of Feb, so the same event is returned again, which causes an integrity error when trying to save it to the database.
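To illustrate the overlap, here is a minimal sketch that simulates a date-granular filter. It does not call the real Crossref API: `fetch_since`, the `EVENTS` list, the event id and the year are all hypothetical and chosen only to reproduce the example above.

```python
from datetime import date, datetime

# Hypothetical event collected at 09:00 on 16 Feb (the year is arbitrary).
EVENTS = [{"id": "evt-1", "collected": datetime(2019, 2, 16, 9, 0)}]


def fetch_since(from_date, now):
    """Simulate a date-only filter: return every event collected on or
    after `from_date` (time of day ignored) that exists at time `now`."""
    return [
        e for e in EVENTS
        if e["collected"].date() >= from_date and e["collected"] <= now
    ]


# Scrape at 12:00 on the 16th, last scrape was on the 15th.
first = fetch_since(date(2019, 2, 15), datetime(2019, 2, 16, 12, 0))

# Scrape at 12:00 on the 17th, last scrape was on the 16th.
second = fetch_since(date(2019, 2, 16), datetime(2019, 2, 17, 12, 0))

# Ids present in both scrapes are the ones that would be saved twice.
duplicates = {e["id"] for e in first} & {e["id"] for e in second}
```

Because the filter only sees the date, the 09:00 event falls inside both windows and ends up in `duplicates`.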

Solution: The Crossref Event Data plugin needs to check for existing RawEvent entries before trying to create them, since the entries are created with a bulk-create and filtering by 'from-collected-date' is not sufficient to exclude duplicates. This check will also allow fresh re-scraping in future without causing problems, if we ever want to do this.
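A rough sketch of the check-before-create idea, written against the standard-library `sqlite3` module rather than the project's actual models. The `raw_event` table, its `external_id` column and the `insert_new_events` helper are assumptions made for illustration, not the plugin's real schema or code.

```python
import sqlite3


def insert_new_events(conn, events):
    """Bulk-insert only events whose external_id is not stored yet.

    Returns the number of rows actually inserted.
    """
    # Collapse duplicates inside the incoming batch itself.
    unique = {e["external_id"]: e for e in events}
    if not unique:
        return 0

    # One query to find which ids already exist in the table.
    ids = list(unique)
    placeholders = ",".join("?" * len(ids))
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT external_id FROM raw_event "
            f"WHERE external_id IN ({placeholders})",
            ids,
        )
    }

    # Keep only unseen events, then insert them in a single batch.
    new_rows = [
        (key, e["payload"]) for key, e in unique.items() if key not in existing
    ]
    conn.executemany(
        "INSERT INTO raw_event (external_id, payload) VALUES (?, ?)", new_rows
    )
    conn.commit()
    return len(new_rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_event (external_id TEXT PRIMARY KEY, payload TEXT)"
)

# Day 1 returns evt-1; day 2 returns evt-1 again plus a new evt-2.
inserted_day1 = insert_new_events(conn, [{"external_id": "evt-1", "payload": "a"}])
inserted_day2 = insert_new_events(
    conn,
    [
        {"external_id": "evt-1", "payload": "a"},
        {"external_id": "evt-2", "payload": "b"},
    ],
)
total = conn.execute("SELECT COUNT(*) FROM raw_event").fetchone()[0]
```

The repeated `evt-1` is skipped on the second run instead of violating the primary-key constraint, and a full re-scrape would simply insert nothing new. For very large batches the `IN (...)` lookup would need to be chunked, since databases limit the number of bound parameters per query.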