ytsapras / robonet_site

Django RoboNet operational database.
GNU General Public License v2.0

Event OGLE-2017-BLG-0586 is missing from the DB #9

Closed rachel3834 closed 7 years ago

rachel3834 commented 7 years ago

This event should have been downloaded and ingested previously, but it doesn't appear at URL https://robonet.lco.global/db/event/OGLE-2017-BLG-0586

rachel3834 commented 7 years ago

For future reference, issues like this can be investigated by grepping for the short-hand version of the event name (i.e. OB170586 in this case) in the artemis_subscriber logs.
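As a rough illustration of that lookup, here is a minimal Python sketch that derives the short-hand name and filters log lines for it. The function names are hypothetical, and only the OGLE to "OB" mapping is confirmed by this thread; the MOA to "MB" entry is an assumption.

```python
import re

# Survey prefix -> short-hand code. Only OGLE -> "OB" is confirmed in this
# thread; the MOA entry is an assumption added for illustration.
SURVEY_CODES = {"OGLE": "OB", "MOA": "MB"}


def short_event_name(long_name):
    """Convert e.g. 'OGLE-2017-BLG-0586' to 'OB170586'."""
    match = re.fullmatch(r"([A-Z]+)-(\d{4})-BLG-(\d+)", long_name)
    if match is None:
        raise ValueError(f"Unrecognised event name: {long_name}")
    survey, year, number = match.groups()
    if survey not in SURVEY_CODES:
        raise ValueError(f"Unknown survey prefix: {survey}")
    # Two-digit year followed by the zero-padded four-digit event number.
    return f"{SURVEY_CODES[survey]}{year[2:]}{int(number):04d}"


def find_log_lines(log_lines, long_name):
    """Return the log lines that mention the event's short-hand name."""
    short = short_event_name(long_name)
    return [line for line in log_lines if short in line]
```

In practice, a plain grep for the short-hand name over the artemis_subscriber logs does the same job; the sketch is only useful if the check needs to be scripted.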

This happened because the subscriber tried to add the event, but the function returned (correctly) that an event was already known to the database at those coordinates, because it had already been identified by MOA.

I believe these errors occurred for an earlier version of artemis_subscriber, because I subsequently fixed this issue - it will now simply add a new EventName in cases like this.
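To make the intended behaviour concrete, here is a minimal sketch of that cross-matching logic in plain Python rather than the real Django models. The `Event` class, the function names, the 2 arcsec match radius and the example MOA name are all assumptions for illustration, not the actual artemis_subscriber code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Event:
    """Stand-in for a DB Event: one sky position, possibly several names."""
    ra_deg: float
    dec_deg: float
    names: list = field(default_factory=list)


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec; adequate for arcsecond-scale matching."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    ddec = dec1 - dec2
    return math.hypot(dra, ddec) * 3600.0


def add_or_cross_match(events, name, ra_deg, dec_deg, radius_arcsec=2.0):
    """Attach `name` to an existing Event at these coordinates, else create one.

    The 2 arcsec default radius is an assumed value, not the project's setting.
    """
    for event in events:
        sep = angular_sep_arcsec(ra_deg, dec_deg, event.ra_deg, event.dec_deg)
        if sep <= radius_arcsec:
            # Known position: record the extra name instead of rejecting it.
            if name not in event.names:
                event.names.append(name)
            return event
    event = Event(ra_deg, dec_deg, [name])
    events.append(event)
    return event
```

With this approach, an event first reported by MOA and later by OGLE ends up as a single Event carrying both names, which is the state the DB should be in for cases like this one.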

The remaining problem is that it won't automatically go back and correct events previously downloaded from ARTEMiS. I suggest we resolve this with a command function that corrects the DB entries where necessary. I will implement that next.

rachel3834 commented 7 years ago

I've implemented a management command called 'check_event_cross_matching' to verify the database integrity based on all available data files from ARTEMiS. It can be run over the DB at any time, and it will add any EventNames it finds to be missing, associating them with the correct Events.

rachel3834 commented 7 years ago

In the process of investigating this issue, I noticed a different problem: We had events which were unknown to the DB which were not picked up by the subscriber.

This seems to have happened in cases where an issue interrupts the rsync process. The subscriber was designed to rsync data over, and use the rsync record of which files were changed to decide those that needed to be updated within the DB. This allowed it to skip many files which (theoretically) hadn't changed, for speed of operation.

However, if for whatever reason a previous rsync run was interrupted before the data were ingested, this design means that the data would not be rsync'd again...and therefore never ingested.

I have therefore updated the code to review all datafiles from the current year, regardless of whether they were updated in the most recent rsync, to ensure that the DB always has a copy of all available data. This does mean the code will run for a bit longer, but speed tests from my laptop indicate a total runtime of ~7min, which is acceptable: it's within the 15min cadence of the crontab call of the code. This should be monitored later on in the season when we have more events to review.
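A minimal sketch of the selection step described above, assuming the data files are named after the short-hand event name (e.g. OB170586.model). That naming scheme and the function name are assumptions for illustration, not the actual subscriber code.

```python
import re
from pathlib import Path

# Assumed file stem: two-letter survey code, two-digit year, four-digit number.
NAME_PATTERN = re.compile(r"^[A-Z]{2}(\d{2})\d{4}$")


def select_current_year_files(paths, year):
    """Return every data file for `year`, whether or not rsync just changed it."""
    yy = f"{year % 100:02d}"
    selected = []
    for path in paths:
        match = NAME_PATTERN.match(Path(path).stem)
        if match and match.group(1) == yy:
            selected.append(path)
    return sorted(selected)
```

Because the list is built from what is on disk rather than from the rsync change record, a file transferred by an interrupted run is still reviewed on the next pass.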

mpgh commented 7 years ago

Sorry for not mentioning that earlier. We noticed that yesterday, but forgot to let you know. It might be helpful to double-check whether rsync really only checks current-year events. An rsync time-out or failure might also lead to out-of-sync model and align files. These are picked up by REA TAP using the "last updated" DB entry.

rachel3834 commented 7 years ago

Thanks Markus - yes that is indeed what the updated code does.

rachel3834 commented 7 years ago

I can confirm that event OGLE-2017-BLG-0586 is now correctly cross-matched in the DB and can be viewed through: https://robonet.lco.global/db/event/OGLE-2017-BLG-0586

I've run the check_event_cross_matching code successfully over the DB, and I'm running the updated artemis_subscriber as well.

rachel3834 commented 7 years ago

I believe this should answer this issue, so closing this thread. Please continue to monitor for any DB integrity problems.