zerebubuth / openstreetmap-changeset-replication

Simple Ruby script to replicate changes to changeset metadata to a stream.

Changeset replication stopped #7

Closed pa5cal closed 4 years ago

pa5cal commented 4 years ago

The changeset replication stopped again yesterday evening. The latest state file was created at:

last_run: 2020-09-07 23:42:01.730936000 +00:00
sequence: 4095906

See: https://planet.openstreetmap.org/replication/changesets/

zerebubuth commented 4 years ago

Yeah, I woke up to an inbox full of error messages! I wonder if there's something that has a tendency to cause this around midnight UTC (CPU or disk I/O pressure from other cron jobs maybe?) or whether it's just a coincidence.

Thanks for reporting! It should be running again now.

pa5cal commented 4 years ago

Thx for fixing.

mmd-osm commented 4 years ago

What did the error message say?

zerebubuth commented 4 years ago

ERROR: Bad file descriptor @ fptr_finalize_flush - /store/planet/replication/changesets/state.yaml.tmp

That's the first error message from last night. It varies, but it's usually something like that. Subsequent error messages said:

ERROR: undefined method `[]' for nil:NilClass

This means that the state.yaml file was empty. So a failure to write the temporary state file on one run somehow left the state file itself empty. Yet when I logged into the machine, the temporary state file was not empty, and the only place in the code that modifies the state file copies it from the temporary state file.

So I have no idea how it gets into this state - and on a fairly regular basis!
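For illustration only, here is a minimal sketch of one common way to keep a state file from ever being observed empty. It is not the script's actual code: the thread says the script copies the temporary file over the state file, whereas this sketch swaps in a flush, fsync, and rename. The helper names `write_state_atomically` and `read_state` are hypothetical, and the only key checked is `sequence`, taken from the state shown at the top of the thread.

```ruby
require "yaml"

# Hypothetical helper: write the state to a temporary file, force it to disk,
# then rename it over the real file so readers see either the old or the new
# content, never a partial write.
def write_state_atomically(path, state)
  tmp_path = "#{path}.tmp"
  File.open(tmp_path, "w") do |f|
    f.write(YAML.dump(state))
    f.flush  # push Ruby's internal buffer to the OS
    f.fsync  # ask the OS to push the data to disk
  end
  # rename is atomic on POSIX when both paths are on the same filesystem
  File.rename(tmp_path, path)
end

# Hypothetical helper: fail with a clear message if the state file is empty
# or malformed, instead of failing later on a nil lookup.
def read_state(path)
  state = YAML.safe_load(File.read(path), permitted_classes: [Time])
  unless state.is_a?(Hash) && state.key?("sequence")
    raise "state file #{path} is empty or malformed"
  end
  state
end
```

A run would call `read_state` at startup and `write_state_atomically` after each successful batch. Whether this would cure the failure described here is unknown, since the root cause was not identified in the thread.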

I had thought it was concurrent modifications, but unless I wrote the big flock around the whole program totally wrong, there should only be one copy of this code running at any one time. Perhaps there's some file-writing work that only runs at GC time, despite all the writes being wrapped in file blocks?
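As a point of reference, a "big flock around the whole program" usually looks something like the sketch below. This is an assumption about the general pattern, not the script's actual locking code; the helper name `with_exclusive_lock` and any lock path passed to it are hypothetical.

```ruby
# Hypothetical helper: run the block only if no other holder has the lock.
# Returns true if the block ran, false if the lock was already taken.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # LOCK_NB makes flock return false immediately instead of waiting.
    return false unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end
  # The lock is released when File.open closes the file at the end of the block.
end

# Example use (the lock path is made up):
#   ran = with_exclusive_lock("/tmp/replication.lock") { replicate_changesets }
#   warn "another run is still in progress" unless ran
```

With this shape, a second run that starts while the first still holds the lock exits without touching the state file, which is the guarantee the comment above is relying on.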