graphite-project / carbonate

Utilities for managing graphite clusters
MIT License
516 stars 80 forks source link

flock #19

Closed earthgecko closed 7 years ago

earthgecko commented 10 years ago

Great effort, whisper can be a pain.

I have always been led to believe that "Graphite will lock the Whisper file using flock() during updates so that no other Graphite process reads it mid-update and finds inconsistent data.

Unfortunately, flock() is only an advisory lock and rsync ignores it. This means that if Graphite updates a particular Whisper file while rsync is in the middle of backing it up, the backup copy might be inconsistent." from the boys SwiftStack - @smerritt

Having looked through the code I do not see any reference to flock in anyway. Is this just trivial because of the context of healing to local whisper file? If so, adding an ssh key option pull request is in order for me I think.

jssjr commented 10 years ago

Unfortunately, flock() is only an advisory lock and rsync ignores it. This means that if Graphite updates a particular Whisper file while rsync is in the middle of backing it up, the backup copy might be inconsistent.

Really interesting observation. And one that I hadn't considered.

If rsync is ignoring flock() on specific files, then it seems reasonable that the file can be transmitted in an inconsistent state. However, since whisper will wrap the update transaction in a lock and the sync job will rsync the source file to /tmp on the target host in a batch, then merge the values into the target database, this risk only affects us if whisper is inside a create or update transaction when rsync reads and sends the file.

I'm honestly not sure how to solve this and continue to use rsync, if rsync isn't willing to respect a shared lock. Do you have any ideas here?

Removing the rsync dependency and having carbon-sync use the carbonlink protocol directly to pull data is an option, but one that will require significant work and tuning.

earthgecko commented 10 years ago

Well if there was a pip libflockit (https://github.com/smerritt/flockit) it could be added to requirements :) Or perhaps something similar with python fcntl.flock()

On either end at the moment we do: flockit.so rsync .wsp file to tmp location -> rsync .wsp to remote tmp location -> flockit.so rsync tmp_remote .wsp to pyschical .wsp file

env LD_PRELOAD=/usr/lib/libflockit.so FLOCKIT_FILE_PREFIX=$WHISPER_PATH rsync $OPTION_THINGS <src> <dest> would do it all.

But seeing as the file on the dest is being replaced and healed in-situ locally there should be no need to use libflockit.so on the dest unless it is a file that carbon is updating via a relay, etc (e.g. exists).

jssjr commented 10 years ago

Are you suggesting we apply a shared lock around the batch of files to sync in https://github.com/graphite-project/whisper/blob/master/whisper.py#L540-L542? I worry about that causing problems by blocking the local carbons, but with a sufficiently small batch size it might be fine.

earthgecko commented 10 years ago

Yeah we flock rsync per file. Figured it was the only way to ensure the we are not creating a Schrödinger's cat type scenario, without validating every timeseries data set somehow.

jssjr commented 10 years ago

I'd be OK with that, but I wonder if it should be default. I'm still worried that since we transfer files in a batch of 1000 at a time to avoid the overhead of setting up the connection for every file we'll cause the carbon processes on the source node to block for an unreasonable amount of time. I'm unsure how to handle this.

jssjr commented 10 years ago

Want to send over a PR with your ideas? We can riff on that and get this solved.

earthgecko commented 10 years ago

Forked, no quick promises though, have libcloud, fog and skyline pull requests that need to be done that are lagging. This social coding malarkey, who thought that was a good idea ;)

filippog commented 9 years ago

hi, thanks for carbonate! I was thinking about locking too, related to this whisper.LOCK seems to be set by carbon when WHISPER_LOCK_WRITES is set in graphite's config (see https://github.com/graphite-project/whisper/issues/97) however carbonate doesn't explicitly lock if I'm reading correctly so this is also possible I think:

jssjr commented 9 years ago

@filippog - That seems accurate. I believe we want to duplicate the WHISPER_LOCK_WRITES directive in the carbonate config file, modify the import so we have access to LOCK and set it accordingly so update_many() calls flock before making changes. Does this sound right? Want to take a stab at a PR or should I?

filippog commented 9 years ago

sounds good @jssjr ! I've split this into #47 so we can followup there, I'll take a stab at a PR shortly!

deniszh commented 7 years ago

Fixed in https://github.com/graphite-project/carbonate/pull/71