rmayr / dvcs-autosync

dvcs-autosync is a project to create an open source replacement for Dropbox/Wuala/Box.net/etc. based on distributed version control systems (DVCS). It offers nearly instantaneous mutual updates when a file is added or changes on one side but with the added benefit of (local, distributed) versioning and does not rely on a centralized service provider.
http://www.mayrhofer.eu.org/dvcs-autosync
Other
98 stars 14 forks source link

dvcs-autosync incorrectly records changes on a file `4913` #5

Open Dieterbe opened 13 years ago

Dieterbe commented 13 years ago

Here's my config (pretty much default) cat .config/autosync/pim [autosync] path = ~/.local/share/pim pidfile = ~/.cache/autosync.pid syncmethod = xmpp pulllock = conservative readfrequency = 5 ignorepath = .git .svn .hg

[dvcs]
statuscmd = git status | grep -iq "nothing to commit"
addcmd = git add %s
rmcmd = git rm %s
modifycmd = git add %s
movecmd = git rm %s 
    git add %s
startupcmd = git add -A
commitcmd = git commit -m "Autocommit"
pushcmd = git push
pullcmd = git pull
remoteurlcmd = git config --get remote.origin.url

[xmpp]
username = s@jabber.org
password = mypass

notice that in the target directory ~/.local/share/pim, there is no file 4913, nor any file containing the string 4913 or whatever: $ find ~/.local/share/pim -name '4913' $ grep -R 4913 ~/.local/share/pim $

Yet, I have it quite often, when I change any file, dvcs-autosync detects a change on the file 4913, tries to commit it, which obviously fails. Here is the output of an autosync session where I 1) start the program (with the config above), remove 12 lines from the file dvcs-autosync in the watched directory.

http://pastie.org/1778450 Interestingly, it receives a coalesce event for the file dvcs-autosync, then for the file 4913 then again for dvcs-autosync

In FileChangeHandler::_filter_and_handle_actions(self, args) the args variable for the relevent call is this: DEBUG:root:['/home/dieter/.local/share/pim/4913', ['/home/dieter/.local/share/pim/4913']]

installed software: pyinotify 0.9.1-1 python2 2.7.1-7 I have this issue with both the develop and master branches, in fact I always had this issue with either version.

Dieterbe commented 13 years ago

Hmm it seems that vim is to blame. vim does this to provide the user desired behavior, so it canot be expected for vim users to disable this feature (if disabling it is even possible). Vim actually tries to create the file, starting with filename 4913, and adding 123 until it finds an available filename.

https://wiki.hpcc.msu.edu/display/Issues/4913+files+created+by+vim+and+home+directory+file+locking+issue http://groups.google.com/group/vim_dev/browse_thread/thread/b29a5da44971638e/6656f4c73f38ceea?pli=1 https://bugzilla.redhat.com/show_bug.cgi?id=427711

I wonder what the right approach here should be:

rmayr commented 13 years ago

(Hopefully) fixed with 82b4cc8cddbb88086a8eff55772b8a1cb5e762d4.

Dieterbe commented 13 years ago

I can confirm your patch goes in the branch and returns from the process_IN_CREATE() function, when event.pathname no longer exists.

however, this does not cause aborting the "git add" command. dvcs-autosync starts the "git add" which fails. You can verify this by setting the loglevel to logging.DEBUG, tailing the logfile, and then typing something like touch foobarbaz4 && rm foobarbaz4 in the monitored directory.

Furthermore, there are also race conditions. If you make the function like this:

    def process_IN_CREATE(self, event):
        # sanity check - don't add file if it (no longer) exists in the file system!
        logging.debug("IN_CREATE : got create for path %s", event.pathname)
        logging.debug("IN_CREATE: CWD %s", os.getcwd())
        if not os.path.exists(event.pathname):
            logging.debug('IN_CREATE: Ignoring file create event on %s, as it (no longer) exists - it was probably created as a temporary file and immediately removed by the application', event.pathname)
            return
        import time
        time.sleep(1)
        if not os.path.exists(event.pathname):
            logging.debug('IN_CREATE: NEEDEDSLEEP Ignoring file create event on %s, as it (no longer) exists - it was probably created as a temporary file and immediately removed by the application', event.pathname)
            return

        self._queue_action(event, cmd_add, [event.pathname])

then you can some 'NEEDEDSLEEP' entries in the log. This means that with the unmodified code, it notices the file exists, queues the cmd_add action, but the file gets removed shortly after.

(To clarify: it happens that the cmd_add is scheduled and a "git add" gets executed after either of the os.path.exists branches executed successfully, i.e. no matter if it was the first (normal) one or the second one that demos the race condition)

Dieterbe commented 13 years ago

Thinking a bit about this further, I would say the actual "check if file still exists" check should not happen at this point, but rather at the point where the actual cmd_add is about to get executed. Because there is a big gap in time between these two which allows race conditions. But even this suggested approach is vulnerable to race conditions, although arguably these will be less likely, the files will probably have settled down after a few seconds, but still no guarantees. Maybe we should make the default cmd_add git add --ignore-missing

Dieterbe commented 12 years ago

hmm --ignore-missing only works in dry run mode. maybe --ignore-errors ?

ngoonee commented 11 years ago

Still an issue, and ignore-errors doesn't help because at the time of the add the file just does not exist. The error is pathspec '4913' did not match any files, which I think cannot be ignored by git

EDIT: actually something surprising is that the name '4913' in .gitignore isn't respected. I already have .swp there and .swp files do not cause a problem... some problem in the ignore?

EDIT2: Putting the gitignore pattern of *4913 works, but this is suboptimal (some files MAY end in 4913, after all). The pattern matching logic should probably be improved slightly, but for now I can use this workaround.