gorakhargosh / watchdog

Python library and shell utilities to monitor filesystem events.
http://packages.python.org/watchdog/
Apache License 2.0

Watchdog not detecting rsync changes #266

Open xraymancouk opened 10 years ago

xraymancouk commented 10 years ago

I am probably implementing something stupidly or incorrectly here, but for me watchdog does not detect changes in a directory that is being updated by rsync & cron.

rsync -rvuog /mnt/data /data/

Where /mnt/data is a cifs mount & /data is on an ext4 file system.

Fedora 20 3.15.6-200.fc20.x86_64 #1 SMP Fri Jul 18 02:36:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Watchdog installed via pip version 0.8.0

If I touch or copy files to the destination path /data, watchdog notifies of create & mod. If I monitor with inotifywait -m it records:

inotifywait -m *
Setting up watches.
Watches established.
20140808-2.CSV OPEN
20140808-2.CSV CLOSE_NOWRITE,CLOSE
20140808-2.CSV ATTRIB
20140808-2.CSV DELETE_SELF
800.raw OPEN
800.raw CLOSE_NOWRITE,CLOSE
800.raw ATTRIB
800.raw DELETE_SELF

And I can see the timestamp & file size change, but nothing via watchdog.

Touch shows:

545.raw OPEN
545.raw ATTRIB
545.raw CLOSE_WRITE,CLOSE
545.raw OPEN
545.raw CLOSE_NOWRITE,CLOSE

I have tried both

from watchdog.observers import Observer

and

from watchdog.observers.polling import PollingObserver as Observer

to work with from watchdog.events import PatternMatchingEventHandler.

Is this related to the issue observed with vim at some fundamental level?

xraymancouk commented 10 years ago

Sorry about the typos.

Also, to clarify: touch or cp into the watched path does work. It is only the rsync changes that are not being detected.

tamland commented 10 years ago

Can you post your code? If you're using PatternMatchingEventHandler, the events will obviously be filtered based on the pattern.

maljac commented 9 years ago

Same problem here:

`rsync -a /src /target`

does not trigger watchdog. I used the PatternMatchingEventHandler from watchdog==0.8.3. cp and mv work, but not rsync.

The code is quite basic, nothing special; the only hook that I implemented is on_created.

WilliamDEdwards commented 5 years ago

Same issue here. watchdog detects all events nicely, except when they're done with rsync.

Relevant code:

#!/usr/bin/python3

from __future__ import print_function
import time
import sys
import os
import subprocess
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class CrtHandler(PatternMatchingEventHandler):
    patterns = ["*.pem"]

    def process(self, event):
        """removed"""

    def on_modified(self, event):
        self.process(event)

    def on_created(self, event):
        self.process(event)

    def on_deleted(self, event):
        self.process(event)

if __name__ == '__main__':
    path = '/etc/haproxy/http-certs'
    hapxconfig = '/etc/haproxy/haproxy.cfg'

    args = sys.argv[1:]
    observer = Observer()
    observer.schedule(CrtHandler(), path=path)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()

    observer.join()

WilliamDEdwards commented 4 years ago

Why has this been closed?

BoboTiG commented 4 years ago

Oops, I batch-closed all issues that were missing details. I did not see your last comment.

BoboTiG commented 4 years ago

When using rsync, do you have any output?

WilliamDEdwards commented 4 years ago

Output where?

JadeMatrix commented 4 years ago

I appear to have run into this issue, too.

I initially noticed this on FreeNAS 11.2 U5, where I'm running a Python script using watchdog 0.9.0. The observer class is watchdog.observers.Observer. When I use scp to copy a specific file to the server, watchdog reports multiple modifications of the file. This being perfectly understandable, I tried switching to rsync, but watchdog simply reports the file being deleted a single time. When trying to rsync the same file, no events are reported; the only fix is to restart the script. scp still works.

To diagnose this, I whipped up a barebones version of my script as a minimal reproducible example (it needs to be passed a filename):

import watchdog.observers
import watchdog.observers.polling
import watchdog.events

import pathlib
import sys
import time

watched_item = pathlib.Path( sys.argv[ 1 ] )

class EventHandler( watchdog.events.FileSystemEventHandler ):

    def __init__( self ):
        watchdog.events.FileSystemEventHandler.__init__( self )

    def on_any_event( self, event ):
        if pathlib.Path( event.src_path ) == watched_item:
            print( repr( event ) )

observer = watchdog.observers.Observer()
# observer = watchdog.observers.polling.PollingObserver()
observer.schedule(
    EventHandler(),
    watched_item.parent.as_posix()
)
observer.start()

try:
    while True:
        time.sleep( 10 )
except KeyboardInterrupt:
    observer.stop()
    observer.join()

I got the following results:

Note these tests were performed with local files. Using a fresh virtualenv, my script still displays its odd behavior (single delete event on rsync then nothing). I suspect this may be due to it being an rsync over the network, or possibly due to permissions, or because it's running in a jail.

EDIT: Forgot to mention, but my specific problem is more-or-less solved by upgrading from 0.9.0 to 0.10.2 and using PollingObserver; I just wanted to provide more information here.

pmvreeswijk commented 4 years ago

I ran into this issue as well using python 3.8 and watchdog 0.10.2 and noticed rsync first creates a temporary file with the name: ".[filename].[some additional extension]" and when finished syncing, this is renamed/moved to [filename]. Watchdog does detect the temporary file, but one needs to be aware of this default behavior of rsync. Note that rsync with the option --inplace creates [filename] straight away.
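
A minimal sketch of how a handler could recognise such temporary names, assuming rsync's default naming (a leading dot, the target filename, then a dot and six random alphanumeric characters). The function name `is_probable_rsync_temp` is hypothetical, and the check is only a heuristic: it does not apply when rsync is run with --inplace or a separate temp directory, and an ordinary hidden file whose last extension happens to be six alphanumeric characters would also match.

```python
import re
from pathlib import Path

# Heuristic, not an rsync or watchdog API: by default rsync writes to
# ".<filename>.<6 random alphanumeric chars>" and renames it when done.
_RSYNC_TEMP_NAME = re.compile(r"^\..+\.[A-Za-z0-9]{6}$")


def is_probable_rsync_temp(path):
    """Return True if the last path component looks like an rsync temp file."""
    return _RSYNC_TEMP_NAME.match(Path(path).name) is not None


if __name__ == "__main__":
    for candidate in ("/data/.800.raw.Xa93kQ", "/data/800.raw"):
        print(candidate, is_probable_rsync_temp(candidate))
```

Events whose src_path matches could then be skipped, with the real file picked up from the later rename.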

mozhemeng commented 4 years ago

I have this problem too. I use Python 3.7.4 and watchdog 0.10.2. Watchdog only detects the temporary file of the rsync process, but ignores the actual file I need to handle. Is there any way to resolve this problem?

deafmute1 commented 3 years ago

Hi, here's how I dealt with rsync. It's not so much a bug as an unfortunate interaction between how watchdog and rsync work.

First of all, you need to discard the events for the temp file, because you almost certainly don't want to process them. I already have a wait_on_file_transfer() function that I call when handling an event, before passing the file on to be processed, to avoid handling a file that is still being transferred. It works fairly simply, by comparing the file size until it is stable. I added functionality here to deal with temp files by a) catching FileNotFoundError whilst waiting for the transfer to finish, as rsync cleaning up these temp files would cause this exception, and b) breaking out of my waiting loop when the file path no longer contains a file (i.e. it has been deleted by rsync because it was a temp file), then informing the caller (the handler) to discard this file instead of processing it. Here is that function:

import logging
import time
from pathlib import Path

# `config` is my own settings module; TRANSFER_TIMEOUT is multiplied by 60, so presumably minutes.

def wait_on_file_transfer(file: Path) -> bool:
    size2 = -1
    logging.debug("IMPORT: Waiting on {} to finish transferring".format(file))
    time_start = time.time()
    while file.is_file():
        if time.time() > time_start + 60 * config.TRANSFER_TIMEOUT:
            logging.info("IMPORT: Timeout reached whilst waiting for {} to transfer".format(file))
            return False
        try:
            # the size is stable once two reads, 2 seconds apart, agree
            size1 = file.stat().st_size
            if size1 == size2:
                break
            time.sleep(2)
            size2 = file.stat().st_size
        except FileNotFoundError:
            # avoid exceptions when calling stat() on a temp file (such as created by wget, rsync etc)
            continue
    if file.is_file():
        logging.debug("IMPORT: Transfer for file {} finished".format(file))
        return True
    else:
        logging.debug("IMPORT: File {} deleted while waiting for transfer".format(file))
        return False # inform caller if file was deleted during transfer (likely to be a temp file)

The second thing to note is that a move event is a little different from every other event, as it directly inherits from FileSystemMovedEvent instead of FileSystemEvent. Due to the way rsync works (unless using --inplace), the relevant file creation event is actually a move event: the create event is on the temp file, which is moved to the actual destination when the file transfer has finished.

A FileSystemMovedEvent is a combination of two inotify events, an IN_MOVED_FROM and an IN_MOVED_TO. Unless you create the observer like this: Observer(generate_full_events=True), watchdog combines these two events into one, with the path of IN_MOVED_FROM becoming event.src_path and the path of IN_MOVED_TO becoming event.dest_path. There are two ways to solve the issue, then: a) disable the combination of move events and handle both, or b) handle move events with this in mind. For me, as I am using watchdog basically to deal with new files in a directory, I only really cared about the path of IN_MOVED_TO, so the simplest method is to use event.dest_path instead of event.src_path for move events.
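
Option b) can be sketched roughly as follows. This is an illustration rather than my exact code: the helper name `path_to_process` is hypothetical, and it only relies on the event_type, is_directory, src_path and dest_path attributes that watchdog events carry. The SimpleNamespace objects stand in for real events so the snippet runs without an observer.

```python
from types import SimpleNamespace


def path_to_process(event):
    """Pick the path a 'new file' handler should look at, or None to skip.

    Hypothetical helper: for move events the finished file is at dest_path
    (rsync renames its temp file into place); for create events it is src_path.
    """
    if event.is_directory:
        return None
    if event.event_type == "moved":
        return event.dest_path
    if event.event_type == "created":
        return event.src_path
    return None


if __name__ == "__main__":
    # Stand-ins for the events seen when rsync finishes a transfer.
    created = SimpleNamespace(event_type="created", is_directory=False,
                              src_path="/data/.800.raw.Xa93kQ")
    moved = SimpleNamespace(event_type="moved", is_directory=False,
                            src_path="/data/.800.raw.Xa93kQ",
                            dest_path="/data/800.raw")
    print(path_to_process(created))
    print(path_to_process(moved))
```

The same helper could be called from both on_created and on_moved. The create event still points at the temp file, so it needs to be discarded separately, for example by checking that the file still exists once the transfer has settled.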

earonesty commented 2 years ago

Note: the polling observer, if you're using it, can't see changes if they happen within the same second.