gorakhargosh / watchdog

Python library and shell utilities to monitor filesystem events.
http://packages.python.org/watchdog/
Apache License 2.0

PollingObserver drops files #813

Open esemwy opened 3 years ago

esemwy commented 3 years ago

I'm using the polling observer because the eventual implementation will need to watch an NFS mount, but I'm already dropping files when I generate synthetic data. I've stripped my code down to the bare minimum, and still up to half the files get ignored.

I'm including my stripped-down example and my test file generator. The generator writes a couple of KB of lorem ipsum text to each file. Suggested workarounds would be welcome.

I'm running watchdog 2.1.3 from PyPI.

filehandler.py

from watchdog.observers.polling import PollingObserverVFS as Observer
from watchdog.events import FileSystemEventHandler as Handler
import logging
import os
import sys
import time
import traceback

logger = logging.getLogger('filewatcher')

class BaseHandler(Observer, Handler):
    def __init__(self, directory):
        self._directory = directory
        super().__init__(stat=os.stat, listdir=os.listdir, polling_interval=1)
        self.schedule(self, directory, recursive=False)

        root = logging.getLogger()
        root.setLevel(logging.DEBUG)

        fmt = '%(name)s[%(process)s]: %(levelname)-5s | %(message)s'
        formatter = logging.Formatter(fmt=fmt)

        handler = logging.StreamHandler(sys.stderr)
        handler.setLevel(logging.DEBUG)
        handler.setFormatter(formatter)

        root.addHandler(handler)

    def start(self):
        for name in os.listdir(self._directory):
            path = os.path.join(self._directory, name)
            logger.debug(f'Handling {path} before start.')
            self.handle_file(path)
        super().start()

    def on_created(self, event):
        if event.is_directory or event.is_synthetic:
            logger.warning('Unexpected file %s', event.src_path)
            return
        logger.debug(f"{event.src_path} created")
        self.handle_file(event.src_path)

class HandleFiles(BaseHandler):
    def handle_file(self, source):
        logger.info('removing file %s', source)
        os.unlink(source)

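def main():
    handler = HandleFiles('/tmp/newfiles')
    handler.start()
    try:
        while True:
            # Keep the main thread alive; the observer thread does the polling.
            time.sleep(10)
    except KeyboardInterrupt:
        logger.info('Interrupted, shutting down.')
    except Exception:
        # Log the traceback line by line so each line gets the log prefix.
        for line in traceback.format_exc().split('\n'):
            logger.critical(line)
    finally:
        handler.stop()
        handler.join()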

if __name__ == "__main__":
    main()

genfiles.py

#!/usr/bin/env python3
from uuid import uuid4 as UUID
from lorem import paragraph
from pathlib import Path
from itertools import islice
import argparse, os

def writable_dir(name):
    p = Path(name)
    if not p.exists():
        raise argparse.ArgumentTypeError("{0} is not a valid path".format(p))
    if not p.is_dir():
        raise argparse.ArgumentTypeError("{0} is not a directory".format(p))
    if not os.access(p, os.R_OK|os.W_OK):
        raise argparse.ArgumentTypeError("{0} permission denied".format(p))
    return str(p)

parser = argparse.ArgumentParser('gendata')
parser.add_argument('directory', type=writable_dir, help="Directory to drop files")
parser.add_argument('--number','-n', default=100, type=int, help="Number of random files")
args = parser.parse_args()

# Avoid shadowing the built-in dir().
target_dir = Path(args.directory)
for _ in range(args.number):
    filename = target_dir / str(UUID())
    with filename.open(mode='w') as outfile:
        for p in islice(paragraph(10),10):
            print(p, end="\n\n", file=outfile)

tonysepia commented 2 years ago

@esemwy Thank you for performing this test! I always check the open issues before using a library, so I tried to reproduce your problem on Windows with the plain PollingObserver. With the file generator set to 10,000 files, the script leaves no files in the folder; they all get deleted. What parameters did you use?

tonysepia commented 2 years ago

Just tried with 125,000 files generated. None left behind!

esemwy commented 2 years ago

It’s been a couple of versions since I opened the issue. I will say, I was running on Linux, not Windows. Otherwise, everything was configured as you see above. Our solution was to rename files into place to avoid any possible race condition.
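
For readers who want to try the rename-into-place workaround, here is a minimal sketch using only the standard library. It is not the code we actually ran: the function name `publish_atomically` and the separate `staging_dir` are hypothetical, and it assumes the staging and watched directories are on the same filesystem so that `os.replace` is a plain rename. Staging outside the watched directory is also an assumption; a rename inside the watched directory may be reported as a move rather than a creation, which the `on_created`-only handler above would not process.

```python
import os
import tempfile
from pathlib import Path


def publish_atomically(data: str, staging_dir: Path, final_dir: Path, name: str) -> Path:
    """Write data to a temp file in staging_dir, then rename it into final_dir.

    Hypothetical helper: the watcher should only ever see complete files,
    because the file appears under its final name in a single rename step.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    final_dir.mkdir(parents=True, exist_ok=True)

    # Create the temp file in the staging directory (assumed to be on the
    # same filesystem as final_dir so the rename does not become a copy).
    fd, tmp_name = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as outfile:
            outfile.write(data)
            outfile.flush()
            os.fsync(outfile.fileno())  # push the contents to disk before renaming
        target = final_dir / name
        os.replace(tmp_name, target)  # rename into place
    except BaseException:
        # Do not leave a half-written temp file behind on failure.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

In the generator above, this would replace the direct `filename.open(mode='w')` write: build the text first, then call something like `publish_atomically(text, Path('/tmp/staging'), Path(args.directory), str(UUID()))`, where `/tmp/staging` is only an example path.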

tonysepia commented 2 years ago

Thank you for clarifying!