tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0

autoreload: Improve performance with many files (watchdog plugin) #2680

Status: Open. RoelWKramer opened this issue 5 years ago.

RoelWKramer commented 5 years ago

Currently, when running Tornado with autoreload, especially on VMs, CPU usage is extremely high. For example:

We run our development environment with Docker. It loads around 400 files, autoreload does a stat on every one of them, and this causes 100% CPU load on one of my cores. I am not using the latest MacBook (mine is from 2015), but my colleagues with more recent models also suffer from high CPU usage. The load grows, and spreads across more cores, when we run several microservices in containers.

The CPU load is caused by the way autoreload is implemented: it simply does a stat on every imported Python file at a fixed interval. There are much better ways to do this, for example mechanisms like inotify, which is available on Linux. Or use something like watchman, which supports many modes and has proven to work with huge numbers of files. Watchman is used quite a lot in the JavaScript world and can cope with file changes in huge projects; it can watch (parts of) node_modules/, for example. Additionally, Django now uses watchman as well.
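For illustration, here is a minimal sketch of the polling approach described above, assuming some timer calls poll_once() at a fixed interval. It is not Tornado's actual code, and both function names are hypothetical; it only shows why the cost grows linearly with project size: every cycle issues one os.stat() per imported file, whether or not anything changed.

```python
import os
import sys
from typing import Dict, List


def collect_module_files() -> List[str]:
    """Return the source path of every imported module that has one."""
    paths = []
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if path:
            paths.append(path)
    return paths


def poll_once(mtimes: Dict[str, float], paths: List[str]) -> List[str]:
    """stat() every path once and return those whose mtime changed.

    `mtimes` is carried between calls; a path seen for the first time is
    recorded but not reported as changed.
    """
    changed = []
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            continue  # deleted or unreadable file: skip it this cycle
        previous = mtimes.setdefault(path, mtime)
        if mtime != previous:
            mtimes[path] = mtime
            changed.append(path)
    return changed
```

With around 400 modules that is around 400 stat calls per cycle, which the report above finds expensive inside VMs and Docker containers. inotify or watchman instead let the OS (or a daemon) push change notifications, so an idle project should cost close to nothing.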

Note: inotify is not cross-platform, whereas stat works on all platforms, I believe. Different platforms offer different, better solutions. Watchman covers them all, but introduces a lot of dependencies.

What would be nice is a way to choose/configure the autoreload mechanism. The "old" stat-based autoreload plugin could then remain the default, developers could implement custom autoreload plugins, and alternative plugins could be shipped with Tornado.
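A rough sketch of what such a plugin interface could look like. AutoreloadPlugin, StatPlugin and their watch(), check() and stop() methods are all hypothetical names, not existing Tornado API; the stat-based plugin mirrors the current polling behaviour so that it could stay the default.

```python
import abc
import os
from typing import Callable, Dict, Iterable


class AutoreloadPlugin(abc.ABC):
    """Hypothetical interface: a backend that reports changed files via a callback."""

    def __init__(self, on_change: Callable[[str], None]) -> None:
        self.on_change = on_change

    @abc.abstractmethod
    def watch(self, paths: Iterable[str]) -> None:
        """Start (or extend) watching the given file paths."""

    @abc.abstractmethod
    def stop(self) -> None:
        """Release any resources (threads, OS watch handles)."""


class StatPlugin(AutoreloadPlugin):
    """Default backend: poll os.stat() on every watched file in check()."""

    def __init__(self, on_change: Callable[[str], None]) -> None:
        super().__init__(on_change)
        self._mtimes: Dict[str, float] = {}

    def watch(self, paths: Iterable[str]) -> None:
        for path in paths:
            try:
                self._mtimes.setdefault(path, os.stat(path).st_mtime)
            except OSError:
                pass  # ignore files that cannot be stat'ed

    def check(self) -> None:
        # Called periodically by the host: one stat() per watched file.
        for path, old in list(self._mtimes.items()):
            try:
                new = os.stat(path).st_mtime
            except OSError:
                continue
            if new != old:
                self._mtimes[path] = new
                self.on_change(path)

    def stop(self) -> None:
        self._mtimes.clear()
```

An inotify-, watchman- or watchdog-based plugin would implement watch() by registering the paths with the OS or service and would call on_change from its event handler, so it would not need a periodic check() at all.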

So the proposed solution is twofold:

What I hope happens afterward in the community:

web.py already has a setting for enabling autoreload (self.settings.get("autoreload")). What could be added is an option that defaults to the stat plugin loader ("autoreload_stat") and, if the setting is present, tries to load the named plugin instead: self.settings.get("autoreload_plugin", "autoreload_stat")
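A minimal sketch of how that setting could be resolved. The get_plugin() helper, the `known` mapping of built-in names, and the "module:attribute" syntax for third-party plugins are assumptions for illustration only. Note that dict.get() takes its fallback as a second positional argument; it has no default= keyword.

```python
import importlib
from typing import Any, Mapping, Optional


def get_plugin(settings: Mapping[str, Any], known: Mapping[str, Any]) -> Optional[Any]:
    """Return the autoreload plugin selected by the application settings, or None.

    Hypothetical: `known` maps built-in names such as "autoreload_stat" to
    plugin objects; any other value is treated as "module" or "module:attr"
    and imported, so third-party packages could ship their own plugins.
    """
    if not settings.get("autoreload", False):
        return None
    # dict.get takes the fallback positionally: there is no default= keyword.
    name = settings.get("autoreload_plugin", "autoreload_stat")
    if name in known:
        return known[name]
    module_name, _, attr = name.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr) if attr else module
```

Because Application stores extra keyword arguments in self.settings, an option like autoreload_plugin="somepackage.reload:SomePlugin" (a hypothetical path) would not need a new constructor parameter.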

Side note: while going through the tickets I also discovered this issue, which I think could be solved with a (third-party) watchman plugin.

bdarnell commented 1 year ago

> Or use something like watchman, which supports many modes and has proven to work with a huge amounts of files. Watchman is used in javascript land quite a lot and can cope with file changes in huge projects. It can watch (parts of) node_modules/ for example. Additionally django is using watchman now as well.

Be careful: watchman is a JavaScript package, but there is no Python watchman package (well, there is one, but it's a placeholder). django-watchman exists, but it does something unrelated to watching source files for changes. The most popular Python package that does this is called watchdog (and it's not very asyncio-friendly).
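To illustrate what "not very asyncio-friendly" means in practice: watchdog delivers events from its own observer thread, so an asyncio (or Tornado) application has to hand each event over to the event loop before touching loop-owned objects. Below is a minimal stdlib-only sketch of that hand-off using loop.call_soon_threadsafe; a plain thread stands in for watchdog's observer (an assumption, so no watchdog import is needed), and the class and function names are hypothetical. In Tornado, IOLoop.add_callback plays the same thread-safe role.

```python
import asyncio
import threading
from typing import List


class ThreadToAsyncioBridge:
    """Forward callbacks fired on a foreign thread into an asyncio.Queue."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop
        self.queue: "asyncio.Queue[str]" = asyncio.Queue()

    def on_event(self, path: str) -> None:
        # Runs on the foreign thread. asyncio objects are not thread-safe,
        # so schedule the put on the event loop instead of calling it here.
        self._loop.call_soon_threadsafe(self.queue.put_nowait, path)


async def collect_changes(paths: List[str]) -> List[str]:
    bridge = ThreadToAsyncioBridge(asyncio.get_running_loop())

    def simulated_watcher() -> None:
        # Stand-in for a file-watcher thread reporting changed files.
        for p in paths:
            bridge.on_event(p)

    worker = threading.Thread(target=simulated_watcher)
    worker.start()
    received = [await bridge.queue.get() for _ in paths]
    worker.join()  # the thread has already delivered everything, so this is brief
    return received
```

Usage: asyncio.run(collect_changes(["a.py", "b.py"])) returns the paths in the order the thread reported them.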

In any case, I've been kind of sitting on this hoping that the Python standard library would adopt some sort of abstraction here, but it hasn't happened. And watchdog is complicated enough that I wouldn't want to try to duplicate it. So it seems that either a new dependency on watchdog or a plugin interface is the way to go. Tornado has always been dependency-light, so I'm loath to introduce a new dependency here, although introducing a new plugin interface doesn't seem great either (there's not much Tornado-specific here, so it feels like new plugin interfaces should be aimed more at asyncio than at Tornado).

Another simple option would be to tweak the callback timer: instead of a synchronous PeriodicCallback, use a fixed sleep between cycles (or something like that), so that falling behind doesn't lead to 100% CPU utilization. But it's not a great solution, since it makes things less responsive if you have a big project on a slow filesystem.
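A minimal asyncio sketch of the fixed-sleep variant. The function name and the check() callback (assumed to return True once a change is detected) are hypothetical, not Tornado API. The key difference from a fixed-rate timer is that the sleep starts only after a scan finishes, so a slow scan can never cause the next one to begin immediately.

```python
import asyncio
from typing import Callable


async def poll_with_fixed_gap(check: Callable[[], bool], interval: float) -> int:
    """Run check() repeatedly, sleeping `interval` seconds after each scan.

    The fraction of time spent scanning is at most
    scan_time / (scan_time + interval), however slow the filesystem is.
    Returns the number of scans performed, including the one that
    detected a change.
    """
    scans = 0
    while True:
        scans += 1
        if check():
            return scans
        await asyncio.sleep(interval)
```

The trade-off is the one noted above: on a big project with a slow filesystem, a change is noticed up to one scan time plus one interval later.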