We're using Listen through ActiveSupport::EventedFileUpdateChecker on a large application, and we're seeing a big discrepancy between running on Linux and macOS.
The main reason I found is that rb-fsevent uses a different process to do the file watching with which it communicates through a pipe.
On the other hand, rb-inotify does syscalls through ffi directly. In our case, recursing over all the subdirectories in Ruby takes a long time, about 1.3 seconds. By comparison, the same call to Listen::Listener#start takes less than 2 milliseconds on macOS.
I've read through the Performance and Tips and Techniques sections of the README, however our issue is that we just have a lot of reloadable code in a lot of directories. This is after reducing the watched paths to a minimum already.
In this PR, I've introduced a new adapter, ProcessLinux, which instead of listening directly, forks to a process with the directories as arguments, and listens to the pipe. I've added a way to use it by adding a backend selecting option prefer_fork.
It kinda works, however it doesn't fit in the current architecture very well. For example, the adapter doesn't go through the transitions properly. Listener#start returns before events can actually be processed.
Also it doesn't handle pausing currently.
More importantly, deletions don't work for some reason. The calling process receives deletion events but they don't turn into a call to the callback.
Also we're still spending time creating all the snapshots in the calling process, even though I'm not sure we need this. Is the snapshot feature only necessary for pausing/restarting?
Anyway I've created a small repro script to show the issue around the directory recursion.
Repro script
```ruby
require "fileutils"
require "benchmark"
require "etc"
# Create 4096 directories (16*16*16)
FileUtils.rm_rf("tmp")
digits = (0..9).to_a + ('a'..'f').to_a
dir = nil
digits.product(digits, digits, digits) do |a,b,c|
dir = "tmp/#{a}/#{b}/#{c}"
FileUtils.mkdir_p(dir)
end
minor_before, major_before = GC.stat.values_at(:minor_gc_count, :major_gc_count)
fork = ARGV.delete("--fork")
GC.disable if ARGV.delete("--disable-gc")
profile = ARGV.delete("--profile")
allocations = ARGV.fetch(0) { 1_000 }.to_i
require "bundler/inline"
gemfile do
gem "listen", path: "."
gem "vernier" if profile
end
before_ts = Process.clock_gettime(Process::CLOCK_MONOTONIC)
if profile
require "vernier"
Vernier.start_profile(out: "#{Etc.uname[:sysname]}#{fork}.json")
end
queue = Queue.new
listener = Listen.to("tmp", { prefer_fork: fork }) do |m, a, r|
queue.close
end
Thread.new do
print "listener.start: "
puts Benchmark.measure { listener.start }
sleep
end
touches = nil
Thread.new do
loop.with_index do |_, index|
touches = index
FileUtils.touch("#{dir}/g")
allocations.times { +"" }
sleep 0.1
end
end
queue.pop
listener.stop
Vernier.stop_profile if profile
after_ts = Process.clock_gettime(Process::CLOCK_MONOTONIC)
puts "File updated #{touches} times before callback"
minor, major = GC.stat.values_at(:minor_gc_count, :major_gc_count)
p(major: major - major_before, minor: minor - minor_before)
puts "elapsed: %10.6f" % [after_ts - before_ts]
```
I've also added some allocations because we're also seeing a lot of GC time while doing the directory recursion.
### Results
```console
$ podman run -ti --rm -v $PWD:/workdir -w /workdir ruby:3.3.3 bash
root@d542bd460ef9:/workdir# ruby test.rb --profile
listener.start: 0.094773 0.143460 0.238233 ( 0.898538)
File updated 21 times before callback
{:major=>9, :minor=>42}
elapsed: 2.238860
root@d542bd460ef9:/workdir# ruby test.rb --profile --fork
listener.start: 0.000231 0.000285 0.000516 ( 0.000746)
File updated 32 times before callback
{:major=>7, :minor=>38}
elapsed: 3.291573
```
Here are profiles: the Linux adapter, the forking Linux adapter, and for an unfair comparison the Darwin adapter (unfair because running on the machine while the others run in a container in a VM).
These can be opened using https://vernier.prof/. Unfortunately I can't link there directly because CORS.
We can see that the inotify implementation is still faster to receive the first event, but the call to listener.start is very slow (900ms).
To test this, for now I've just changed the acceptance test to prefer forking, this shows a few failures:
I think the forking implementation shouldn't be necessary, and we could fix rb_inotify to do all the directory recursion in a C extension that releases the GVL. That should be enough to be able to start a listener in a thread without blocking the rest of the Ruby code. However I don't know how to do that, so this is my other attempt, just using forking to work around the GVL contention issue.
Can you take a look and evaluate whether the adapter selecting option to choose between different available adapters for a given platform would be acceptable?
Can you provide pointers as to how to fix the issue I'm seeing with deleted files? And let me know if having a way to disable snapshots would make sense too, or a way to improve the performance of that feature.
We're using Listen through
ActiveSupport::EventedFileUpdateChecker
on a large application, and we're seeing a big discrepancy between running on Linux and macOS. The main reason I found is thatrb-fsevent
uses a different process to do the file watching with which it communicates through a pipe. On the other hand,rb-inotify
does syscalls throughffi
directly. In our case, recursing over all the subdirectories in Ruby takes a long time, about 1.3 seconds. By comparison, the same call toListen::Listener#start
takes less than 2 milliseconds on macOS.I've read through the Performance and Tips and Techniques sections of the README, however our issue is that we just have a lot of reloadable code in a lot of directories. This is after reducing the watched paths to a minimum already.
In this PR, I've introduced a new adapter,
ProcessLinux
, which instead of listening directly, forks to a process with the directories as arguments, and listens to the pipe. I've added a way to use it by adding a backend selecting optionprefer_fork
.It kinda works, however it doesn't fit in the current architecture very well. For example, the adapter doesn't go through the transitions properly.
Listener#start
returns before events can actually be processed. Also it doesn't handle pausing currently. More importantly, deletions don't work for some reason. The calling process receives deletion events but they don't turn into a call to the callback. Also we're still spending time creating all the snapshots in the calling process, even though I'm not sure we need this. Is the snapshot feature only necessary for pausing/restarting?Anyway I've created a small repro script to show the issue around the directory recursion.
Repro script
```ruby require "fileutils" require "benchmark" require "etc" # Create 4096 directories (16*16*16) FileUtils.rm_rf("tmp") digits = (0..9).to_a + ('a'..'f').to_a dir = nil digits.product(digits, digits, digits) do |a,b,c| dir = "tmp/#{a}/#{b}/#{c}" FileUtils.mkdir_p(dir) end minor_before, major_before = GC.stat.values_at(:minor_gc_count, :major_gc_count) fork = ARGV.delete("--fork") GC.disable if ARGV.delete("--disable-gc") profile = ARGV.delete("--profile") allocations = ARGV.fetch(0) { 1_000 }.to_i require "bundler/inline" gemfile do gem "listen", path: "." gem "vernier" if profile end before_ts = Process.clock_gettime(Process::CLOCK_MONOTONIC) if profile require "vernier" Vernier.start_profile(out: "#{Etc.uname[:sysname]}#{fork}.json") end queue = Queue.new listener = Listen.to("tmp", { prefer_fork: fork }) do |m, a, r| queue.close end Thread.new do print "listener.start: " puts Benchmark.measure { listener.start } sleep end touches = nil Thread.new do loop.with_index do |_, index| touches = index FileUtils.touch("#{dir}/g") allocations.times { +"" } sleep 0.1 end end queue.pop listener.stop Vernier.stop_profile if profile after_ts = Process.clock_gettime(Process::CLOCK_MONOTONIC) puts "File updated #{touches} times before callback" minor, major = GC.stat.values_at(:minor_gc_count, :major_gc_count) p(major: major - major_before, minor: minor - minor_before) puts "elapsed: %10.6f" % [after_ts - before_ts] ``` I've also added some allocations because we're also seeing a lot of GC time while doing the directory recursion. ### Results ```console $ podman run -ti --rm -v $PWD:/workdir -w /workdir ruby:3.3.3 bash root@d542bd460ef9:/workdir# ruby test.rb --profile listener.start: 0.094773 0.143460 0.238233 ( 0.898538) File updated 21 times before callback {:major=>9, :minor=>42} elapsed: 2.238860 root@d542bd460ef9:/workdir# ruby test.rb --profile --fork listener.start: 0.000231 0.000285 0.000516 ( 0.000746) File updated 32 times before callback {:major=>7, :minor=>38} elapsed: 3.291573 ```Here are profiles: the Linux adapter, the forking Linux adapter, and for an unfair comparison the Darwin adapter (unfair because running on the machine while the others run in a container in a VM).
These can be opened using https://vernier.prof/. Unfortunately I can't link there directly because CORS.
We can see that the inotify implementation is still faster to receive the first event, but the call to
listener.start
is very slow (900ms).To test this, for now I've just changed the acceptance test to prefer forking, this shows a few failures:
I think the forking implementation shouldn't be necessary, and we could fix
rb_inotify
to do all the directory recursion in a C extension that releases the GVL. That should be enough to be able to start a listener in a thread without blocking the rest of the Ruby code. However I don't know how to do that, so this is my other attempt, just using forking to work around the GVL contention issue.Can you take a look and evaluate whether the adapter selecting option to choose between different available adapters for a given platform would be acceptable? Can you provide pointers as to how to fix the issue I'm seeing with deleted files? And let me know if having a way to disable snapshots would make sense too, or a way to improve the performance of that feature.
Thanks!