Open e2 opened 10 years ago
Something like that for Listen 3.x, should be fine right?
listener = Listen.to('dir/to/listen', 'dir/to/listen2') do |changed|
puts "changed absolute path: #{changed}"
end
listener.start # not blocking
sleep
TL;DR; - scroll to the summary to see the Guardfile example using a new Listen API
There are ONLY 2 things people really want to track:
This means distinguishing between moving/deleting/renaming/adding makes no sense for listen
.
E.g. if anyone wants to be notified when files are added or removed or renamed for use cases such as:
... then what they REALLY need to do instead is track the parent/containing directory for modifications.
So if someone wants to know if dir1 contains new files, they need to watch dir1 for modifications and compare with ... well .... some previous state.
Since listen
can't do this in a reliable and predictable way to make everything work and make everyone happy, listen
shouldn't even TRY to do it at all.
... then here's a workaround to prove a point:
listen
With this workaround, there's NO reason for listen
to support detecting additions/removals, since listen
is not an incremental snapshot/backup tool.
Since we can now safely cut ourselves from even considering bring adding/removing/deleting into any listen-related discussion, let's move on to ...
Here's what a callback currently gets:
{
modified: ['dir1/foo'], # from `echo 'foo' > dir1/foo`
added: ['dir1/dir2/bar'], # from `touch dir1/dir2/bar`
removed: ['dir1/dir2/dir3/baz'] # from `rm dir1/dir2/dir3/baz`
}
This is way too problematic to describe and properly support.
Reducing the event types to "modified"
While the events are too complex, the watcher configuration isn't flexible enough.
INotify support multiple watchers per listener and there's currently no API to allow configuring different callbacks and configuration options per directory.
A new API could look like this:
listen = Listen.new(ignore: '.swp', recursive: true) #global options if any
watcher1 = listen.to('dir1') { |dir, object| ...}
watcher2 = listen.to('dir1/dir2', recursive: false, ignore!: 'dir3', wait_for_changes: 0.5) { |dir, object| ...}
watcher3 = listen.to('dir1/dir2/dir3/baz', file: true, ignore: '*.swp', wait_for_changes: 0.1) { |dir, object| ...}
In general this means:
A watcher is a adapter "instance" or "configuration" - depending on whatever makes sense for an adapter.
E.g. there will obviously only be one allowed TCP instance listening on a host/port, but it can "simulate" instances by keeping a hash of "instance configs" and their callbacks.
Notes/differences:
The first argument of the callback should the directory the change is related to and how that's matched depends on the adapter, e.g. TCP could match user@host:post prefix (mentioned below), while Linux adapter would need to match a INotify watcher's config with the associated callback).
The third option (with file: true
) is a special case to actually allow files to be watched, which can allow workarounds for capturing events not detected by watching directories.
Overall, per-watcher-configuration allows different listening options for different plugins/directories, e.g. longer polling times + ignore masks for "editing" directories vs maximum responsivity for virtual machine changed files and TCP listening, etc.
This also allows fine tuning Guard/listen based on the amount of resources, the performance improved (concurrency), and polling parameters based on how frequently files are modified in a given directory.
So, assuming the above 3 watches with callbacks, we'd get:
# callback 1
[
[ 'dir1', '' ], # from `echo 'foo' > dir1/foo`
[ 'dir1', 'foo'], # from `echo 'foo' > dir1/foo`
[ 'dir1', 'dir2'], # because of `touch dir1/dir2/bar`
[ 'dir1', 'dir2/dir3'], # because of `rm dir1/dir2/dir3/baz`
[ 'dir1', 'dir2/dir3/bar'], # because of `rm dir1/dir2/dir3/baz`
]
# callback 2
[
[ 'dir1/dir2', ''], # from `touch dir1/dir2/bar`
[ 'dir1/dir2', 'bar'], # from `touch dir1/dir2/bar`
]
# callback 3
[
# from `rm dir1/dir2/dir3/baz`
['dir1/dir2/dir3/baz', nil] # nil here means the key is a file path,
# not that the file was deleted
]
(With TCP, the adapter would prefix the keys with user@host:port, so multiple TCP broadcasters could send to a single receiver, and callbacks could be correctly handled per node).
Or, with comments:
[
# results of watching dir1 RECURSIVELY
[ 'dir1', '' ], # something about dir1 changed (attributes)
# '' - because File.join(dir1,'') => 'dir1/', while nil would give an error
[ 'dir1', 'foo' ], # Something about foo in dir1 changed
# (attributes as a result of content change)
[ 'dir1' => 'dir2' ], # Something about dir2 changed (attributes)
[ 'dir1' => 'dir2/dir3' ], # Something about dir3 changed (attributes because bar was removed)
[ 'dir1' => 'dir2/dir3/bar' ], # Something about dir3 changed, specifically - attributes for bar
]
[
# results of watching dir1/dir2 NON-RECURSIVELY
[ 'dir1/dir2', '' ], # something about dir1/dir2 changed (attributes)
[ 'dir1/dir2', 'bar' ], # Something about dir1/dir2/bar changed
# (attributes as a result of adding the file)
# Notice how value is relative to path in key
# Note: changes to (...)/dir3 and (...)/dir3/bar are ignored
]
[
# results of watching a specific file's content
[ 'dir1/dir2/dir3/bar', nil ], # Assuming dir1 is watched recursively
# note: IMHO it's ok to cause a File.join(key, value) error here
]
Since there's no conceptual difference between silencing events (listen level) and regexp matching (guard level), and since the above new API would allows options per watcher configuration...
... we could throw away pattern handling from Guard and put it in the listener, where it would make more sense, since less notifications would be needed and more fine-tuning would be possible.
This would also make things easier to debug, because removing the pattern would show all the file notifications a given guard
watch
block receives (more on that later).
With the above callback invocation, the TCP listener could also track different changes with different (local) options per directory, because it could group notifications by watched directory - effectively the TCP listener could distinguish on which node a file changed and trigger a different callback.
Here's a non-existing new API proposal for guard
to reflect the changes in the listen
API
directory('db', recursive: false) do
# for long running rake db:migrate
watch('schema') do |dir, object|
notify "migration finished!"
end
end
directory('.', ignore: %w(git db tmp log), recursive: true) do
# quick workspace backup
watch do |dir, object|
sh "rsync -arR #{File.join(dir, object)} ../copy-of-project.$(date +%s)"
end
end
directory('user1@example.org:db', wait_for_delay: 3600, on: localhost:4321) do
# Get latest list of 'must support' cases (entered by client on website)
watch('*.db-journal') do
sh 'wget -O spec/fixtures/valid_cfg.csv http://vm1/currently_used.csv'
end
end
directory('ci@server1', wait_for_delay: 30, on: localhost:4321) do
# if a team member changes the schema, we want to know ASAP
watch('db/schema') do |dir, object|
sh "scp #{File.join(dir, object)} new_schema.rb"
unless sh 'cmp new_schema.rb db/schema.rb'
notify "schema changed on master!"
end
end
# notify when ci server finishes reporting coverage stats
watch('coverage/index.html') do |dir, object|
notify "build completed and coverage data available on server1"
end
end
and on the nodes, respectively:
# in Guardfile on example.org
directory('db', latency: 10, recursive: falseignore: 'migrate', forward_to: 'localhost:4000')
# in Guardfile on server1, where ci home is /var/lib/ci
directory("/var/lib/ci/checkouts/current", latency: 10, forward_to: 'localhost:4000') do
watch('db/schema')
watch('coverage/index.html')
end
listen
can both ignoring files in 'db' for changes for some tasks (backup), while tracking it for other tasks (db/schema).directory
taking listener parameters, watch
allows for more flexible processing of parameters without ambiguity. (directory
= listener options, watch
= specific filtering passed to listen
, rules and maybe plugin options scoped to the given block.directory
can be reused with different patterns.listen
itself before choosing the right callback.The above examples show how guard could pass options to Listen
, and what the new callbacks could be like.
Here's another example.
Here too, the actual guard DSL is irrelevant - the point is separating individual adapter configurations and scoping.
directory('app', ignore: 'schema.rb') do
directory('views') { watch(%w(*.erb *.haml *.slim), :rspec, :cucumber, :livereload) }
directory(%w(models controllers helpers config lib) { watch('*.rb', :rspec, :rubocop, :livereload) }
watch('config/locales/*.yml', :cucumber, :livereload)
watch('db/**/*.rb', :migrate)
watch('spec/**/*.rb', :rspec)
directory('features') do
watch('*.feature', :cucumber)
directory(%w(support step_definitions)) { watch('*.rb', :cucumber, :rubocop) }
end
directory(%w(assets public)) do
watch('*.html', :livereload)
watch(%w(*.css *.js), :livereload, :jammit)
end
watch('Gemfile', :bunder)
end
directory('vendor/assets') { watch(%w(*.css *.js), :livereload) }
This would in total setup only 6 INotify watchers, 11 callbacks (11 filters) and without monitoring and notifying changes in directories like lib, tmp, db, log, etc. - or specifically ignoring anything in them.
Further more, since watch now specifies a pattern and event type isn't necessary, it's free for allowing parameters for e.g. local dependencies and options for plugins, etc.
So that's my proposal on how to change the API in a way helps extend and adapt to new ideas in the future without major modifications in the new API.
Sorry - I started editing earlier and took a longer break, so I didn't refresh and see your comment.
listener = Listen.to('dir/to/listen', 'dir/to/listen2') do |changed|
puts "changed absolute path: #{changed}"
end
listener.start # not blocking
sleep
The key differences with what I wrote above are:
1) The need to separate the listened directory from the pattern for the callback:
listener = Listen.to('dir/to/listen', 'dir/to/listen2') do |dir, object|
(so we're not locked to only using one listener + one adapter + one configuration + one callback per client or having to manage listeners on the client side)
2) And per-watcher configuration, so it's possible to do smarter things with many instances of adapters without breaking the API with every change.
My writing probably sucks, so feel free to skim through the examples and raise issues with them.
I should probably write less and put in more examples.
Hey @e2 thanks for your long comment, that looks pretty awesome for me, if @rymai agree too, we can go forward with that and start a PRs.
PS: If you want I can give you direct pull/push access to guard & listen repo so you can directly code from there.
PS2: your writing is pretty good for me, continue like that! :ok_hand:
Update + status:
I'm trying to safely "evolve" guard without breaking everything (which is tricky).
I've updated the above examples (my hash examples were dumb).
I decided on arrays because ... because ... because Arrays!
(from proposal above):
# callback 1
[
# [ #<Pathname:dir1>, '.' ], # from `echo 'foo' > dir1/foo`
# [ #<Pathname:dir1>, 'foo'], # from `echo 'foo' > dir1/foo`
# [ #<Pathname:dir1>, 'dir2'], # because of `touch dir1/dir2/bar`
# [ #<Pathname:dir1>, 'dir2/dir3'], # because of `rm dir1/dir2/dir3/baz`
# [ #<Pathname:dir1>, 'dir2/dir3/bar'], # because of `rm dir1/dir2/dir3/baz`
]
This is so people can simply do:
listen = Listen(options) # object to manage instances through Celluloid registry
adapter_instance = listen.to('app/views', ignore: '*.swp') do |changes|
changes.each do |watched_dir, rel_path|
puts watched_dir + rel_path # or watched_dir.join(rel_path), instead of ::File.join(watched_dir, rel_path)
end
end
This should gives the conveniences of Pathname methods for testing if the directory exists, if the path exists, if there's a symlink, use relative paths for nicer outputs, etc.
Ideally, I'd want to make Listen abstract enough to allow any changes in the form of:
[ [ signal1, object1 ], [signal2, object2], ... ]
where signal could be a:
:cucumber_all_ok
)And object would be:
I've run into a few race conditions during the tests. To avoid them with the new API (and to make good use of the new functionality), here's how the above example could be used by Guard:
# listen starts in paused mode - meaning it's listening for adapter events, but
# there are no initialized adapters yet
# (Re)Initialize watchers + start collecting notifications, setup internal watchers, etc.
# Filters are run on collected/queued events (e.g. ignore, broadcast, smoosh) and
# the callback is called with a batch of optimized/reduced files
cucumber_adapter_instance.start
# The given Guard callback is called here, which Guard knows is associated with
# cucumber plugin
# Before running non-background commands, Guard switches back to silent event
# "collecting/queuing" mode, e.g. while cucumber is running
cucumber_adapter_instance.pause
# Pause notifications, so they are collected and shown in bulk only once the
# whole triggered chain completes or fails
notifications_adapter_instance.pause
# Guard runs cucumber, cucumber notifications are collected by notifications
# adapter, user edits files
# Manually called by a `guard-auto-dependency` plugin (works like `make deps`),
# which detected a layout file changed using a different non-paused
# adapter_instance
cucumber_adapter_instance.trigger('reports/browsing.feature')
# Guard cucumber finishes here
# Show optimized/reduced/summary notifications first, so user isn't spammed
notifications_adapter_instance.resume
# Resume here triggers collected results, including 'reports/browsing.features'
cucumber_adapter_instance.resume
# If no callbacks were called for a certain period of time, the computer could
# do some useful stuff, e.g. rubocop, coverage, static code analysis on files
# changed since last run, etc.
idle_adapter_instance.resume
# pauses all adapters (probably doesn't make sense, since non-interactive
# "background" tasks could do useful stuff in idle mode)
listen.pause
Because of how everything is coupled together in Listen, my "plan of step-by-step changes" is as follows:
listen
with all plugins and adapters - or picking listen-core
with whatever subset they want) - this would close issues with package management and dependencieslisten_guard2x_compat_filter
gem with a listen filter - just like a listen_tcp_broadcast_filter
gem)Any feedback, questions, criticism, suggestions on where I could be going - all much appreciated.
[Maybe I should put this on a separate wiki page?]
Looks really good to me. I'm just not sure to understand the functionality extraction to a separate gems of 'rspec' you mention on 7. 'rspec' is something you would to extract?
I meant gems like rspec does it: rspec, rspec-core, rspec-mockes, etc.
Here's a list of ideas for gems/extensions:
The idea is that listen itself would have to be really, really minimalistic, since patches would be few and non-trivial (concurrency, Celluloid, thread-safe code, etc.).
Ok now I get the "like rspec", yeah I like that approach. Seems like a really great list!
What do you have in mind for listen-filters-livereload?
listen-filters-livereload would just allow listen to effectively work like guard-livereload ... except without the guard part.
It would be a "quick and dirty" way to temporarily watch/debug parts of a project without Gemfile+Guard+plugins+guard init + tweaking. Like guard-livereload without Guard and EventMachine.
It could be useful for adding livereload to non-ruby/non-rails projects without changing anything in the project (e.g. static sites). Though personally I'd prefer to use guard set up in every project I use.
I'm thinking that - if actually created - it would demo how to create more "server-like" adapters/filters without having to worry out your own event handling, etc. (listen-filter-tcp would be so similar).
I wouldn't be surprised if some non-interactive guard plugins (like guard-livereload) were ultimately moved to listen, where they could be more easily configured from guard. That means moving some "managing" functionality from guard ... and making room in guard for more high-level workflow related features.
But currently I have to get polling working perfectly with the new API...
Ok sounds great!
TL;DR - I'll rename classes completely (!) and prepare a PR for master - without breaking anything.
Crap ... I kept hitting dead ends trying to reliably refactor the existing code - without ending up with huge rewrites.
And the way Celluloid responds to errors doesn't help rework things.
The challenge: changing things step by step to make room for the new API, while keeping backward compatibility and keeping the version working, and while making the final changes easier to understand for potential contributors.
In short: I'd prefer to be quickly making "master branch worthy" changes and not breaking things for as long as possible.
There are lots of "hidden" aspects, like there's no need for a Record
instance on Linux at all (and keeping a database of log files, swap files, etc.) - which means the Record
is adapter specific. TCP listening doesn't need a Record
class either.
Change
is also therefore adapter-specific, and so are both Directory
and File
.
The second thing is with the current classes: Listener
, Adapter
, Silencer
, TCP Listener/Broadcaster
.
While they separate the ideas pretty well, the confusion is in:
So I think there's a way to "reorganize" things without actually breaking anything or changing the overall API:
Listener
to MessageQueue
or ChangeQueue
(because that's what it really is - and the only classes which need access to the message queue are those that actually generate filtered events - i.e. adapters)Record
, Change
, File
and Directory
into Adapter namespace (none of which are needed by TCP and Linux adapters - they should be allowed send change events directly to listener)Adapter
to Directory
(!!!) [Possibly renaming current Directory
to Fileset
or FilteredDirectory
]Listener
having @directories
and global configuration (@options
), it would have just @directories
, of which each is an Adapter
(from now on Directory
) with it's own supported (or not) configuration (e.g. latency, recursion).Listener
(from now on MessageQueue
) would delegate everything to adapters ("directories"), meaning options + ignores, except for queue processing (start/stop/pause/unpause)With the above, backward compatibility wouldn't be an issue, while the "new" API would simply be creating a Listener
(MessageQueue
) with no arguments (paths to directories) and adding directories (Directory
instances - currently called "adapters") with their respective options (ignores, latency, etc.) - allowing a separate callback per directory if the client (e.g. Guard) wants to (e.g. Guard could tie a plugin to a directory - instead of iterating over every plugin in scope).
And I think this is all doable without too many changes in the codebase or even tests. Then, with every adapter instance tied to a directory, an ugly but simple hack is all that's necessary to support the new API - possibly also closing symlink and recursion issues.
(The hack/workaround can't be done now, because events aren't separated by directory - they're mashed up together and OS-dependent).
What do you think? Would this make things less confusing for contributors or more?
I know I'm taking my 'nth stab at this, but I think I'm on the right track now.
I think that looks good. Don't hesitate to change everything for Listen 3.0 if that makes sense to you. You're doing massive change so that's normal to have massive renaming :)
I can't wait to see the new structure taking live! Thanks for your work!
Perhaps a silly request but seems that the issue matches the topic: is it possible for the new API to pass mtime and other stats?
I believe Listen
already knows these from watching the files and there are common use cases where mtime is used (like generating unique hash) and look like asking the filesystem for the stats could be skiped if they were provided by Listen
@antitoxic - if you're not sending events over a network, you can pretty much get any file info in your callback - ether by calling Ruby API directly, or use the Listen::Record for getting information from Listen's internal structures.
Over a network, the TCP::Broadcaster is really only a convenience - you should setup your own server, decide on a protocol you want, and write your own client or Listen adapter (there's no example for this, though).
Also, getting stuff from the filesystem is usually cheap (except for virtual machines and network file systems) - and in those cases, RSync is usually probably faster/cheaper anyway.
I'm looking exactly at the virtual machine scenario.
And rsync + ssh is terribly slow method to sync or at least how vagrant do it is terribly slow: https://github.com/mitchellh/vagrant/blob/master/plugins/synced_folders/rsync/helper.rb#L106
So I'm thinking Listen + notifying the vm that changes are made + clients on the vm that listen to events.
I'll have a look at Listen::Record. Yesterday I was looking at Listen::File and I saw something like path.lstats...
which I haven't thoroughly investigated. It might be what I need for mtime and other stats.
This question wasn't much about the TCP but just an observation on how Listen works and how other watchers work - they provide changed path and it's stats.
@antitoxic - yeah, rsync+ssh on Windows is probably extremely slow - you may want to try mounting a guest folder as a network folder on the host instead (and run the listen server on the VM).
Note that the Listen::Record is mostly for being able to detect which files changes actually occurred (mostly for polling and on OSX) - it's not an optimization or for consistency. In fact, it's best to check the mtime explicitly, because it might have changed by the time the client gets the message.
So having just the filename in the API is pretty much just a "hint", nothing more - mtime is even less reliable (especially on FAT and HFS).
Having spent some time with the
listen
internals, issues and thinking about solutions, rethinking the API constantly comes up.Since I don't know much about listen use cases outside the typical
guard-cucumber
andguard-rspec
usage throughguard
, I'd appreciate and feedback, caveats, issues and requests that could be affected by such API changes.I've had some epiphanies about how to think about tracking files and directories to avoid pitfalls. It's long, it's boring, but it's here: https://gist.github.com/e2/10948802