tkellogg / dura

You shouldn't ever lose your work if you're using Git

More efficient update checking #5

Open periish opened 2 years ago

periish commented 2 years ago

Linux and macOS both support event-based APIs for file watching. Perhaps this would be more efficient than manually checking for updates? https://crates.io/crates/inotify and https://crates.io/crates/kqueue are existing bindings.

bjorn3 commented 2 years ago

notify is a platform agnostic file watcher crate.

tkellogg commented 2 years ago

Oh wow the notify crate is perfect. Any idea how many files it can efficiently watch? I could see it getting up into the tens of thousands pretty easily.

bjorn3 commented 2 years ago

The default inotify limit (the number of watched directories, I think) on Linux seems to be 8192. This limit is shared among all programs (including your editor). You can increase it by writing to /proc/sys/fs/inotify/max_user_watches as root, but each watched directory uses 1080 bytes of kernel memory, so there is a hard limit on how many directories you can watch, depending on how much RAM you have.
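To put rough numbers on that trade-off, here is a minimal sketch. It takes the 1080-bytes-per-watch figure quoted above at face value (the real cost may differ by kernel and architecture), and the function names are made up for illustration.

```shell
# Rough kernel-memory cost of N inotify watches, using the ~1080 bytes
# per watch quoted above (an approximation, not a measured value).
watch_memory_bytes() {
    echo $(( $1 * 1080 ))
}

# Current per-user watch limit on Linux; prints nothing on other systems.
current_watch_limit() {
    cat /proc/sys/fs/inotify/max_user_watches 2>/dev/null
}

watch_memory_bytes 8192   # prints 8847360 (about 8.4 MiB)
```

So the quoted default limit is cheap in memory terms. The practical concern is more that the limit is shared with every other program that uses inotify.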

An alternative would be to implement a language server (using LSP) that only captures which files are changed according to the editor. This would also allow saving each individual keystroke even when the user doesn't explicitly save. In addition, the editor extension could then be responsible for ensuring the daemon is running.

jauntywunderkind commented 2 years ago

Remarkably overkill solution, but for jauntywunderkind/git-auto-commit, I use facebook/watchman. It's extremely well tuned, and lets me add filters.

You can either just exec the watchman command line (which runs either standalone or by spawning a server, ideal if there are a lot of different watchers!), or you can talk to a server via its socket interface (a socket with JSON or BSER encoding). There's also watchman_client to facilitate using that socket interface.

tkellogg commented 2 years ago

I'd rather not add external dependencies, if possible. I don't mind crates.

Another idea — the "tens of thousands" is naive. You could probably narrow it down to ~100 files with 95% confidence. Some ideas for heuristics:

neinseg commented 2 years ago

Another heuristic would be to inotify-watch files that are currently open (as determined through /proc/$pid/fd), as well as directories that are the working directory of a currently running process (as given by /proc/$pid/cwd IIRC). That, plus a regular (every few minutes) scan. That full scan could be done slowly in the background instead of in batches to avoid causing load spikes.
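A minimal sketch of the "currently open files" half of this heuristic, assuming Linux with /proc mounted. The function name open_files_under is hypothetical, and the sketch only sees processes whose /proc/$pid/fd entries the current user is allowed to read.

```shell
# List regular files under directory $1 that some readable process
# holds open right now (Linux only; walks /proc/<pid>/fd symlinks).
open_files_under() {
    local root link target
    root=$(cd "$1" && pwd -P) || return 1
    for link in /proc/[0-9]*/fd/*; do
        # Processes may exit mid-scan, so a failed readlink is skipped.
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            "$root"/*) [ -f "$target" ] && printf '%s\n' "$target" ;;
        esac
    done | sort -u
}
```

Calling open_files_under on a repo path prints one file per line. The result is a snapshot, so it would need to be combined with the periodic full scan described above to catch files that are opened and closed between snapshots.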

tkellogg commented 2 years ago

@neinseg I like that, but how long do files stay open? Does Vim or VSCode actually hold the file open? Seems like "opened files" is too ephemeral to work well, but I don't know. If you could watch all file descriptors under /proc/*/fd, then this would be an amazing solution. That or process an event log.

alin23 commented 2 years ago

I'm using a shell implementation of this feature using fswatch (cross-platform file monitor) and dura capture.

Note: this replaces the need for dura serve & as fswatch will be the daemon instead

Fish shell implementation

set repos (cat ~/.config/dura/config.json | jq -rc '.repos | keys | join("§")' 2>/dev/null)
set pollingSeconds 10

fswatch -e .git -0 -l $pollingSeconds -r (string split '§' -- $repos) | while read -l -z path
    cd $path 2>/dev/null || cd (dirname $path) && cd (git rev-parse --show-toplevel) && dura capture
end

Bash/Zsh shell implementation

repos=$(cat ~/.config/dura/config.json | jq -rc $'.repos | keys | map("\'\(.)\'") | join(" ")' 2>/dev/null)
pollingSeconds=10

eval "fswatch -e .git -0 -l $pollingSeconds -r $repos" | while IFS= read -r -d '' changed
do
    # "changed" rather than "path": in zsh, $path is tied to $PATH
    cd "$changed" 2>/dev/null || cd "$(dirname "$changed")" && cd "$(git rev-parse --show-toplevel)" && dura capture
done

How it works

  1. Get the repos list from the dura config.json file (this means you can still dura watch repos as usual)
  2. Join the list of repo paths using a rarely used character § (Fish version; the Bash/Zsh version single-quotes each path and joins them with spaces)
  3. Watch for changes in all repos: fswatch -r
    1. -e .git: excluding changes to the .git folder
    2. -0: outputs changed paths delimited by the NUL character (or \0)
    3. -l $pollingSeconds: just like a debounce function, calls dura capture $pollingSeconds seconds after the last event occurred on a file, to avoid too many commits when doing lots of consecutive changes
  4. cd into the changed repo and call dura capture
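
Step 4 can also be written without changing the shell's working directory. This is only a sketch: repo_root_for is a hypothetical helper name, and it assumes the parent directory of the changed path still exists (it prints nothing if that directory was deleted too).

```shell
# Print the top-level directory of the Git repo containing $1.
# $1 may be a directory, a file, or a file that was just deleted.
repo_root_for() {
    local changed dir
    changed=$1
    if [ -d "$changed" ]; then
        dir=$changed
    else
        dir=$(dirname "$changed")
    fi
    git -C "$dir" rev-parse --show-toplevel 2>/dev/null
}
```

Inside the watch loop this could be used as `root=$(repo_root_for "$changed") && (cd "$root" && dura capture)`, where the subshell keeps each capture from affecting the next iteration.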

tkellogg commented 2 years ago

@alin23 can you send a PR to update the README? this is amazing and i don't want to lose it in the issues

tkellogg commented 2 years ago

thinking about this... @alin23 maybe we should start adding script files into the core repo for stuff like this.

alin23 commented 2 years ago

Yes, script files would be better. That way you could have a command like dura install --fish to copy the scripts and make them run at startup or something like that

tkellogg commented 2 years ago

I love it! Let's do it

alin23 commented 2 years ago

36