wincent / command-t

⌨️ Fast file navigation for Neovim and Vim

Add timeout during long filesystem traversal #420

Closed: sixcircuit closed this issue 9 months ago

sixcircuit commented 10 months ago

Thanks for the amazing plugin. I've used it for years. I'm on the new Lua version. It's great.

Occasionally I mess up and run :CommandT on a massive file tree. Neovim locks up and I can't cancel the traversal; I have to kill nvim. Is there a way to set up a timeout, or a way to break the process, like CTRL-C?

I can implement it if you can point me in the right direction.

Thanks again.

Kendrick

wincent commented 10 months ago

Before we get too deep into the details, what scanner are you using?

Off the top of my head, I don't know of a good way to do this. Could definitely add the equivalent of the old g:CommandTMaxFiles setting, but depending on the scanner, that may or may not be all that useful. For the built-in scanner, we can obviously just bail out immediately upon hitting the limit, but for any scanner that forks out to an external process, the buffering behaviour of that process may influence our ability to quickly cut off a "run-away" scan.

In an ideal world, I'd love to be able to catch CTRL-C (interrupt) but I gather Neovim is grabbing that before our code gets a chance to see it. But it might be possible.

Another option still, albeit a messy one, would be to do the scanning in a separate thread, to give us more levers for timing out, interrupting etc.

sixcircuit commented 10 months ago

Yeah, it might be user error on my part. I didn't know there was more than one scanner. I'm using whatever the default is. I know the Lua stuff is a work in progress. I didn't see it in the Lua docs. Are the Ruby docs accurate for the scanner stuff? If so, I can update the Lua docs.

If we did want to improve it, a thread sounds like maybe the best way to do it from a structural standpoint. It would give us a unified way to manage all the scanning processes. We could also have some sort of reporting mechanism for how many files have been scanned, so we could do a little progress update thing in the buffer. But maybe all this is overkill if fixing the scanner is all I need.

Thanks again for the help.

wincent commented 10 months ago

it might be user error on my part

Nah, I don't think so. Even if you cd to the root directory of your file system (/), it's reasonable to expect that the scanner shouldn't lock you out of using your editor until it's scanned the whole filesystem.

I didn't know there was more than one scanner. I'm using whatever the default is.

Yeah, so the default is C code that uses the standard library's fts(3) to traverse. The other scanners generally fork another process (like git or rg etc) and write their output into a slab of memory. So the default scanner is definitely easier for us to control the behavior of, because we are in charge of every aspect of the traversal.
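
To make that concrete, here is a minimal sketch of an fts(3) walk (not Command-T's actual scanner, just the standard-library pattern it builds on; FTS_LOGICAL follows symlinks the way find -L does):

```c
#include <fts.h>
#include <stdio.h>

// Print every regular file under `root`, following symlinks.
static int walk(const char *root) {
    char *const paths[] = {(char *)root, NULL};
    FTS *fts = fts_open(paths, FTS_LOGICAL | FTS_NOCHDIR, NULL);
    if (fts == NULL) {
        return -1;
    }
    FTSENT *entry;
    while ((entry = fts_read(fts)) != NULL) {
        if (entry->fts_info == FTS_F) { // a regular file
            puts(entry->fts_path);
        }
    }
    return fts_close(fts);
}

int main(int argc, char **argv) {
    return walk(argc > 1 ? argv[1] : ".") == 0 ? 0 : 1;
}
```

Because the traversal loop is ours, imposing a file limit or a timeout is just a matter of breaking out of that while loop.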

Are the Ruby docs accurate for the scanner stuff?

Not really. The Lua rewrite is from scratch, and I decided not to port everything over exactly as it was, but rather to take the opportunity to reevaluate every setting and figure out whether it is worth carrying forward or whether it should be changed. So, there are fewer knobs to tweak, and some missing features, at least for now.

a thread sounds like maybe the best way to do it from a structural standpoint

Yeah, that sounds suitably high level, I guess. Like I said, for the built-in default, we can do literally anything we want. But for the others, sticking them in a thread and doing a pthread_cancel() on them if they run for too long might be a good idea. I agree with you that fixing the "locking up the editor" problem is more urgent than "creating a nice abstraction for reporting progress", but the latter would be nice to have eventually.
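
For the sake of discussion, a rough sketch of the thread-plus-cancellation idea (entirely hypothetical, with a crude sleep() standing in for a real timeout mechanism):

```c
#include <pthread.h>
#include <unistd.h>

// Hypothetical sketch, not the plugin's code: run the scan on a worker
// thread and cancel it if it outlives a deadline. A real version would
// need the scan to hit cancellation points (fts_read() does, via the
// syscalls it makes) and to release its allocations via
// pthread_cleanup_push()/pthread_cleanup_pop().
static void *scan_thread(void *arg) {
    (void)arg;
    // ... long-running traversal goes here ...
    return NULL;
}

int main(void) {
    pthread_t scanner;
    if (pthread_create(&scanner, NULL, scan_thread, NULL) != 0) {
        return 1;
    }
    sleep(5);                // crude stand-in for a configurable timeout
    pthread_cancel(scanner); // ask a run-away scan to stop
    pthread_join(scanner, NULL);
    return 0;
}
```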

sixcircuit commented 10 months ago

tl;dr: is there an easy way to debug the internals of what the "find" finder is doing? I have a bug that I want to investigate.

I started experimenting with different finders. I think the "find" finder is the best for me. I wanted to configure some exclusions (but not exactly .gitignore) and find was the way to do it.

There's also a way to make "find" return a fixed maximum number of files and then quit, which gives you a built-in traversal limiter. The trick is to pipe through "head": after "n" lines, head exits, "find" gets SIGPIPE on its next write to the pipe, and when it gets that signal, it quits.
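
The mechanism is easy to see in a few lines of C (an illustrative demo; build it and run as ./a.out | head -n 3):

```c
#include <stdio.h>

// Demonstrates what `find | head` relies on: once the reader (head)
// exits, the next write to the pipe raises SIGPIPE in the writer,
// which kills it by default. Run as: ./a.out | head -n 3
int main(void) {
    for (unsigned long i = 0;; i++) {
        printf("line %lu\n", i);
    }
}
```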

This all works perfectly except that, for some reason, it hangs forever if I run :CommandTFind in my home directory. I can't figure out how to diagnose the hang. I can run the "find" command my function produces on the command line without issue. I can also run :CommandTFind on "/" and other very large directories without any issues.

Config file below. I'm on macOS 13.6 (hence the ghead (GNU head from MacPorts) hack).

Thanks for any thoughts.

```lua
local commandt = require('wincent.commandt')

local find_ignore_dir = {".git", "node_modules"}
local find_ignore_file = {"*.o", "*.obj"}
-- local find_max_files = 20000
local find_max_files = 2

-- macOS ships BSD head, which lacks -z; use GNU head from MacPorts instead.
local head = "head"
if jit.os == "OSX" then
   head = "ghead"
end

commandt.setup({
   ignore_case = true,
   finders = {
      find = {
         command = function(directory)
            if vim.startswith(directory, './') then
               directory = directory:sub(3, -1)
            end
            if directory ~= '' and directory ~= '.' then
               directory = vim.fn.shellescape(directory)
            end
            -- When scanning '.', find prefixes every result with './';
            -- tell Command-T to drop those two characters.
            local drop = 0
            if directory == '' or directory == '.' then
               directory = '.'
               drop = 2
            end

            local command = 'find -L ' .. directory

            for _, dir_name in ipairs(find_ignore_dir) do
               command = command .. ' -not \\( -path "*/' .. dir_name .. '/*" -prune \\)'
            end

            for _, file_name in ipairs(find_ignore_file) do
               command = command .. ' -not \\( -path "*/' .. file_name .. '" -prune \\)'
            end

            command = command .. ' -type f -print0 2> /dev/null'

            if find_max_files > 0 then
               -- head exits after find_max_files NUL-terminated records,
               -- and the resulting SIGPIPE makes find quit early.
               command = command .. " | " .. head .. " -z -n " .. find_max_files
               command = command .. ' 2> /dev/null'
            end

            return command, drop
         end,
         fallback = true,
      },
   },
})
```
wincent commented 10 months ago

Yeah, that's a bit puzzling. I tried out your config, which makes sense, and saw it work in a small directory but hang in a big one.

My first thought was that it didn't like the pipe, so I moved the logic into a script in my $PATH and invoked that instead (i.e. to hide the pipe in a sub-process), but that made no difference.

To your question about debugging, there is some logging (see the calls to DEBUG_LOG in the source) if you build with make DEBUG=1. That will dump into commandt-debug.log.

In any case, I think the best solution here might be to implement the MAX_FILES handling which currently just looks like this 😬:

https://github.com/wincent/command-t/blob/8e5e827754b255452b21b8acf31afedb3e123e1c/lua/wincent/commandt/lib/scanner.c#L90-L92

I don't have a lot of bandwidth right now, but I will take a look when I can, if somebody doesn't beat me to it. I think the solution is going to be to continue to allocate a large slab, but to return early once we hit the limit. I am not sure if we'll have to do something special to give the child process a chance to gracefully exit.
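
As a rough illustration of the early-return idea (a sketch only; append_paths() is a made-up helper, and this is not the real scanner.c code):

```c
#include <stddef.h>
#include <unistd.h>

#define MAX_FILES 20000

// Assumed helper: copies NUL-delimited paths out of `buf` into the slab
// and returns how many complete paths it consumed.
extern size_t append_paths(const char *buf, size_t len);

// Hypothetical: keep the single large slab allocation, but stop reading
// the child's output once the limit is reached.
static void scan_child(int child_fd) {
    size_t count = 0;
    char buffer[65536];
    ssize_t bytes;
    while (count < MAX_FILES &&
           (bytes = read(child_fd, buffer, sizeof buffer)) > 0) {
        count += append_paths(buffer, (size_t)bytes);
    }
    // Closing our end of the pipe means the child takes a SIGPIPE on its
    // next write, which is usually enough for it to exit gracefully.
    close(child_fd);
}
```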

wincent commented 10 months ago

I don't have a lot of bandwidth right now, but I will take a look when I can

Bah, I couldn't resist. I have a messy patch locally (not polished enough to push yet) that implements proper MAX_FILES logic.

In doing so, I discovered an interesting bug which I think may explain what you were seeing, at least in part: if you test this in your home directory, where there are a lot of dotfiles, it is easy to hit the file limit without ever seeing any files that aren't in dot directories; now, depending on your settings, these entries may all get hidden, so Command-T thinks there were zero results, and it runs the (slow) fallback scanner over the entire home directory. I am testing this right now, and I can see it using the fallback scanner to scan all 3.2 million files in my home directory, which makes it apparently hang (but it's not hung — it's doing work still).

Call graph showing fallback scanner working:

```
Call graph:
2577 Thread_8993745   DispatchQueue_1: com.apple.main-thread  (serial)
  2577 ???  (in libluajit-5.1.2.1.0.dylib)  load address 0x100dd5000 + 0x54a3  [0x100dda4a3]
    2577 commandt_file_scanner  (in commandt.so) + 21  [0x100f9c215]  find.c:96
      2574 commandt_find  (in commandt.so) + 285  [0x100f9bffd]  find.c:57
      + 2567 fts_read$INODE64  (in libsystem_c.dylib) + 889  [0x7ff80eebe906]
      + ! 2257 fts_build  (in libsystem_c.dylib) + 1033  [0x7ff80eec3fe1]
      + ! : 2257 advance_directory  (in libsystem_c.dylib) + 51  [0x7ff80eec48be]
      + ! :   2257 getattrlistbulk  (in libsystem_kernel.dylib) + 10  [0x7ff80efa63aa]
      + ! 306 fts_build  (in libsystem_c.dylib) + 252  [0x7ff80eec3cd4]
      + ! : 306 advance_directory  (in libsystem_c.dylib) + 51  [0x7ff80eec48be]
      + ! :   306 getattrlistbulk  (in libsystem_kernel.dylib) + 10  [0x7ff80efa63aa]
      + ! 2 fts_build  (in libsystem_c.dylib) + 173  [0x7ff80eec3c85]
      + ! : 2 open$NOCANCEL  (in libsystem_kernel.dylib) + 204  [0x7ff80efb033d]
      + ! :   2 __open_nocancel  (in libsystem_kernel.dylib) + 10  [0x7ff80efa43c2]
      + ! 2 fts_build  (in libsystem_c.dylib) + 963  [0x7ff80eec3f9b]
      + ! 1 fts_alloc  (in libsystem_c.dylib) + 56  [0x7ff80eebe334]
      + ! | 1 _malloc_zone_calloc  (in libsystem_malloc.dylib) + 60  [0x7ff80ee17b75]
      + ! |   1 default_zone_calloc  (in libsystem_malloc.dylib) + 19  [0x7ff80edfd606]
      + ! 1 fts_alloc  (in libsystem_c.dylib) + 124  [0x7ff80eebe378]
      + ! 1 _platform_memmove$VARIANT$Haswell  (in libsystem_platform.dylib) + 71  [0x7ff80eff31a7]
      + 6 fts_read$INODE64  (in libsystem_c.dylib) + 400  [0x7ff80eebe71d]
      + ! 4 nanov2_free_to_block  (in libsystem_malloc.dylib) + 360  [0x7ff80edfc6b3]
      + ! : 4 nanov2_madvise_block  (in libsystem_malloc.dylib) + 141  [0x7ff80ee057d5]
      + ! :   4 mvm_madvise_free  (in libsystem_malloc.dylib) + 87  [0x7ff80ee058c7]
      + ! :     4 madvise  (in libsystem_kernel.dylib) + 10  [0x7ff80efa5baa]
      + ! 2 nanov2_free_to_block  (in libsystem_malloc.dylib) + 239,273  [0x7ff80edfc63a,0x7ff80edfc65c]
      + 1 fts_read$INODE64  (in libsystem_c.dylib) + 406  [0x7ff80eebe723]
    1 commandt_find  (in commandt.so) + 299  [0x100f9c00b]  find.c:58
    1 commandt_find  (in commandt.so) + 459  [0x100f9c0ab]  find.c:67
    + 1 _platform_memmove$VARIANT$Haswell  (in libsystem_platform.dylib) + 268  [0x7ff80eff326c]
    1 commandt_find  (in commandt.so) + 480  [0x100f9c0c0]  find.c:68
    1 commandt_str_init  (in commandt.so) + 100  [0x100f9eb24]  str.c:47
```

So, I think we can fix that bug (the inappropriate fallback) and also look at what is effectively a perf bug (because it shouldn't take that long to scan 3.2 million files — I'm even wondering if I might have a cyclic symlink somewhere[^cyclic]). And soon (within a day or two) I should be able to polish up the max file patch in any case.

[^cyclic]: Yep, I had several... `gfind . -follow -printf ""` finds a bunch... (`gfind: File system loop detected`).

wincent commented 10 months ago

A couple of commits parked on the next branch showing what I am thinking of doing:

Lightly tested!

sixcircuit commented 9 months ago

Thanks so much for fixing the fallback performance issue! That solved my problem well enough. I don't get any notice if it hits the 20k limit I imposed with find and head, but I'm almost always running nvim in the wrong directory if I hit the limit anyway. It just means I don't have to kill nvim.

Edit: I checked out the new docs and the max_files option. It works great. I can drop the head hack. Thanks again!

wincent commented 9 months ago

I don't get any notice if it hits the 20k limit

Definitely worth adding something like that in the future though.