Closed stuhood closed 1 year ago
but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.
How would this work? Mind writing an example please?
but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.
How would this work? Mind writing an example please?
I think that I was thinking...?:
$ cat .gitignore
ignore/these/**/*.py
$ grep -A2 'pants_ignore' pants.ini
pants_ignore: [
'!ignore/these/but/not/these/*.py',
]
Not a big deal though... not sure the usecase is that important.
Bumping the priority on this. Filesystem specs make it more important than in the past. E.g., ./pants cloc '**'
will now see a lot of files it wouldn't previously have, including in our own repo's case large rust build byproducts etc.
But note that repos can have non top-level .gitignore (this repo does!) and those are evaluated on the relpath from the .gitignore file, so it's more complicated than just tacking .gitignore lines onto the pants ignore list.
For example, we ignore rust build cruft in src/rust/engine/.gitignore
, and it's actually quite important to pants-ignore that cruft.
Or maybe the ignore crate handles multiple nested .gitignore files naturally already?
If I'm reading this documentation correctly, it looks like the ignore crate supports parsing hierarchically-defined .gitignore
files: https://docs.rs/ignore/0.4.11/ignore/struct.WalkBuilder.html#method.parents
Just had a user run into a git-ignored directory that contained BUILD files causing them trouble. This would still be a really good thing to do.
https://github.com/pantsbuild/pants/pull/9310 gets us part of the way there. We still need to make pants pay attention to all nested .gitignore files, rather than just a top-level one.
One of my users hit this when I added a recursive glob on a data directory.
Mapping targets took several minutes and we couldn't figure out why. After it completed I asked what the line output of list
was: 383k files 🙈
I forget why this wasn't easily solved by the ignore crate, but I vaguely remember some difficulty. @gshuflin do you happen to recall?
It's definitely been a while since I've thought about this - IIRC, the problem was something like, it was difficult to make the code from the rust ignore
crate that handled recursive .gitignore
files in a filesystem (code that might've lived somewhere around here: https://github.com/pantsbuild/pants/pull/9310/files#diff-a1af714f77a06a94025988e047198428068ac449953a5c0297977b99567e2d63 ) interact with the virtual filesystem abstraction. It expected being able to traverse a real filesystem, and so we would've had to reimplement the logic that ignore
had internally in the context of the pants fs
module. And we judged this wasn't worth doing at the time since we were able to modify our top-level .gitignore
to ignore the nested files we cared about ignoring.
It's not prohibitively hard to glob for the gitignore files from the virtual FS and then construct the ignores by hand, the only blocker is globbing is async and the scheduler is sync. I don't know Rust well enough to bridge that gap
The alternative is looking the gitignores up dynamically as we try and glob paths, and caching the results.
There are some comments on the implementing ticket: https://github.com/pantsbuild/pants/pull/9310#pullrequestreview-377237911
The idea of holding onto ignore state as we descend the tree might be related to some of the optimizations mentioned in #14890, but I haven't thought about this particular problem recently.
I found this while looking for a feature request for supporting recursive gitignore files. I'm running into this problem too now. So... +1 for doing this :sweat_smile:
another +1...
See https://github.com/pantsbuild/pants/pull/18654 :) Thanks @cognifloyd for talking through the strategy and haha thanks ChatGPT4 for helping to fill out the implementation
This issue has a lot of history on it, which masks the remaining work to be done. I'm going to close it in favor of #19860, which has a more focused description, and some suggested implementation.
The
--pants-ignore
option causes pants to ignore files and directories: it uses the same syntax as.gitignore
because it is implemented using the ignore crate. The ignore crate also supports consuming.gitignore
files from disk, but we're currently not using that support.This ticket would deal with adding a pants level option next to
--pants-ignore
that would enable using the.gitignore
file in addition to the explicit patterns in--pants-ignore
. If the.gitignore
file patterns were applied first, this would allow for maximum generality: with that option enabled,--pants-ignore
would be a superset of.gitignore
, but because negation syntax is supported, you could negate patterns in--pants-ignore
if you wanted them ignored by git, but not by pants.--pants-ignore
is declared here: https://github.com/pantsbuild/pants/blob/2d4e61753d3896ee969ba239cb0a66c36d781f84/src/python/pants/option/global_options.py#L127-L131 and ends up being used here: https://github.com/pantsbuild/pants/blob/2d4e61753d3896ee969ba239cb0a66c36d781f84/src/rust/engine/fs/src/lib.rs#L388-L393