pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.33k stars 638 forks source link

Support nested .gitignore files in --pants-ignore-use-gitignore #5682

Closed stuhood closed 1 year ago

stuhood commented 6 years ago

The --pants-ignore option causes pants to ignore files and directories: it uses the same syntax as .gitignore because it is implemented using the ignore crate. The ignore crate also supports consuming .gitignore files from disk, but we're currently not using that support.

This ticket would deal with adding a pants level option next to --pants-ignore that would enable using the .gitignore file in addition to the explicit patterns in --pants-ignore. If the .gitignore file patterns were applied first, this would allow for maximum generality: with that option enabled, --pants-ignore would be a superset of .gitignore, but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

--pants-ignore is declared here: https://github.com/pantsbuild/pants/blob/2d4e61753d3896ee969ba239cb0a66c36d781f84/src/python/pants/option/global_options.py#L127-L131 and ends up being used here: https://github.com/pantsbuild/pants/blob/2d4e61753d3896ee969ba239cb0a66c36d781f84/src/rust/engine/fs/src/lib.rs#L388-L393

Eric-Arellano commented 4 years ago

but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

How would this work? Mind writing an example please?

stuhood commented 4 years ago

but because negation syntax is supported, you could negate patterns in --pants-ignore if you wanted them ignored by git, but not by pants.

How would this work? Mind writing an example please?

I think that I was thinking...?:

$ cat .gitignore
ignore/these/**/*.py
$ grep -A2 'pants_ignore' pants.ini
pants_ignore: [
    '!ignore/these/but/not/these/*.py',
  ]

Not a big deal though... not sure the usecase is that important.

benjyw commented 4 years ago

Bumping the priority on this. Filesystem specs make it more important than in the past. E.g., ./pants cloc '**' will now see a lot of files it wouldn't previously have, including in our own repo's case large rust build byproducts etc.

benjyw commented 4 years ago

But note that repos can have non top-level .gitignore (this repo does!) and those are evaluated on the relpath from the .gitignore file, so it's more complicated than just tacking .gitignore lines onto the pants ignore list.

For example, we ignore rust build cruft in src/rust/engine/.gitignore, and it's actually quite important to pants-ignore that cruft.

benjyw commented 4 years ago

Or maybe the ignore crate handles multiple nested .gitignore files naturally already?

gshuflin commented 4 years ago

If I'm reading this documentation correctly, it looks like the ignore crate supports parsing hierarchically-defined .gitignore files: https://docs.rs/ignore/0.4.11/ignore/struct.WalkBuilder.html#method.parents

stuhood commented 4 years ago

Just had a user run into a git-ignored directory that contained BUILD files causing them trouble. This would still be a really good thing to do.

gshuflin commented 4 years ago

https://github.com/pantsbuild/pants/pull/9310 gets us part of the way there. We still need to make pants pay attention to all nested .gitignore files, rather than just a top-level one.

thejcannon commented 2 years ago

One of my users hit this when I added a recursive glob on a data directory.

Mapping targets took several minutes and we couldn't figure out why. After it completed I asked what the line output of list was: 383k files 🙈

benjyw commented 2 years ago

I forget why this wasn't easily solved by the ignore crate, but I vaguely remember some difficulty. @gshuflin do you happen to recall?

gshuflin commented 2 years ago

It's definitely been a while since I've thought about this - IIRC, the problem was something like, it was difficult to make the code from the rust ignore crate that handled recursive .gitignore files in a filesystem (code that might've lived somewhere around here: https://github.com/pantsbuild/pants/pull/9310/files#diff-a1af714f77a06a94025988e047198428068ac449953a5c0297977b99567e2d63 ) interact with the virtual filesystem abstraction. It expected being able to traverse a real filesystem, and so we would've had to reimplement the logic that ignore had internally in the context of the pants fs module. And we judged this wasn't worth doing at the time since we were able to modify our top-level .gitignore to ignore the nested files we cared about ignoring.

thejcannon commented 2 years ago

It's not prohibitively hard to glob for the gitignore files from the virtual FS and then construct the ignores by hand, the only blocker is globbing is async and the scheduler is sync. I don't know Rust well enough to bridge that gap

thejcannon commented 2 years ago

The alternative is looking the gitignores up dynamically as we try and glob paths, and caching the results.

stuhood commented 2 years ago

There are some comments on the implementing ticket: https://github.com/pantsbuild/pants/pull/9310#pullrequestreview-377237911

The idea of holding onto ignore state as we descend the tree might be related to some of the optimizations mentioned in #14890, but I haven't thought about this particular problem recently.

jriddy commented 2 years ago

I found this while looking for a feature request for supporting recursive gitignore files. I'm running into this problem too now. So... +1 for doing this :sweat_smile:

fathom-parth commented 1 year ago

another +1...

Eric-Arellano commented 1 year ago

See https://github.com/pantsbuild/pants/pull/18654 :) Thanks @cognifloyd for talking through the strategy and haha thanks ChatGPT4 for helping to fill out the implementation

stuhood commented 1 year ago

This issue has a lot of history on it, which masks the remaining work to be done. I'm going to close it in favor of #19860, which has a more focused description, and some suggested implementation.