lycheeverse / lychee

⚡ Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more!
https://lychee.cli.rs
Apache License 2.0

Read .gitignore #1331

Closed: Ekleog-NEAR closed this 2 weeks ago

Ekleog-NEAR commented 9 months ago

Currently, lychee has its own .lycheeignore format.

It’d be very nice if it could also read the .gitignore, so that the patterns listed there don't have to be duplicated in .lycheeignore.

See also the discussion we had about our lychee usage at https://github.com/near/nearcore :)

mre commented 9 months ago

Currently, we use jwalk for file traversal, which has limited support for .gitignore files. @Byron, is that correct? Should I create an issue to add .gitignore support to jwalk? Alternatively, how hard would it be to add .gitignore support to lychee with jwalk?

There's also walkdir, which has .gitignore support.

Byron commented 9 months ago

> There's also walkdir, which has .gitignore support.

Could it be that you mean the ignore crate? I just checked, and walkdir definitely doesn't have .gitignore support, or it hides it from prying eyes.

If I were you, I'd probably try to use the ignore crate on the largest input I have and see if the performance is acceptable.

> Should I create an issue to add .gitignore support to jwalk? Alternatively, how hard would it be to add .gitignore support to lychee with jwalk?

You can create an issue, but I wouldn't implement it, for the sole reason that it wouldn't be a quick & fun thing for me to do: I didn't write jwalk and I am glad it's still working 😅. Thus it might be easier to use gix for that, probably after checking whether a directory contains a repository. This code does exactly that; you'd definitely want to avoid calling gix::open(…) on every directory you encounter. Edit: I just realized that lychee probably works from a repository root, so opening that repository isn't really a problem 😁.
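To illustrate the "find the repository once, then traverse" idea, here is a minimal std-only sketch. It is not gix's discovery logic and not the code linked above; `find_repo_root` is a hypothetical helper that simply looks for a `.git` entry in the start directory or one of its ancestors, so the repository would only need to be opened a single time rather than per directory.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: walk upwards from `start` and return the first
/// directory that contains a `.git` entry, or `None` if there is none.
///
/// `Path::ancestors` is purely lexical, so pass an absolute path if the
/// search should be able to climb above the current working directory.
fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        // `.git` is a directory in a normal clone, but a plain file in
        // linked worktrees and submodules, so only test for existence.
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}
```

A caller could resolve the root once before traversal starts, for example with `find_repo_root(&std::env::current_dir()?)`, and fall back to plain traversal when it returns `None`.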

Once a repo is available and paths within the repo are traversed, you should be able to prime the AttributeStack to the path at hand and then check if it's excluded().
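For readers unfamiliar with what such an exclusion check does per path, the following is a deliberately simplified, std-only stand-in. It does not use gix or its AttributeStack, and it only models a small subset of .gitignore syntax: literal names such as `notes.md` and directory-only names such as `target/`. Globs, negation (`!`), anchored patterns, and nested .gitignore files are skipped. The names `Rule`, `parse_rules`, and `is_excluded` are made up for this sketch.

```rust
/// One parsed `.gitignore` line, restricted to literal names.
#[derive(Debug, PartialEq)]
struct Rule {
    name: String,
    /// True when the pattern ended in `/` and so only matches directories.
    dir_only: bool,
}

/// Parse the small subset of `.gitignore` syntax this sketch understands.
fn parse_rules(contents: &str) -> Vec<Rule> {
    contents
        .lines()
        .map(str::trim)
        // Skip blank lines, comments, and negated patterns.
        .filter(|l| !l.is_empty() && !l.starts_with('#') && !l.starts_with('!'))
        // Skip glob and escape syntax, which is not modelled here.
        .filter(|l| !l.contains(|c: char| matches!(c, '*' | '?' | '[' | '\\')))
        .filter_map(|l| {
            let dir_only = l.ends_with('/');
            let name = l.trim_end_matches('/');
            // Patterns with a leading or inner slash are anchored; skip them.
            if name.is_empty() || name.contains('/') {
                None
            } else {
                Some(Rule { name: name.to_string(), dir_only })
            }
        })
        .collect()
}

/// Return true if `rel_path` (relative to the repository root, using `/`)
/// is excluded. `is_dir` describes the final path component.
fn is_excluded(rules: &[Rule], rel_path: &str, is_dir: bool) -> bool {
    let parts: Vec<&str> = rel_path.split('/').filter(|p| !p.is_empty()).collect();
    parts.iter().enumerate().any(|(i, part)| {
        // Every component except the last one is necessarily a directory.
        let part_is_dir = i + 1 < parts.len() || is_dir;
        rules
            .iter()
            .any(|r| r.name == *part && (part_is_dir || !r.dir_only))
    })
}
```

Note that a path counts as excluded as soon as any of its parent directories matches a rule, which is why a traversal can skip an excluded directory entirely instead of checking every file beneath it.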

I hope that helps.