Open sindresorhus opened 6 years ago
FWIW, the biggest performance bottleneck in my testing on eleventy (for whatever reason) was the gitignore: true
option. Removing this and just looping over .gitignore
myself and adding the entries as !
lines was much faster, for whatever reason.
(Love globby—thanks!)
About filtering out .gitignore
patterns before globbing, is translating gitignore rules and adding them to fast-glob ignore the right direction to go?
I'm adding this here as it's somewhat related: right now just requiring globby
on my system often takes anywhere between 100ms and 250ms, and just installing it on a new directory downloads more than 4MB of stuff.
I've switched to tiny-glob in a project of mine, which takes less than 10ms to require on my system and install less than 100kb of stuff.
Perhaps some heavy dependencies could be removed/replaced.
About filtering out
.gitignore
patterns before globbing, is translating gitignore rules and adding them to fast-glob ignore the right direction to go?
I think so, yes.
I'm adding this here as it's somewhat related: right now just requiring
globby
on my system often takes anywhere between 100ms and 250ms, and just installing it on a new directory downloads more than 4MB of stuff.
Yeah, that is quite unfortunate. It's because of micromatch
, a dependency of fast-glob
. See: https://github.com/mrmlnc/fast-glob/issues/154
// @mrmlnc
I concur with @zachleat that the gitignore: true
has a huge perf cost for whatever reason.
On our codebase:
using gitignore: true
:
~8sec
await globby([ `**` ], { gitignore: true });
custom:
~100ms
const ignored = fs.readFileSync(".gitignore", { encoding: "utf8" })
.split("\n")
.filter( line => !/^\s*$/.test(line) && !/^\s*#/.test(line))
.map( line => "!" + line.trim().replace(/^\/+|\/+$/g, ""));
const files = await globby([
`**`,
`!.git`,
...ignored
]);
~8000% difference is a lot...
rollup is said to be good at removing dead code, I don't know if it goes as far as to repackage packages.
Just ran into this as well in our codebase. Commenting out gitignore leads to an improvement from 500 sec to 2 sec.
For folks complaining about the performance hit of the Options.gitignore
, my understanding of the current implementation is enabling that option adds a blocking preprocessing step that enumerates and parses potentially every .gitignore
file in the directory tree regardless of the specified patterns
value. It looks like Options.ignore
is respected, but for many (perhaps most?) inputs to globby
, the scope of this additional work may be vastly larger than what's done when gitignore
is not specified.
Some of this additional work is unnecessary in some conditions, but I don't know offhand how difficult it is to determine that at runtime given the recursive nature of .gitignore
and patterns
.
Reference:
On our codebase:
- using
gitignore: true
: ~8sec- custom: ~100ms ~8000% difference is a lot...
FYI, this is not an apples-to-apples comparison as the gitignore
option for globby
honors every .gitignore
file in the tree while the custom code honors only the one at the root.
For scenarios where only the root .gitignore
file matters, setting Options.ignoreFiles
to ".gitignore"
should be very fast as it should avoid the (correct by specification!) **
part of the default behavior.
By coincidence I have recently written a little benchmark comparing globby against tiny-readdir-glob-gitignore
, here's the result:
Basically as mentioned in the original post indeed skipping ignored directories as early as possible is significant. Additionally a much faster implementation than ignore
is possible, and handling all .gitignore
-like files found shouldn't be a problem. All in all when optimized enabling handling of .gitignore
-like files should speed things up rather than slow things down in many real world scenarios, I think.
By coincidence I have recently written …
@fabiospampinato Nice! I was just in the process of writing my own ideal glob function—based on fs.readdir
and picomatch
—before coming across tiny-readdir-glob
[-gitignore
]… So maybe now I don’t have to :)
But just have a couple of questions:
import {readdir} from 'node:fs/promises'
?@danielbayley could you open an issue about that in the other repo? That seems unrelated to globby
.
@danielbayley could you open an issue about that in the other repo? That seems unrelated to
globby
.
@fabiospampinato Fair point. Sure, see https://github.com/fabiospampinato/tiny-readdir/issues/2 and https://github.com/fabiospampinato/tiny-readdir-glob/issues/2.
I our case enabling gitignore
increase processing time from <1min to ~40min 😱 (because of a lot of php/npm dependencies which added to .gitignore
)
This issue is dedicated to gathering ideas on how we can improve performance.
Some things that might help is to do filtering in the correct order to ensure minimal work is done, and cache and memoize as much as possible. The most important thing is that we skip ignored directories like
node_modules
as early as possible. It's the source of a lot of performance problems. The hard drive is the bottleneck, so we should ensure we interact with it as little as possible.Some more ideas in #44 and mrmlnc/fast-glob#15
This Node.js feature request would also help a lot: https://github.com/nodejs/node/issues/15699