thecodrr / fdir

⚡ The fastest directory crawler & globbing library for NodeJS. Crawls 1m files in < 1s
https://thecodrr.github.io/fdir/
MIT License
1.51k stars 58 forks source link

v7.0 Plan #70

Open thecodrr opened 3 years ago

thecodrr commented 3 years ago

fdir has reached maximum performance. I am not saying that to discourage others from trying to make a faster directory crawler. Not at all. But I have stripped fdir down to its bones and haven't noticed any significant increase in speed. The only space left for improvement is directly in NodeJS internals. So where does that leave us?

I had initially planned to freeze the API of fdir but fdir has no proper "API". What I would be freezing, most probably, would be more features. I can't do that though. Not yet, anyway. Why?

  1. fdir is not feature complete. I don't want to make just the fastest directory crawler but also the best one.
  2. I don't want any one using fdir to ever feel like it can't do X or Z. An impossible feat but...
  3. I am not happy with the current state of the codebase. Even though an API freeze doesn't mean I can't change code, I want to be able to break things.

So what's the plan for v6.0?

Does this mean that the Builder API is dying? Nope. I like the Builder API and it'll stay. It will act as a kind of a proxy for the plugins underneath. That means the Builder API will also need to be extensible.

The end goal is to make the following features possible:

  1. Streams & Iterators
  2. Globbing (the current one is a joke, I am talking about a proper globber)
  3. Multiple filesystems (locking fdir to fs is a bad idea)
  4. Deno support, maybe (not sure about this)
  5. And other stuff...you guys can suggest.

In the end, the idea is to reduce complexity, increase flexibility, improve readability, and maintain the performance (and become the de-facto directory crawler in the Nodeverse).


In any case, that's the plan. I would love to hear what you guys think about this. Ideas, suggestions, possible ways to implement a plugin system, ideas for plugins, etc, etc, etc. is all welcome!

vjpr commented 3 years ago

Not a fan of the builder API, would prefer seeing options api become default. It's a much more common pattern across Node libs.

Multiple filesystems

Would be great to use with mem-fs. Some examples in the docs would be nice.

Globbing

I would like to be able to search for a dir, but also exclude everything under it. E.g. Find me all node_modules, but don't visit anything contained in node_modules. It's an edge case that I haven't found any globber able to handle.

Streams & Iterators

As I mentioned elsewhere, I'd like to be able to start processing first glob result as soon as its available, however continue walking in the background.

Idea: Watcher

Would be nice to be able to run a glob, and then set a flag to watch for changes. Currently, you have to use chokidar/watchman, and duplicate configuration. Many applications that need walking, also need watching. Could investigate if there is a way to plug fdir into chokidar, or build own.

Idea: Worker processing

Explore perf benefits of using workers for CPU ops. Can it speed up globbing? Maybe someone needs more complicated filtering logic. This starts to feel like it should be a stream operator separate to fdir though. But if we are using workers for globbing, maybe re-using them for processing is helpful.

glenn2223 commented 1 year ago

Hey, just a small idea.

Rather than completely suppressing errors, can they be returned in the result (maybe as an opt-in option)? In my case, it would be handy to display any errors that may have excluded files/folders.

I'm too happy to help in any way (when available 😉)

thecodrr commented 1 year ago

@glenn2223 The only problem is how the API would look. We already have a withErrors method.

P.S. Sorry for the delay in responding. I just now finished releasing v6.0 which completely rewrites fdir from scratch in TypeScript. Contributing now should be much easier.

glenn2223 commented 1 year ago

The only problem is how the API would look. We already have a withErrors method.

So does this still return just with errors in the object? Sorry reading the docs, I just assumed (which one should never do 😜) it threw

P.S. Sorry for the delay in responding. I just now finished releasing v6.0 which completely rewrites fdir from scratch in TypeScript. Contributing now should be much easier.

Not a problem. A nicety to have typescript and we're all busy with our normal (non open source) lives

thecodrr commented 1 year ago

So does this still return just with errors in the object? Sorry reading the docs, I just assumed (which one should never do stuck_out_tongue_winking_eye) it threw

Yes, it throws the errors instead of collecting them. The default behavior is suppressing & ignoring errors. It might be a good idea to collect the errors by default but it'll change the shape of the result so making it the default would break a lot of things.

If we make it opt-in, what should it be called when we already have withErrors. Or should we include an option in withErrors? Something along the lines of: withErrors({ collect: true }).

glenn2223 commented 1 year ago

Ahh, sorry I misunderstood. It wasn't a "no, we already have it"; more of just a how 🤦‍♂️


In my personal opinion, withErrors does read more as: include the errors in the returned object. However, as you said, this would be a breaking change. If this is the opted route, would it be that large of an issue in v7 - where it can be documented? For the original behaviour you could then go for withErrors({ throw: true }) (or throwInstead or something similar)


Just to throw a proverbial spanner in the works. Another option may be to utilise some logging functionality. You can then have a callback for the user to handle each error. This could also have the added benefit of terminating the running process.

Not only would this benefit the user it could also benefit you. By implementing some deep trace/debugging calls you can debug issues a bit easier. Doing this myself helped a lot with debugging issues, I simply ask users to send me a trace log and then I can walk through the logs/steps - and usually tell them it's not an issue with my extension 😜. You then have error, warning, info and trace logs which could be callbacks or included in the response.

It could then work as either of:

// # Option 1
.withLogging(options: {
    minimumLevel: LogLevel, // maybe to save the user doing throwaway 🤷‍♂️
    callback: (data: {
        level: LogLevel,
        message: string,
        error: Error // Possibly on or exceptions or multi-typed for custom errors
    }) => boolean | Promise<boolean> // return type is to support cancellation
}): Builder<TReturnType>;

// Option 2
.withLogging(options: {
    onError: (data: {
        message: string,
        error: Error // Possibly on or exceptions or multi-typed for custom errors
    }) => boolean | Promise<boolean>, // return type is to support cancellation
    onWarning: (data: {
        message: string,
        error?: Error // Possibly on or exceptions or multi-typed for custom errors
    }) => boolean | Promise<boolean>, // return type is to support cancellation
    onInformation: (data: {
        message: string,
    }) => boolean | Promise<boolean>, // return type is to support cancellation
    onTrace: (data: {
        message: string,
        error: Error // Possibly on or exceptions or multi-typed for custom errors
    }) => boolean | Promise<boolean>, // return type is to support cancellation
}): Builder<TReturnType>;

You could always add things like path/directory info, walker info, running time, etc. to the object too

43081j commented 6 months ago

this package was sent my way recently and, so far, i'm loving it. great work

a few immediate thoughts:

Another thing to mention is that stats are useful in many cases. Currently, fdir eats them up during crawling and the output is a straight list of paths. It would be useful to have the option to return the list of stats (which we must've done as part of the crawl presumably) rather than just the paths.

That would allow us to know which path is a directory, a file, etc.

Finally, if you need any help, shout up :D this package will be very useful as part of the ecosystem cleanup. so i'd be happy to help out wherever if i can.

mwargan commented 1 month ago

Agree very much on the stats parts as @43081j says!