sudsy / node-find-files

A Node Module for finding files by attributes. Originally developed to find files modified since a particular date. Uses an eventemitter model so the results can start streaming back as soon as they are found.
15 stars 7 forks source link

Add max parallel operations configuration #7

Open nbarrera opened 4 years ago

nbarrera commented 4 years ago

I 've experienced performance issues on a server where I used this library over a big directory tree with lots of files.

I 've read your code and saw you are using asyn each.

Maybe adding a config parameter and using eachLimit instead of each so that files would be processed in chunks of parallel tasks, instead of in parallel as a whole.

it may fallback to no limiting if not configured

what do you think?

sudsy commented 4 years ago

Sounds sensible. Was the cpu overloaded because the file system was coming back too quickly?

sudsy commented 4 years ago

Actually, after looking at the code I've realised that it is recursive and so a limit wouldn't work that well. As it descends into directories it fires off a new async each for each sub directory.

What happens when you do nothing but log the found files to the console. Do you still experience performance issues?

nbarrera commented 4 years ago

I was using it at an aws ec2 instance... with a storage with a 200 I/O operations per second limit so I suspect that it was monopolizing the i/o operations at the server, then the server became unresponsive (couldn't even ssh into it)...

sometimes it would come back after a reasonable time, sometimes I couldn't wait and preferred to reboot the server.

sudsy commented 4 years ago

I tried changing the async.each to an async.eachLimit on my local machine with a concurrency limit of 1 and it doesn't make any difference to how quickly it completes. I would suggest cloning the repo and making that change yourself to see if it fixes your problem.

It is still highly concurrent because there are nested async each statements so it may not improve things.

I tried a refactor using a queue to limit concurrency properly but on my machine this slows down the search by a factor of 10 even with a high concurrency limit. https://github.com/sudsy/node-find-files/tree/limit_concurrency . I wouldn't want to implement something that reduced the performance for the general use case.

nbarrera commented 4 years ago

That's something that sounds like would work to me... doing same stuff at a lower pace.

I understand that you would also like to have the full throttle pace available as an option to you.

I 'm not able to test your code modification right now.., I think I will need some spare time for setting up a test that does not involve messing with production environment for me.

I will report back then.

I 've also peeking the async api and I like the queue.., I just wished a high number configured there would be the same as no queue