theophilusx / ssh2-sftp-client

a client for SSH2 SFTP
Apache License 2.0
819 stars 200 forks source link

Problem with downloadDir() method with multiple files #481

Closed mcorrado-mcit closed 10 months ago

mcorrado-mcit commented 1 year ago

I use node 16 version and the current 9.1.0 version of the library, trying to download a remote folder containing about 1300 small files i get a "get Failed: ..." type error, the strange thing is that if i try to reduce the remote folder to about 500 files, the process is successful, I have the impression that the error is thrown when there are many files in the remote folder. i ran the test on several systems centos and ubuntu

theophilusx commented 1 year ago

Yes, I suspect the problem is due to the number of files.

I was worried this would occur. After some pressure from users to make the downloads faster by enabling them to be done as parallel promise calls (using Promise.all) I modified the code so that it will create promises for each file to be downloaded within a directory. While this does make things a lot faster, it does have the problem that for large directories with many files, node will run into resource limits and you will get errors (what errors willl depend at what point node hits resource limits). I'm a little surprised it has taken so long before someone encountered this problem. I do need to document this limitation at some point, but right now, I don't have time to spend on this library.

I have thought about adding some sort of limiter, but the problem with that is deciding at what level to set it. I could perhaps add another option which would allow the user to set the maximum number of parallel promise to instantiate, but then both the API and the code become far more complicated, defeating the main purpose for this library i.e. keep it simple.

The best solution for your use case is to not use the downloadDir method and roll your own. This is trivial to do and you could even use the existing code as a guide. However, instead of looping over the list of files, creating a promise which you push into an array and then call Promise.all(), you would just do an await after each call (or you could do some grouping to do multiple get/fastGet promises which are within limits to make it a bit faster, but avoid exceeding maximum resource limits for async calls i.e. do groups of 200).

The code you would need for your own download dir functionality will likely be a lot less than what is in the library as you have more control over the environment and fewer unknowns which need to be catered for. THis can make things much simpler.

mcorrado-mcit commented 1 year ago

OK, perfect, thank you.

in any case I did a trivial test based on your suggestion, in practice I replaced the "promise.all" method with "promise.any" or "promise.allSettled", and the latter worked perfectly without generating any errors. The same goes for the "uploadDir" method. ... I don't know if this can help ...

Immagine

theophilusx commented 1 year ago

I'm not sure Promise.any() is the correct fix. Using Promise.any() will mean that the promise is fulfilled as soon as any one of the transfer promises is fulfilled. With Promise.all(), the main promise is not fulfilled until all of the child promises are fulfilled.

So, consider you have two files, a small and a very large file. You start two promises, one for each file being downloaded. As soon as the smaller file is transferred, the main promise will return with the value from that short transfer promise (either resolved or rejected), but the other promise is still running. However, yhour code won't know that and will assume evertything has finished and you can move on.

With Promise.all() the promise doesn't return until all the promises have been fulfilled and it returns an array with the fullfillment value from each promise (so you can check to see they all succeeded or if any failed). Problem is of course that if you have a lot of files, it will be a very large array and it will be holding lots of stuff in memory until the last promise is fulfilled.

mcorrado-mcit commented 1 year ago

I agree with you that "Promise.any()" works, while "Promise.allSettled" actually waits for all promises to execute anyway, in fact in my case the main promise is returned after all promises are resolved get of all the files, obviously it's not trusted, because it clearly returns resolved the first one which most likely succeeds, but in terms of physical files, they're all downloaded anyway.

In any case I will have to change strategy.

mcorrado-mcit commented 1 year ago

I found a brutal workaround, and ran a test, so it never throws errors. This could inspire a possible resolution.

img

gabrieleCutowl commented 1 year ago

I have the same behavior using get, exists and stat method inside a Promise.all() loop.

De-Santa commented 11 months ago

Same, max listeners error when using downloadDir on large directories

theophilusx commented 11 months ago

I have put a preliminary fix for this issue in the limit-promises branch. This fix introduces a new config object property 'maxPromises' which sets the maximum number of concurrent promises in uploadDir and downloadDir. By default, this is set to 4.

Note that I have not fully stress tested this fix yet. The existing tests all pass, but the existing tests also fa9iled to trigger the errors which occurred when there were a large number of files in a directory and too many concurrent promises were fired off at the same time. My next step is to create new tests which will trigger this issue when the maxPromises setting is too high. Note also that 4 as the default is arbitrary. I may chang this after more testing to a higher number.

theophilusx commented 10 months ago

Testing seems successful and fix now merged into master. Will likelyu push out v10 later this week after some documentation cleanup etc. Closing this issue now as resolved.

ajithk-one commented 6 months ago

Even I have the same behavior using get, exists and stat method inside a Promise.all() loop. i saw you have merged PR but in v10.0.3 i can't see anything related to setting event listeners. any possible solution to avoid adding event listeners ?

theophilusx commented 6 months ago

@ajithk-one I don't have a clear idea of what it is your trying to do

You should not be seeing same issue with normal use of get, exists and stat. This issue was specific to downloadDir/uploadDir due to how they implemented the get/put etc. If you are seeing this using just get/put, then you have an error in your script or you are using the API incorrectly.

If, for example, you are calling promise.all() with a large number of get(), put(), exists() or any other method, then you will need to make sure that limit does not exceed what node can handle i.e. look at the implementation of downloadDir or uploadDir to see how this can be handled.

You cannot just create a huge array of promises and pas them to promise.all() and expect it all to just work. You need to manage the number of concurrent promises so that it does not exceed the limit (again, see how downloadDir/uploadDir does this by breaking things up into groups of promises).