beiroot closed this 1 year ago
We can exclude the threading problem - dvc push -j 1 works slow too.
AFAIK from the docs, dvc push -j 1 is ignored for remotes, isn't it? So maybe this is the key to solving this mystery?
@SanderNugteren, @gcoter, I've seen you also had problems with SSH / SFTP remotes. Have you guys managed to get past them?
Two additional comments that might help:
MacOS --- VPN --- (over HTTP) --- ML server (local folder, shared directory)
and it worked like a charm, but only for small files. Bigger files (like 20 MB) get "Timeout on reading data from socket". I can see there was an issue about this on GitHub and it should be fixed in dvc-objects==0.1.7; however, I have dvc_objects = 0.7.0 and the problem still occurs.
@michuhu One more thing: could you upgrade to the latest dvc version with pip install -U "dvc[ssh]", check that dvc push is still very slow, and then run dvc push --yappi --yappi-separate-threads, which will produce a bunch of callgrind.dvc* files in the current directory. Each file represents a separate thread. This will help us look deeper into what's going on in the transfer/status threads that we can see are taking a while from the cprofile results you've provided. You can analyze them yourself with kcachegrind/qcachegrind, but please also share them with us (feel free to send them privately, but they are pretty harmless).
Ok, so now it goes flawlessly. I think. I need more tests, but it generally works. And I'm pretty sure the problem was a lock on the files that caused a discrepancy between the cache and the remote. If I deleted the folder on the remote and pushed the same files again, I reproduced the problem. Could this be the only cause?! What is the philosophy behind cache and remote? So if I push the same files to two different remotes just by changing the config file, am I being a bad boy? This is odd, since I'm pretty sure the problem with ssh occurred even with a clean dvc install. But after all this messing around, I might be wrong... I'll check that too. Anyway, I'm sending the correct and wrong callgrind files.
@michuhu Thanks for the research!
Ok, so now it goes flawlessly. I think. I need more tests, but it generally works. And I'm pretty sure the problem was a lock on the files that caused a discrepancy between the cache and the remote. If I deleted the folder on the remote and pushed the same files again, I reproduced the problem. Could this be the only cause?!
So it stopped being dead slow? By deleting the folder, you mean the whole remote location or just subdirs or something?
What is the philosophy behind cache and remote? So if I push the same files to two different remotes just by changing the config file, am I being a bad boy?
That's perfectly fine to do. Both cache and remote are object storages, with the only difference being that the cache is assumed to be local to your workspace and is actively used during most operations (e.g. add/checkout/etc), while the remote is (roughly) assumed to be used only during push/pull/fetch/etc, but by multiple users.
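To illustrate the "both are object storages" point, here is a minimal toy model, not DVC's actual implementation. The names ObjectStore and push, the SHA-256 hash, and the two-character sharding are all assumptions made for the sketch. It shows why pushing the same cache to two different remotes is harmless: each remote only receives the objects it lacks, and repeating a push transfers nothing.

```python
import hashlib
import tempfile
from pathlib import Path


class ObjectStore:
    """Toy content-addressed store: an object's name is the hash of its bytes.

    Hypothetical class for illustration only; not part of DVC.
    """

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, digest):
        # Shard by the first two hex characters to keep directories small.
        return self.root / digest[:2] / digest[2:]

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return digest

    def has(self, digest):
        return self._path(digest).exists()

    def get(self, digest):
        return self._path(digest).read_bytes()

    def digests(self):
        # Rebuild each digest from its shard directory name plus file name.
        return sorted(p.parent.name + p.name for p in self.root.glob("*/*"))


def push(cache, remote):
    """Copy every object the remote lacks; return how many were sent."""
    sent = 0
    for digest in cache.digests():
        if not remote.has(digest):
            remote.put(cache.get(digest))
            sent += 1
    return sent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = ObjectStore(Path(tmp) / "cache")
        remote_a = ObjectStore(Path(tmp) / "remote_a")
        remote_b = ObjectStore(Path(tmp) / "remote_b")
        cache.put(b"model weights")
        cache.put(b"training data")
        print("first push to A:", push(cache, remote_a))
        print("first push to B:", push(cache, remote_b))
        print("second push to A:", push(cache, remote_a))
```

In this model the cache and the remotes are instances of the same class; only their location and usage pattern differ, which mirrors the description above.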
Ok, great news! This issue fixed the problem. Now only the initial push via dvc[ssh] (the one actually creating the repo folders) is slow. Everything else works fast.
Many thanks to @efiop, @pared and the whole DVC team!
@michuhu We made initial dir creation lazy too, new dvc versions (already released) should be faster for you.
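As a rough sketch of what "lazy" directory creation can mean, the snippet below contrasts creating every shard directory up front with creating one only when an object first lands in it. This is an assumption about the general technique, not a description of DVC's code; init_eager, write_lazy and the 256 two-character prefixes are hypothetical names and values chosen for illustration. Over a high-latency link such as SSH through a VPN, each mkdir is a round trip, so the eager variant pays for all of them before the first byte is transferred.

```python
import tempfile
from pathlib import Path

# Hypothetical shard layout: one directory per two-hex-digit prefix.
PREFIXES = [f"{i:02x}" for i in range(256)]


def init_eager(root):
    """Create every shard directory up front; return the number of mkdir calls."""
    for prefix in PREFIXES:
        (root / prefix).mkdir(parents=True)
    return len(PREFIXES)


def write_lazy(root, name, data):
    """Write one object, creating its shard directory only if needed.

    Returns 1 if a shard directory had to be created, otherwise 0.
    """
    shard = root / name[:2]
    created = 0 if shard.is_dir() else 1
    shard.mkdir(parents=True, exist_ok=True)
    (shard / name[2:]).write_bytes(data)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        eager_calls = init_eager(Path(tmp) / "eager")
        lazy_root = Path(tmp) / "lazy"
        lazy_calls = sum(
            write_lazy(lazy_root, name, b"payload")
            for name in ("ab01", "ab02", "cd01")
        )
        print("eager mkdir calls:", eager_calls)
        print("lazy mkdir calls:", lazy_calls)
```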
Closing since this seems resolved.
Bug Report
I think I did enough research to report this bug. I first tried to find info online and in support threads - no luck there. I talked with Gao on Discord and he decided I should post this as a bug here.
Description
dvc push and dvc add / import-url --to-remote work really, really slowly. Like, a few kilobytes in 5 minutes slow. SCP and SFTP to that same server work fast.
Reproduce
or
The transfer takes a very, very long time. However, once in an (undefined) while it works fast.
Expected
Doing that same operation while the remote is on a local drive works blazingly fast.
Environment information
Topology of the network: MacOS --- VPN --- (over ssh) --- ML server (local folder, shared directory)
However, this process was reproduced in various environments:
And it always was very very slow.
Output of config files:
Output of dvc doctor:
Additional Information (if any):
Here are the cProfile dumps from dvc push: slow dvc push, fast dvc push.
If you guys want, I can send the files directly to the devs.
As you can see, there's clearly a performance issue. We can exclude file locking by cloud storage clients like OneDrive or Dropbox, IDEs like PyCharm, or backup apps like Time Machine. The bug was reproduced outside of such folders.
We can exclude the threading problem - dvc push -j 1 works slow too.
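For anyone wanting to compare slow and fast profile dumps like the ones mentioned above, a minimal sketch using only Python's standard cProfile and pstats modules follows. The functions slow_transfer, fast_hash and workload are stand-ins invented for the example (they are not DVC code), and the dump file name is arbitrary. The idea is to dump a profile to disk, load it back, and rank functions by cumulative time to see where the wall-clock time goes.

```python
import cProfile
import io
import pstats
import tempfile
import time
from pathlib import Path


def slow_transfer():
    # Stand-in for a network-bound step (hypothetical).
    time.sleep(0.05)


def fast_hash():
    # Stand-in for a cheap CPU-bound step (hypothetical).
    return sum(range(1000))


def workload():
    slow_transfer()
    fast_hash()


def profile_to_file(func, dump_path):
    """Run func under cProfile and write the binary stats dump to dump_path."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(str(dump_path))


def top_functions(dump_path, limit=5):
    """Return (function name, cumulative seconds) pairs, slowest first."""
    stats = pstats.Stats(str(dump_path), stream=io.StringIO())
    # get_stats_profile() is available in Python 3.9 and later.
    profiles = stats.get_stats_profile().func_profiles
    ranked = sorted(profiles.items(), key=lambda kv: kv[1].cumtime, reverse=True)
    return [(name, prof.cumtime) for name, prof in ranked[:limit]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dump = Path(tmp) / "push.prof"
        profile_to_file(workload, dump)
        for name, seconds in top_functions(dump):
            print(f"{seconds:8.3f}s  {name}")
```

Running top_functions on both a slow and a fast dump and comparing the top entries is usually enough to tell whether the time is spent waiting on I/O or doing local work.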