seeraven / gitcache

Local cache for git repositories to speed up working with large repositories and multiple clones.
BSD 3-Clause "New" or "Revised" License

Cannot "Delete directory" on Windows after 5 minutes (outputtimeout) #36

Open nhanvolac opened 2 months ago

nhanvolac commented 2 months ago

Hello,

I am encountering an issue when the "gitcache git clone mirror -b branch destination_folder" command is retried after a 5-minute timeout (outputtimeout = 5 minutes in the gitcache config).

Here is the relevant log output (with GITCACHE_LOGLEVEL=Debug) from the Jenkins job:

10:11:38 Execute command '['C:\\Git\\git.exe', '-c', 'lfs.url=ssh://git@mydomain:22/repo/repo1.git/info/lfs', '-c', 'lfs.storage=D:\\Je\\gc\\mirrors\\mydomain_22\\repo\\repo1\\lfs', 'clone', 'D:\\Je\\gc\\mirrors\\mydomain_22\\repo\\repo1\\git', '-b', 'my_branch', 'D:\\Je\\wp\\job1_1\\test']' (shell=False, cwd=None) with command timeout of 3600 seconds and output timeout of 300 seconds.
10:11:38 Cloning into 'D:\Je\wp\job1_1\test'...
10:16:39 No stdout/stderr output received within 300 seconds!
10:16:40 Command '['C:\\Git\\git.exe', '-c', 'lfs.url=ssh://git@mydomain:22/repo/repo1.git/info/lfs', '-c', 'lfs.storage=D:\\Je\\gc\\mirrors\\mydomain_22\\repo\\repo1\\lfs', 'clone', 'D:\\Je\\gc\\mirrors\\mydomain_22\\repo\\repo1\\git', '-b', 'my_branch', 'D:\\Je\\wp\\job1_1\\test']' finished with return code -2000.
10:16:40 Delete directory D:\Je\wp\job1_1\test.
10:19:17 Command '['C:\\Git\\git.exe', '-c', 'lfs.url=ssh://git@mydomain:22/repo/repo1.git/info/lfs', '-c', 'lfs.storage=D:\\Je\\gc\\mirrors\\mydomain_22\\repo\\repo1\\lfs', 'clone', 'D:\\Je\\gc\\mirrors\\mydomain_22\\repo\\repo1\\git', '-b', 'my_branch', 'D:\\Je\\wp\\job1_1\\test']' failed with return code -2000. Starting retry 1 of 3.
10:19:17 Execute command '['C:\\Git\\git.exe', '-c', 'lfs.url=ssh://git@mydomain:22/repo/repo1.git/info/lfs', '-c', 'lfs.storage=D:\\Je\\gc\\mirrors\\mydomain_22\\repo\\repo1\\lfs', 'clone', 'D:\\Je\\gc\\mirrors\\mydomain_22\\repo\\repo1\\git', '-b', 'my_branch', 'D:\\Je\\wp\\job1_1\\test']' (shell=False, cwd=None) with command timeout of 3600 seconds and output timeout of 300 seconds.
10:19:17 fatal: destination path 'D:\Je\wp\job1_1\test' already exists and is not an empty directory.

When I check the folder D:\Je\wp\job1_1\test, I see that it still contains a ".git" subfolder, which prevents the clone from being retried.

However, no errors are reported by the directory deletion step (I believe this is the relevant code, but I'm not sure):

https://github.com/seeraven/gitcache/blob/ec99dcf7c175279927aea198753ff7028071f95f/src/git_cache/command_execution.py#L117

The process retries 3 times before finally failing with return code 128.
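In the log, retry 1 starts after the "Delete directory" step even though the directory was not actually removed, so every retry fails immediately. A minimal sketch of a retry loop that verifies the cleanup before retrying could look like this. It is not gitcache's code: `clone_with_retries` and the `run_clone` callable (assumed to run the clone and return its exit code) are hypothetical.

```python
import os
import shutil
from typing import Callable


def clone_with_retries(run_clone: Callable[[], int], destination: str, retries: int = 3) -> int:
    """Run *run_clone* once, plus up to *retries* retries on failure.

    Before each retry the destination is deleted. If it still exists
    afterwards, the retry is refused with an explicit error instead of
    letting git fail with "destination path ... already exists".
    """
    return_code = run_clone()
    for attempt in range(1, retries + 1):
        if return_code == 0:
            break
        # Assumption: deletion errors are ignored here, as in the silent
        # cleanup seen in the log...
        shutil.rmtree(destination, ignore_errors=True)
        # ...so the result has to be checked explicitly.
        if os.path.exists(destination):
            raise RuntimeError(
                f"Could not delete {destination!r} before retry {attempt} of {retries}"
            )
        return_code = run_clone()
    return return_code
```

With this check, the failure would surface as "Could not delete ..." on the first retry rather than as three identical git errors.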

seeraven commented 2 months ago

Hi, after a failed clone attempt the directory should be deleted using the shutil.rmtree() function, but it is configured to ignore errors. So I guess there actually is an error while deleting the folder. If I recall correctly, deleting a folder on Windows is, for whatever reason, susceptible to "Permission" errors. That is why I already had to implement a workaround in the GitMirror class.
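To illustrate why nothing shows up in the log: `shutil.rmtree(path, ignore_errors=True)` discards every failure, so a partially deleted directory goes unnoticed. The minimal sketch below collects the errors instead, so they could be logged. `rmtree_collect_errors` is a hypothetical helper, not part of gitcache; it assumes Python 3.8+ and uses the `onexc` hook on Python 3.12+, where `onerror` is deprecated.

```python
import shutil
import sys
from typing import Any, Callable, List, Tuple


def rmtree_collect_errors(path: str) -> List[Tuple[str, BaseException]]:
    """Remove *path* recursively and return (path, exception) for every failure.

    Unlike shutil.rmtree(path, ignore_errors=True), nothing is silently
    dropped: the caller decides whether to log, retry or abort.
    """
    errors: List[Tuple[str, BaseException]] = []

    def record(func: Callable[..., Any], failed_path: str, exc: Any) -> None:
        # onerror (Python < 3.12) passes an exc_info tuple,
        # onexc (Python >= 3.12) passes the exception instance.
        if isinstance(exc, tuple):
            exc = exc[1]
        errors.append((failed_path, exc))

    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=record)
    else:
        shutil.rmtree(path, onerror=record)
    return errors
```

Writing each returned pair to the debug log would show which file (for example under .git) could not be removed and why.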

However, I don't think this is the root cause of the failure: a clone from a local directory shouldn't time out. Do you have any means to try the local clone "manually" (with no existing target directory) and paste the output? According to your log, the command would be:

C:\Git\git.exe -c lfs.url=ssh://git@mydomain:22/repo/repo1.git/info/lfs -c lfs.storage=D:\Je\gc\mirrors\mydomain_22\repo\repo1\lfs clone D:\Je\gc\mirrors\mydomain_22\repo\repo1\git -b my_branch D:\Je\wp\job1_1\test

Of course, the destination directory should be empty or non-existent before the call. ;-)

nhanvolac commented 2 months ago

Thanks for the information, @seeraven.

A clone from a local directory shouldn't time out --> I was running multiple Jenkins jobs that check out source code on the same machine.

I also think that the folder deletion failed, but no error details are displayed, which makes it difficult to find the root cause. (One more thing: the account used by the Jenkins job is different from the account used to run the command as you recommend.)

That is why I already had to implement a workaround in the GitMirror class --> Could you explain this part in more detail?

Thank you.

seeraven commented 2 months ago

> I also think that the folder deletion failed, but no error details are displayed, which makes it difficult to find the root cause. (One more thing: the account used by the Jenkins job is different from the account used to run the command as you recommend.)

But each user/account uses its own GITCACHE_DIR tree, right? Otherwise, one account may not be allowed to create or delete directories created by the other one.

> That is why I already had to implement a workaround in the GitMirror class --> Could you explain this part in more detail?

In the GitMirror class there is a method _rmtree (https://github.com/seeraven/gitcache/blob/ba23b9e46042fc236bc879bdb357e7c7630dc7e0/src/git_cache/git_mirror.py#L219) that checks for permission errors while deleting a directory tree. If a permission error occurs, it tries to change the permissions on the file and then deletes it. This seems to be needed on some platforms where a plain call to shutil.rmtree cannot delete the directory on its own.

But I am still wondering why the git call times out. If I try to clone a repository into a directory that already contains a .git directory, git immediately prints an error message and quits. Perhaps the existing directory is not the real culprit here, but something like git waiting for a password to be entered. The only thing git prints is the line "Cloning into 'D:\Je\wp\job1_1\test'...", and then it stays there for 5 minutes until the timeout kicks in, with no error output at all. Or did you shorten the log you pasted?
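One way to test the password-prompt theory is to run the same clone with prompting disabled, so that git fails with an error instead of waiting silently. The sketch below assumes git's `GIT_TERMINAL_PROMPT` and `GIT_SSH_COMMAND` environment variables and OpenSSH's `BatchMode` option. The helper names are hypothetical, and only terminal and SSH prompts are covered, not graphical credential helpers.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def noninteractive_git_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of *base* (default: os.environ) that stops git from
    prompting on the terminal and ssh from asking for passwords."""
    env = dict(os.environ if base is None else base)
    env["GIT_TERMINAL_PROMPT"] = "0"  # git fails instead of prompting for credentials
    # Keep an existing ssh command; otherwise make ssh fail instead of prompting.
    env.setdefault("GIT_SSH_COMMAND", "ssh -o BatchMode=yes")
    return env


def run_git_noninteractive(args: List[str], timeout: float) -> subprocess.CompletedProcess:
    """Run git with prompts disabled and stdin closed; output is captured."""
    return subprocess.run(
        ["git", *args],
        env=noninteractive_git_env(),
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
```

Passing the clone arguments from the log (including the two `-c lfs....` options) with `timeout=300` should end with a message in `stderr` if a prompt was the problem; if it still runs into `subprocess.TimeoutExpired`, a prompt is probably not the cause.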

seeraven commented 1 month ago

Hi again, I've just published the new release v1.0.18, which uses the version of rmtree with permission error handling and adds more debug output to it. Can you check whether this version solves the problem? If not, the debug output might help to narrow down the problem further.