Closed · leighmcculloch closed 2 years ago
Note that in all the cases where this has happened to me, if I delete the `.git/unshallow.lock` file and manually run `git fetch --unshallow`, the command finishes very quickly. So this doesn't appear to be a case where the unshallow is taking too long.
If the `git fetch --unshallow --tags` is running in the background of the container that sets up the volume, is it possible that the container is stopping or being shut down without waiting for the fetch to complete?
We are running the command in the background, but not 'detached', so it would be terminated if you quickly reloaded or closed the window.
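For clarity on that distinction, a small sketch (assuming a POSIX shell inside the container): a plain `&` background job still belongs to the parent's process tree, while a detached one survives the parent going away. The `setsid` variant is one common approach, not necessarily what the extension does.

```shell
# Backgrounded but not detached: still part of the parent's process
# tree, so it is terminated when the parent session is torn down.
git fetch --unshallow --tags &

# Detached (illustrative alternative): new session, no controlling
# terminal, output redirected, so it survives the parent exiting.
setsid git fetch --unshallow --tags </dev/null >/tmp/unshallow.log 2>&1 &
```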
I'm not closing or reloading the window, the window is staying open.
The output you posted above is truncated, I guess Git crashed. The full output would be something like (omitting the list of tags at the end):
```
remote: Enumerating objects: 1130466, done.
remote: Counting objects: 100% (1130462/1130462), done.
remote: Compressing objects: 100% (282100/282100), done.
remote: Total 1126656 (delta 812958), reused 1118996 (delta 805378), pack-reused 0
Receiving objects: 100% (1126656/1126656), 360.93 MiB | 24.27 MiB/s, done.
Resolving deltas: 100% (812958/812958), completed with 2639 local objects.
From https://github.com/microsoft/vscode
[...]
```
Does this always happen? Do you have enough free disk space? (Check with `df -H` inside the container.)
This is happening consistently. According to `df -H`, my container has plenty of space:
```
Filesystem      Size  Used  Avail  Use%  Mounted on
overlay          63G   25G    35G   42%  /
tmpfs            68M     0    68M    0%  /dev
tmpfs           4.2G     0   4.2G    0%  /sys/fs/cgroup
shm              68M     0    68M    0%  /dev/shm
/dev/vda1        63G   25G    35G   42%  /workspaces
tmpfs           4.2G     0   4.2G    0%  /proc/acpi
tmpfs           4.2G     0   4.2G    0%  /sys/firmware
```
This replicates consistently if you clone this repository's master branch: https://github.com/leighmcculloch/stellar--stellar-core
Same thing happens if you clone the parent repository too: https://github.com/stellar/stellar-core
Sometimes the output is even briefer, with no output at all after the git fetch call:

```
[6859 ms] Start: Run in container: git fetch --unshallow --tags
Container started
```
Note that deleting the `.git/shallow.lock` file and running the command manually is fast, completing in a second at most on this repository, so I don't think this is a scale issue where the command gets stuck on a large repo.
I've tried checking out the same repos on a different machine running the same OS (macOS), same version of VSCode (1.58.2), same extensions, same version of Docker for Mac, and the fetch works. 🤔
What's the best way for me to debug why the git process is failing, other than the dev container logs? Is there any way to get verbose logs that will print out things like the exit code, etc.?
Non-zero exit codes are shown. You can set the log level for Remote-Containers to `trace` in the user settings:
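For reference, the user settings entry would look like this (assuming the `remote.containers.logLevel` setting name; adjust if your extension version names it differently):

```jsonc
// settings.json (user scope)
{
  "remote.containers.logLevel": "trace"
}
```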
Among other things, that will add 'stop' logs showing when each command stopped. Maybe that will give us a hint. Could you attach the full log of a run reproducing the problem?
I have full logs here I can share. Interestingly, after repeated attempts on both the computer that fails every time and the computer where I saw it working fine, the working computer also had some failures. This appears to be intermittent.
The zoomed-in logs with trace enabled are:

```
[9106 ms] Start: Run in container: git fetch --unshallow --tags
remote: Enumerating objects: 53476, done.
remote: Counting objects: 100% (53476/53476), done.
[11518 ms] Stop (2412 ms): Run in container: git fetch --unshallow --tags
```
The full logs are here:
Reviving this ticket because we are observing this consistently as well. We do not have a `.git/shallow.lock`, but rather just the `.git/shallow` marker file, suggesting that `git fetch --unshallow` did not even start to fetch. Manually running `git fetch --unshallow` once the container is started fetches the history and branches without issue.
It looks like `fetch` is killed or errors without any output.
```
[27517 ms] Start: Run: docker rm -f cebf24d19f0713c3a91f79fd55de76c0c75379447e14c9fae9d7c529a3d0fa4d
[27521 ms] Start: Run: docker events --format {{json .}} --filter event=start
[27579 ms] Start: Starting container
[27580 ms] Start: Run: docker run --sig-proxy=false -a STDOUT -a STDERR --mount type=volume,src=██████-ba8e99764a9f66b20938dbfeb7e99286,dst=/workspaces --mount source=C:\Users\██████/.aspnet/https,target=/home/vscode/.aspnet/https,type=bind --mount type=volume,src=vscode,dst=/vscode -l vsch.local.repository=██████ -l vsch.local.repository.volume=██████-ba8e99764a9f66b20938dbfeb7e99286 -l vsch.local.repository.folder=██████ -l vsch.quality=stable -l vsch.remote.devPort=0 --entrypoint /bin/sh vsc-██████-ba8e99764a9f66b20938dbfeb7e99286 -c echo Container started
[30028 ms] Stop (15175 ms): Run in container: /bin/sh
[30042 ms] Stop (14937 ms): Run in container: /bin/sh
[30042 ms] Container server terminated (code: 137, signal: null).
[24206 ms] Start: Run in container: git config remote.origin.fetch +refs/heads/*:refs/remotes/origin/*
[26107 ms] Stop (1901 ms): Run in container: git config remote.origin.fetch +refs/heads/*:refs/remotes/origin/*
[26107 ms] Start: Run in container: git fetch --unshallow --tags
[30116 ms] Stop (4009 ms): Run in container: git fetch --unshallow --tags
[30366 ms] Stop (2849 ms): Run: docker rm -f cebf24d19f0713c3a91f79fd55de76c0c75379447e14c9fae9d7c529a3d0fa4d
Container started
```
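One detail worth noting in the log above: `code: 137` follows the conventional `128 + signal_number` encoding, i.e. the container's shell was killed with SIGKILL (signal 9) rather than exiting on its own. A quick demonstration of the encoding:

```shell
# A process terminated by SIGKILL is reported as exit status 128 + 9 = 137.
sh -c 'kill -KILL $$'
echo $?   # prints 137
```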
@chrmarti Please let me know if I can provide you with any more information.
Also, why go the difficult route of cloning with `--depth 1` when the repo is unshallowed anyway?
We use a shallow clone so we can start building the Docker image earlier. Maybe we should just wait for the unshallow to finish after that when cloning into a volume.
I would suggest doing a regular clone without limited depth. The unshallow needs to happen anyway before one is able to start working. It would also simplify the process, i.e. make the `remote.origin.fetch` workaround for #4901 obsolete.
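For context on that workaround: a `--depth` clone is implicitly single-branch, so Git records a fetch refspec covering only the cloned branch; the #4901 workaround widens it so later fetches see all branches. A self-contained sketch in a throwaway repo (branch name `main` is an assumption for illustration):

```shell
# Throwaway repo just to show the config change; in the dev container
# this runs inside the cloned workspace.
repo=$(mktemp -d)
git -C "$repo" init -q

# What a --depth/--single-branch clone writes (branch name assumed):
git -C "$repo" config remote.origin.fetch '+refs/heads/main:refs/remotes/origin/main'

# The widened refspec applied by the workaround:
git -C "$repo" config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
git -C "$repo" config remote.origin.fetch   # prints the widened refspec
```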
> I would suggest doing a regular clone without limited depth.
+1. This optimization is creating problems for folks who don't need it. Maybe the shallow clone should be an opt-in feature that's configurable, so users with large repos can opt in if they find it valuable?
> I would suggest doing a regular clone without limited depth.
Another +1 from my side for @mamidenn's suggestion!
The problem here was that the temporary container running the unshallowing was removed too early. Fixed with #6492.
For larger repositories doing a shallow clone first helps with performance. The workaround in #4901 shouldn't be needed with the fix in #4619. I guess that just didn't work because of the bug fixed here.
Closing for verification.
@chrmarti what are some verification steps for this issue? Is it similar to #6492?
To verify, you can use `Remote-Containers: Clone Repository in Container Volume...` with a very large repository and check that eventually (when the unshallowing has finished in the background) all of the Git history, including branches, is available.
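A sketch of concrete checks for that verification (assuming Git ≥ 2.15 for `--is-shallow-repository`; demonstrated here on a fresh, never-shallow repo, but in the dev container you would run the same commands inside the cloned workspace):

```shell
repo=$(mktemp -d)
git -C "$repo" init -q

# Both of these hold once the background unshallow has finished:
git -C "$repo" rev-parse --is-shallow-repository   # prints "false"
test ! -f "$repo/.git/shallow"                     # shallow marker file is gone
```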
I've noticed recently that after checking out some repositories, the `.git/shallow.lock` file exists for a long time, indefinitely, as if the background `git fetch --unshallow` process is failing or erroring and leaving the file around. I can see the following log lines in the container logs:

I don't see any errors.

When I run `ps aux | grep git` in the container, the fetch command is no longer running. When I run `docker ps` locally to see if there are any other containers linked to the volume still running that command, I don't see any.

I discovered this because my master branch is shallow, and when trying to fetch it with the unshallow option, it errors saying the lock file exists.
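The checks described above could be scripted roughly like this (a hypothetical sketch; run inside the container workspace):

```shell
# Is a fetch still running? The [g] trick keeps grep from matching itself.
if ps aux | grep -q '[g]it fetch'; then
  echo "git fetch still running"
else
  echo "no git fetch running"
fi

# Was a stale lock left behind?
if [ -f .git/shallow.lock ]; then
  echo "stale .git/shallow.lock present"
fi
```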