microsoft / git

A fork of Git containing Microsoft-specific patches.
http://git-scm.com/
Other
761 stars 92 forks source link

Some scalar clones are putting GVFS packs in local object dir instead of shared object dir #645

Closed derrickstolee closed 2 months ago

derrickstolee commented 4 months ago

Setup

$ scalar diagnose
Collecting diagnostic info

git version 2.44.0.vfs.0.2
cpu: x86_64
built from commit: 105fa1d4cd50575fc1e499d0e4159a5e5b62bf7d
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
Repository root: C:/mono_scalar/src
Cache Server: https://wbp-adoprx.microsoft.engineering/ba47400f-eebe-4850-a8bf-ef694ef81414
Local Cache: c:\.scalarCache/id_ba47400f-eebe-4850-a8bf-ef694ef81414

Available space on 'C:/': 33.43 GiB

Are you using Scalar or VFS for Git?

Scalar.

Windows Server 2022. 64-bit.

The machines with this issue are self-hosted ADO build agents intended to run pipeline tasks that require a 1JS Git enlistment.

Details

On these machines, running scalar clone during image creation will cause the prefetch packfiles and GVFS protocol blob packs to be installed in the local .git/objects/pack directory instead of the shared object cache. The gvfs.sharedCache config option is set in .git/config and the folder it points to exists.

(There was some time where I considered that alt_odb_usable() was not registering the cache directory as being valid, but all evidence points to implying that Git can create that directory, so it shouldn't be blocked by something like that.)

AtOMiCNebula commented 4 months ago

The agents in question are actually OMR-based (and technically Server 2022 instead of Win11). We have 1JS agents too, but haven't dug in there because they aren't impacted like our OMR agents are.

AtOMiCNebula commented 4 months ago

I figured this one out. When running scalar clone initially, we clone by specifying a path like c:\foo, with the problem itself being the lowercase c. The lowercase c persists itself all the way through into the repo's config:

[gvfs]
  sharedCache = c:\\.scalarCache/id_ba47400f-eebe-4850-a8bf-ef694ef81414

This becomes problematic in this code, which uses strcmp instead of something like stricmp: https://github.com/microsoft/git/blob/b68812e6facef46153bdeb3d79c1b8f353b8e94d/gvfs-helper-client.c#L326-L331

odb->path above uses C: instead of c:, the comparison doesn't match, and it falls back to .git/objects.

dscho commented 4 months ago

@AtOMiCNebula great find! This strcmp() should be replaced with fspathcmp(). Would you mind opening a PR that makes it so?

derrickstolee commented 4 months ago

One thing @AtOMiCNebula pointed out is that we want to make sure that users in this state are getting into a better state when the config is parsed correctly.

A possible direction to be as safe as possible is to update the incremental-repack maintenance step to move objects from .git/objects into the shared object cache. Move packs wholesale to keep their prefetch-<timestamp>... names. This could be a helpful feature, anyway, since the local object store isn't being repacked in Scalar clones with a shared object cache.

derrickstolee commented 2 months ago

I created #660 to fix this.

dscho commented 2 months ago

A possible direction to be as safe as possible is to update the incremental-repack maintenance step to move objects from .git/objects into the shared object cache.

My major concern about this strategy is that .git/objects can very well contain objects that have not originated from the remote repository, and one single ill-considered re-creation of the shared object cache might very well wipe out objects that will now be irretrievably lost.