virtualization: add new builtin command to print hydration level

jeffhostetler commented 2 months ago

GVFS users can easily (and accidentally) over-hydrate their enlistments. This causes some commands to be very slow.

Create a command to print the current hydration level. This should help our support team investigate the state of their enlistment.

This command will print something like:

% git virtualization
Skipped: 2
Hydrated: 3
Total: 5
Hydration: 60.00%

and log those values to Trace2 in a data_json record of the form:

{"skipped":2,"hydrated":3,"total":5,"hydration":60.00}

derrickstolee commented 2 months ago

This is great @jeffhostetler!

Do we want to fold the functionality into git diagnose, though? Or into git status (potentially with a config option to turn off the behavior if it should turn out to be too costly)?

I think git diagnose would be a good option, but we should contribute new "diagnose tasks" upstream first so this could be plugged into that framework instead of contributing a colliding technique.

Putting it into git status has potential to slow down that key command, so it would need to be behind an option.

jeffhostetler commented 2 months ago

Let me revisit where to put the code. Overnight I realized that a new command, while nice, would require me to allow-list it in the WDG telemetry. Adding it to status or sparse-checkout would not, since they are already being logged.

Adding it to diagnose might be useful, but it isn't being logged right now. Perhaps we add it in a couple of places. It would be nice to have it in the zip file, but that might not be the most convenient for GVFS monitoring.

I worry about the extra overhead in status -- especially with the GVFS status cache always running git status in the background.

Let me look at sparse-checkout (and possibly diagnose).

Do we want to float it in msft/git as an experiment and let WDG evaluate it ? This was their biggest ask in a recent meeting. Or rather, over-hydration was their largest support pain point.

dscho commented 2 months ago

I worry about the extra overhead in status -- especially with the GVFS status cache always running git status in the background.

Maybe we could tack that onto the code that identifies git status as taking a long time? Then the overhead would only add comparatively little to the runtime, yet add value.

jeffhostetler commented 2 months ago

D'oh. It turns out that wt-status.c already has code to compute and print this. It just that the value isn't always printed.

https://github.com/microsoft/git/blob/vfs-2.45.2/wt-status.c#L1761

And the following code hides it from GVFS users.

https://github.com/microsoft/git/blob/vfs-2.45.2/wt-status.c#L1607

This might get a lot simpler....

jeffhostetler commented 2 months ago

I redid this to just print the already-computed value in wt-status. This could go upstream by itself or we could keep it in MSFT/git.

I'm wondering about adding a MSFT-specific commit on top to either remove the if (core_virtualfilesystem) return guard that Ben added OR having a new MSFT-specific config setting to override it. Just removing the test will cause GVFS users to see it on every command and that may be annoying (and it may break scripts). Alternatively, having a "verbose" config setting to print it is something they can opt into if they want. Thoughts??

dscho commented 2 months ago

I'm wondering about adding a MSFT-specific commit on top to either remove the if (core_virtualfilesystem) return guard that Ben added OR having a new MSFT-specific config setting to override it.

You are referring to this, right? https://github.com/microsoft/git/blob/bbddf3508e08929872c777f9824f64ec2811fea8/wt-status.c#L1607-L1608

The text (that is output if we remove that early return) reads: "You are in a sparse checkout." I wonder whether we want to do something like this instead?

diff --git a/wt-status.c b/wt-status.c
index b98e4f699427..afef512a6fea 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1604,10 +1604,14 @@ static void show_sparse_checkout_in_use(struct wt_status *s,
 {
    if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_DISABLED)
        return;
-   if (core_virtualfilesystem)
-       return;
-
-   if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+   if (core_virtualfilesystem) {
+       if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+           status_printf_ln(s, color, _("You are in a fully-hydrated checkout."));
+       else
+           status_printf_ln(s, color,
+                   _("You are in a partially-hydrated checkout with %d%% of tracked files present."),
+                   s->state.sparse_checkout_percentage);
+   } else if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
        status_printf_ln(s, color, _("You are in a sparse checkout."));
    else
        status_printf_ln(s, color,

Just removing the test will cause GVFS users to see it on every command and that may be annoying (and it may break scripts).

As we're talking about code executed as part of wt_longstatus_print() (and notably not wt_porcelain_v2_print() or wt_porcelain_print()), I am a lot less interested in keeping scripted parsing safe: scripts should use the porcelain output that is promised to stay stable. This information would be useful for users to see, I'd think. In the event that it should turn out that it is too annoying, it will be easy enough to introduce an opt-out (or even an opt-in) config setting for this. Personally, I expect no complaints about seeing this additional piece of information, though, in particular because it comes "for free" and does not require extra computation (or worse: I/O).

jeffhostetler commented 2 months ago

New traces are nice. Want to update the PR title before merging?

yeah, let me get caught up and i'll send a new version shortly.

jeffhostetler commented 2 months ago

@dscho Is this message correct? Or is this one of the cases that won't happen?

If we have a sparse-index, we are building upon a sparse-checkout, right? All we know is that we may have sparse-directories (in addition to individual sparse-files), so the net-net is that we can't tell how sparse it really is. We know the number of present cache-entries, but not the number that there would be in a full checkout.

I'm wondering if this case should just say "You are in a sparse checkout with a sparse index with %d entries."

+   if (core_virtualfilesystem) {
+       if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+           status_printf_ln(s, color, _("You are in a fully-hydrated checkout."));

derrickstolee commented 2 months ago

@dscho Is this message correct? Or is this one of the cases that won't happen?

If we have a sparse-index, we are building upon a sparse-checkout, right? All we know is that we may have sparse-directories (in addition to individual sparse-files), so the net-net is that we can't tell how sparse it really is. We know the number of present cache-entries, but not the number that there would be in a full checkout.

I'm wondering if this case should just say "You are in a sparse checkout with a sparse index with %d entries."
+ if (core_virtualfilesystem) {
+     if (s->state.sparse_checkout_percentage == SPARSE_CHECKOUT_SPARSE_INDEX)
+         status_printf_ln(s, color, _("You are in a fully-hydrated checkout."));

The sparse index isn't compatible with a virtual filesystem (you need all the blobs available in the index for immediate hydration) so this case shouldn't happen. It could be a die() scenario or you could ignore it as "your numbers might be bogus" when in such an unsupported state.

jeffhostetler commented 2 months ago

@derrickstolee The scalar functional tests seem to be stuck. It's been running for hours. Is this normal or can we just ignore it and go on?

dscho commented 2 months ago

The scalar functional tests seem to be stuck.

It's actually not stuck, but a side effect of GitHub's branch protection model being a bit... simplistic. We want to require the Scalar Functional Tests to pass for PRs and we want to see a clear visual signal when they fail, but we cannot do that directly, instead, we can only require individual GitHub workflow jobs to pass. See here:

Notice how it says specifically macos-11? I've changed it to macos-13 and now this PR's checks are all green, and I'll merge the PR.

jeffhostetler commented 2 months ago

Thanks!!!

microsoft / git

virtualization: add new builtin command to print hydration level #659