microsoft / git

A fork of Git containing Microsoft-specific patches.
http://git-scm.com/
Other
789 stars 95 forks source link

pack-objects: allow --path-walk with --shallow #699

Closed derrickstolee closed 4 weeks ago

derrickstolee commented 4 weeks ago

This pull request aims to correct a pretty big issue when dealing with UNINTERESTING objects in the path-walk API. They somehow were only exposed when trying to perform a push from a shallow clone.

This will require rewriting the upstream version so this is avoided from the start, but we can do a forward fix for now.

The key issue is that the path-walk API was not walking UNINTERESTING trees at the right time, and the way it was being done was more complicated than it needed to be. This changes some of the way the path-walk API works in the presence of UNINTERSTING commits, but these are good changes to make.

I had briefly attempted to remove the use of the edge_aggressive option in struct path_walk_info in favor of using the --objects-edge-aggressive option in the revision struct. When I started down that road, though, I somehow got myself into a bind of things not working correctly. I backed out to this version that is working with our test cases.

I tested this using the thin and big pack tests in p5313 which had the same performance as before this change.

The new change is that in a shallow clone we can get the same git push improvements.

I was hung up on testing this for a long time as I wasn't getting the same results in my shallow clone as in my regular clones. It turns out that I had forgotten to use --no-reuse-delta in my test command, so it was picking the deltas that were given by the initial clone instead of picking new ones per the algorithm. 🤦🏻

derrickstolee commented 4 weeks ago

I created derrickstolee/git#31 as a potential extension to include a more complicated edge walk to better mimic the behavior when constructing a packfile with shallow commits in it.

derrickstolee commented 4 weeks ago

I created derrickstolee#31 as a potential extension to include a more complicated edge walk to better mimic the behavior when constructing a packfile with shallow commits in it.

This is really important. I was able to confirm that without the second commit, my pushes from a shallow client to a full clone caused huge packfile sizes.

I'm not sure what caused this to happen, as the revision walk should have been the same. But I guess we need to do things the hard way.

dscho commented 4 weeks ago

I'm not sure what caused this to happen, as the revision walk should have been the same.

Maybe the "shallow commits" are treated as uninteresting, which would be the explanation of bad deltas because they contain the valuable delta targets in --depth=1 clones.

derrickstolee commented 4 weeks ago

I'm not sure what caused this to happen, as the revision walk should have been the same.

Maybe the "shallow commits" are treated as uninteresting, which would be the explanation of bad deltas because they contain the valuable delta targets in --depth=1 clones.

In the case where I'm pushing a single commit on top of my shallow commit, that shallow commit should be revealed as a "boundary" commit. I will try to debug into things to figure out what's going on to cause this to be different.

derrickstolee commented 4 weeks ago

I'm taking a break for the night. I'm running in circles and need to come back with fresh eyes.

derrickstolee commented 4 weeks ago

This version should be good to go. My testing that caused issues last night were 100% because I wasn't using --no-reuse-delta and that was causing my git pack-objects command to trip over existing deltas. That won't happen for our targeted use case which has freshly created the objects and so no deltas will exist on disk to reuse.

derrickstolee commented 4 weeks ago
Here's the range-diff for the latest push: ``` 1: 17ce490c3ea = 1: 17ce490c3ea t5616: mark tests as bogus with --path-walk 2: d13133bf96e ! 2: 8599051ccb4 path-walk: add new 'edge_aggressive' option @@ Commit message In preparation for allowing both the --shallow and --path-walk options in the 'git pack-objects' builtin, create a new 'edge_aggressive' option - in the path-walk API. + in the path-walk API. This option will help walk the boundary more + thoroughly and help avoid sending extra objects during fetches and + pushes. The only use of the 'edge_hint_aggressive' option in the revision API is within mark_edges_uninteresting(), which is usually called before @@ Commit message are walked until a boundary is found. We didn't use this in the past because we would mark objects - UNINTERESTING after doing the initial commit walk to the boundary. But - we actually should be marking these objects as UNINTERESTING, but we - shouldn't _emit_ them all via the path-walk algorithm or else our delta - calculations will get really slow. + UNINTERESTING after doing the initial commit walk to the boundary. While + we should be marking these objects as UNINTERESTING, we shouldn't _emit_ + them all via the path-walk algorithm or else our delta calculations will + get really slow. Based on these observations, the way we were handling the UNINTERESTING flag in walk_objects_by_path() was overly complicated and buggy. A lot 3: 1203ee56eb8 ! 3: 2065f5464fc pack-objects: allow --shallow and --path-walk @@ Commit message However, before the previous change, a trivial removal of the warning would cause a failure in t5500-fetch-pack.sh when GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more - objets than we desired, due to some incorrect behavior of the path-walk - API, especially around walking unintersting objects. + objects than we desired, due to some incorrect behavior of the path-walk + API, especially around walking uninteresting objects. - Now that these things were fixed, we can make this change. Further, we - can add a test to show that the Git client is similarly careful about - selecting the right objects during 'git push' from a shallow clone. + To also cover the symmetrical case of pushing from a shallow clone, add + a new test to t5538-push-shallow.sh that confirms the correct behavior + of pushing only the new object. This works to validate both the + --path-walk and --no-path-walk case when toggling the + GIT_TEST_PACK_PATH_WALK environment variable. This test would have + failed in the --path-walk case if we created it before the previous + change. Signed-off-by: Derrick Stolee @@ t/t5538-push-shallow.sh: EOF + +test_expect_success 'push new commit from shallow clone to origin is efficient' ' + git init origin && -+ echo a >origin/a && -+ git -C origin add a && -+ git -C origin commit -m "base" && -+ echo b >origin/b && -+ git -C origin add b && -+ git -C origin commit -m "tip" && ++ test_commit -C origin a && ++ test_commit -C origin b && + + git clone --depth=1 "file://$(pwd)/origin" client && + git -C client checkout -b topic && ```
derrickstolee commented 4 weeks ago

Huge thanks to @dscho for prompting me to add an extra test case, demonstrating an issue with shallow pushes. We now have a lot more confidence in the correctness.

Range-diff for latest push ``` 1: 8599051ccb4 ! 1: 1af435e33b7 path-walk: add new 'edge_aggressive' option @@ path-walk.c: static void clear_strmap(struct strmap *map) strmap_init(map); } -+static void show_edge(struct commit *commit UNUSED) ++static struct repository *edge_repo; ++static struct type_and_oid_list *edge_tree_list; ++ ++static void show_edge(struct commit *commit) +{ -+ /* empty */ ++ struct tree *t = repo_get_commit_tree(edge_repo, commit); ++ ++ if (!t) ++ return; ++ ++ if (commit->object.flags & UNINTERESTING) ++ t->object.flags |= UNINTERESTING; ++ ++ if (t->object.flags & SEEN) ++ return; ++ t->object.flags |= SEEN; ++ ++ oid_array_append(&edge_tree_list->oids, &t->object.oid); +} + /** @@ path-walk.c: int walk_objects_by_path(struct path_walk_info *info) + * This is particularly important when 'edge_aggressive' is set. + */ + info->revs->edge_hint_aggressive = info->edge_aggressive; ++ ++ edge_repo = info->revs->repo; ++ edge_tree_list = root_tree_list; + mark_edges_uninteresting(info->revs, show_edge, info->prune_all_uninteresting); + info->revs->blob_objects = info->revs->tree_objects = 0; 2: 2065f5464fc = 2: 2361d5830cd pack-objects: allow --shallow and --path-walk -: ----------- > 3: 423e4cdb537 t5538: add test to confirm deltas in shallow pushes ```

Update: I also rechecked the thin pack tests in p5313 across a variety of full and shallow repos. Everything looks good there.

derrickstolee commented 4 weeks ago

I pushed the minor test name change updates in order to trigger new builds that won't time out on mac now that #700 is merged.

dscho commented 4 weeks ago

As to win test (8) failing, that's a flake that I documented here.