sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
9 stars 1 forks source link

Filling gaps with one-token sections makes for odd-looking merges. #43

Closed sageserpent-open closed 1 month ago

sageserpent-open commented 1 month ago

This follows on from #42.

The work done there introduced the notion of filling content gaps between matched sections in MappedContentSources with lots of one-token sections; previously the gaps were filled with single sections, each section spanning the gap in one go.

While that sorted out the problems addressed by #42, it yields some odd looking merges in CodeMotionAnalysisExtensionTest.merging when the minimum match size is low.

Running the manual test discussed here: https://github.com/sageserpent-open/kineticMerge/issues/30#issuecomment-2106144845 also yields mixed results - in some ways the merge is improved, in others it looks suspect.

The job here is to get nicer merge results without breaking the tests introduced in #42.

One approach might be to look for common prefixes and / or suffices in gaps, then to split the gaps into an optional prefix, body and an optional suffix - that should be enough to keep the tests from #42 happy, but might avoid the messy merges.

sageserpent-open commented 1 month ago

Looked at the logs with mimimum match sizes of 5 and 6 in CodeMotionAnalysisExtensionTest.merging as of: aedc282e5d802f3082dd2b605694bfc81378ce6f, as well as of the earlier commit: 98ad215623561b7a39e5db0faeb1f37abb68f1bb when gap were filled with single sections only.

This reveals the same kind of problem seen back in #19 - the merge algorithm works well when content is gathered up into big sections, regardless of whether those are matched sections or gap fillers. Introducing runs of one-token gap filler sections not only clobbers the merge in terms of performance, but forces it to align on single tokens that may or may not be good choices.

sageserpent-open commented 1 month ago

Suppose we identify all gap filler content on the base that is a substring of some gap filler content on the left or right, splitting the left and right gap fills into chunkier sections. This is essentially the prefix / suffix idea from before, only generalises to allow matches in the middle. The point is, having chunky gap fills might yield decent merges in general, as with the older implementation, but still grant us the fix from #42.

sageserpent-open commented 1 month ago

That approach worked and was delivered in 1fc422108b89e39cc6608eb1b16027e0c3bf6e4f.

sageserpent-open commented 1 month ago

This went out in release 1.2.0, Git commit SHA: 4d1935beef2b5923417904895884de8c2e749c6e.