salesforce / p4-fusion

A fast Perforce to Git conversion tool written in C++ using Perforce Helix Core C++ API and Libgit2
BSD 3-Clause "New" or "Revised" License
78 stars 15 forks source link

Branch support for #54 #61

Closed groboclown closed 1 year ago

groboclown commented 1 year ago

Adds branch support for ticket #54. This is a fairly big rewrite, and my C++ skills are pretty rusty, so it should have some eyes put on it.

Notes on the code changes:

A few things that are dangling and not completed with this PR:

salesforce-cla[bot] commented 1 year ago

Thanks for the contribution! Unfortunately we can't verify the commit author(s): groboclown g***@g***.com. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, sign the Salesforce.com Contributor License Agreement and this Pull Request will be revalidated.

salesforce-cla[bot] commented 1 year ago

Thanks for the contribution! Before we can merge this, we need @groboclown to sign the Salesforce.com Contributor License Agreement.

twarit-waikar commented 1 year ago

@groboclown This is fantastic! Please give me some time to read through your changes and understand them. I will likely run some experiments on my side as well.

For now I have turned on CI workflows on your PR and looks like there are a few build errors here. However that's in the test app and not in the main app.

twarit-waikar commented 1 year ago

Version number bump

I am open to calling this p4-fusion 2.0 honestly. p4-fusion is radically changing its behaviour with this change, is what I see on a first glance

groboclown commented 1 year ago

For now I have turned on CI workflows on your PR and looks like there are a few build errors here. However that's in the test app and not in the main app.

I discovered how to run the tests and the format validation. I fixed up the validation errors that locally showed up, and pushed those above. I'm looking through the test issues to solve those next.

groboclown commented 1 year ago

Does this change need something else to run the builds?

peterguy commented 1 year ago

I have identified a situation that causes this error when cloning:

[ ERROR @ main:420 ] Exception occurred: St12length_error: basic_string

I've attached a script that will reproduce that error. It uses a Perforce server running in Docker, so it doesn't interfere with anything that's already running.

To test it, download the script, rename it with ".sh", make it executable, then in Terminal, source ./perforce_stream_testing_functions.sh followed by test true to trigger the error or test to run it without the error.

To cause the error, I create a new stream with main as the parent, integrate the parent files into the new stream, submit that change, clone the depot, add a file to the stream and try to clone again.

EDIT: the testing script was kind of messy: it left the Docker container running, and it used the current working directory to store both Perforce files and the Git repository that p4-fusion creates. I made it behave better - use a temp directory, clean up the Docker container, etc...

perforce_stream_testing_functions.txt

groboclown commented 1 year ago

@peterguy Thanks for the awesome test script! That was really helpful getting down to discovering the bug.

The error happens on git_api.cc.

This bit of code was originally written to expect the Git commit message to end with specific generated text, but my patch can include extra branch information.

There are two bugs here.

Specifically, in the example you brought up, the original p4-fusion commit message was:

4 - integrate main into dev1
[p4-fusion: depot-paths = "//new-stream-depot/": change = 4]

But with my patch, it's now:

4 - integrate main into dev1
[p4-fusion: depot-paths = "//new-stream-depot/": change = 4]; merged from refs/heads/main

I've put together a patch for these two bugs.

peterguy commented 1 year ago

I've put together a patch for these two bugs.

Love the incredibly rapid turnaround! I had enough time to identify and confirm the bug, but had run out of time to track down the actual bug.

I'm evaluating this PR to see if we can use it in our product - we use p4-fusion but want to add streams support. It's looking great so far; I wish my C++ skills were as rusty as yours, @groboclown. :-)

I suppose I could take this opportunity to also ask about the --branch option - did you consider supporting something like --branch ALL to trigger a lookup of the branches and an iteration over them, instead of needing to lookup the branches separately and feed them all to p4-fusion using multiple --branch options?

groboclown commented 1 year ago

@peterguy Support for something like "--stream //my-stream" may make sense, to detect all the child streams and be able to create them as branches in Git. However, there isn't a real way to use "--branch all" in a general sense with Perforce. Perforce allows for integrating from anywhere to anywhere within the server (with some special cases for graph depots and a few others), which makes the concept of auto-detecting branches difficult at best. I initially explored the idea of using branchspecs to define the branches, but that created a wide variety of special case handling that was tricky at best.

A "--stream //name" option may be a good additional argument for another PR. Let's get the branch support working well first, then we can work on the quality-of-life improvements.

peterguy commented 1 year ago

@groboclown (tagging for notification)

Perforce allows for integrating from anywhere to anywhere within the server

I'm considering adding admin config options to specify information about depots because so much of the usage and configuration of a Perforce server is by convention, similar to Subversion.

Let's get the branch support working well first

Definitely! I just have lots of questions about how Perforce operates and am taking any opportunity to ask and learn. :-) For example, you use the terms "branch" and "stream" separately, which caused me to go read more about branches in Perforce. I had understood them to be basically the same concept, but it looks like a "branch" is more akin to a client than to a stream - it's a view of files to support integration. Does adding --branch options to p4-fusion cause it to create branch views?

groboclown commented 1 year ago

@peterguy (tagged for notification)

The original ticket (#54) requested support for branchspecs, so this is a good place to discuss the decisions I made in ignoring them for this PR. I buried my main point about a follow-up feature in the very last paragraph.

The "branch" concept in Perforce was a simple bit of meta-data added to the server to attempt to add a formality to describing how changes are intended to propagate between locations in the depot. Additionally they indicate a direction (and therefore a reverse direction). As you correctly identified, it reuses the concept of the depot view that clients (and labels and many other things) also use. Perforce can use a branchspec depot view mapping when performing an integrate to aid the user in handling the correct path translation.

However, because branchspecs are so flexible, some teams use them to define much more than the standard "branch" idea. I've seen one team use them in place of a library release process (shared library code was copied between projects via one branchspec per copy), which was itself a layer on top of product branching (yes, it quickly became a maintenance nightmare). They can also have a non-trivial file mapping. Additionally, when you deal with things like merging bug fixes between version branches, the meaning behind the direction part of the branchspec becomes muddled. While teams can have legitimate use cases for these complex scenarios, they're very difficult to tease out in a general way.

I made several attempts to handle the branching through branchspecs, and eventually gave up and went with a simple "list branch roots" in a way very similar to how stream depots manage branches. One of the big issues I ran into was how to name a branch, and different attempts on top of that ran into trying to figure out what the branch direction meant for the clone operation.

I've only seen projects handle their formal concept of branches through the convention "//path/to/project/branchname/...", which is how stream depots define their streams. Classic depots may have projects scattered throughout a single depot, like some under "//depot/big-project", then one under "//depot/utils/text-convert", then another under "//depot/fancy-product/servers/authentication-server". So, I think that the simple declarative approach made the most sense.

I can see arguments for discovering branches by using a command like "p4 dirs //base/branch/path/*", which would fall under the streams use case. This would cover your "all" scenario. To me, this would be a good solution to cover several use cases. I still like having the ability to explicitly declare the branches, because it means that you can have the tool ignore branches that you don't care about. Say, if one is stale by several years and you don't care about it anymore.

iainnicol commented 1 year ago

Hi, thanks for this impressive PR.

I'm giving it a shot, because I'm trying to do a full fidelity (i.e. including branches) conversion of Open Watcom v1.9, from Perforce to Git.

So, the main branch is //depot/openwatcom. This converts fine. There's also a branch //depot/ow_release/1.0.

Unfortunately, --branch ow_release/1.0 doesn't work. All the change lists come back with zero files. In fact, it deadlocks in ChangeList::WaitForDownload(). I tried to fix that by adding cl.commitCV->notify_all() here. This stopped the deadlock, but all the change lists were still empty, so the created git history was empty (apart from the initial commit).

So, What's weird is that --branch ow_release sort of works with this pull request. However, that's obviously not quite right, because everything has an unwanted 1.0 top level folder.

My first question is, I am missing something simple, in how to specify the --branch ow_release/1.0 subfolder?

My second question is, if not, do you have any tips for helping debug this?

My third question is, do you think we need that extra notify_all() call? I know it didn't solve the underlying problem, but at the moment I think you can reproduce a a deadlock by doing --branch invalidfoldername.

peterguy commented 1 year ago

@iainnicol to help with debugging, this is how I get the stream names in a depot when I'm using --branch:

p4 -ztag -Mj streams //${DEPOT}/... | jq -r '.Stream' | sed "s|//${DEPOT}/||g"

Where ${DEPOT} is the name of the depot; in your case, depot, looks like.

I think that means that --branch ow_release/1.0 is the correct syntax for you, but check out what p4 streams shows to confirm.

If there's any way to expose a connection to the Open Watcom Perforce depot, or export a clone or something, I'd love to take a look and help you debug this. I'm not as proficient as @groboclown, but I'm preparing to use @groboclown's streams support and am quite interested in more real-world examples.

iainnicol commented 1 year ago

Thanks for the offer @peterguy.

This depot doesn't use streams; it dates back to 1998 originally! But, TBH, I'm reasonably sure I used the correct syntax on the command line.

Sanity check: should this work, given the depot doesn't use streams? As in, are classic branches (p4 branch, p4 integrate) supported by this PR?

Besides, I don't know if you're interested, if the depot doesn't use streams. No pressure. But if you are, I can share a tarball of the depot directory. I should probably delete the password hashes. Do I just delete the db.user.* files?

groboclown commented 1 year ago

@iainnicol The PR supports any path in the depot, and doesn't have explicit requirements for streams.

I just ran a very trivial test using this depot layout:

$ p4 files //...
//depot/openwatcom/b.txt#1 - add change 3 (text)
//depot/openwatcom/t.txt#1 - add change 1 (text)
//depot/ow_release/1.0/t.txt#1 - branch change 2 (text)

I used the command:

p4-fusion --client "${P4CLIENT}" --user "${P4USER}" --src /some/path --port "${P4PORT}" --lookAhead 2000 \
  --path //depot/...  --branch openwatcom --branch ow_release/1.0

This should be expected to work, but it looks like a limitation in the code - right now, it only allows branches to be a single directory name.

I'm looking over the code to see how to improve this to make multiple branch depths work. As a work-around, you can perform some post Git migration path rename operations on the depot. There are some guides online on how to do this.

For those curious, the limitation is in the branch_set.cc file, in the BranchSet::splitBranchPath function. This splits directories blindly, assuming that branches are always in the 1 directory deep format. I have a patch that adds the multiple-directory branch case. Note that it means the branch names will then have a "/" in it.

iainnicol commented 1 year ago

Okay @groboclown! I see you pushed a change, not long after leaving a comment.

With your change, this almost works as I hoped.

The caveat is it's still deadlocking. The Open Watcom depot needs the commitCV to be notify_alled here. (I don't know why your simple demo depot doesn't deadlock. But as I said to Peter, I can share this depot if you'd like. It's all open source.)

The good news is with your change, and the addition to stop the deadlock, the branch successfully converts.

Some of the revisions have a very nested revision graph, but this could very well be correct. I haven't looked into this in detail, but I will do, because it obviously caught my eye.

--

Finally, I don't want to seem entitled, but there is one thing that'd make this even better from my perspective. The Open Watcom Perforce branch layout has grown, ah, very organically over the decades.

Suppose there's a file main.c in //depot/openwatcom. This was first branched into the folder //depot/ow_devel. But then later it was also branched into the folders //depot/ow_devel/plusplus.ext and [edit:] //depot/ow_devel/long_double. So a subset of //depot/ow_devel is a branch, and then there are also branches under ow_devel.

I don't even know a good way to to specify this on the command line. Perhaps --branchExclude which is essentially an argument to the most recent specified branch?

groboclown commented 1 year ago

@iannicol I added in the missing notify_all that you pointed out, but it looks like there was another situation where a notify_all would be missed. I altered the code to ensure that the notify would call out in all scenarios.

As for the "organic" branching in your root project, the current code with the multi-level branch names almost help you out. You can organize the order of branch specifications to have --branch ow_devel/plusplus.ext before --branch ow_devel, but that runs into an issue with Git itself - if you have a branch with a "/" in the name, then that causes the stuff before the "/" to be a directory. That means if you create the branch ow_devel/plusplus.ext, Git will fail with an error when trying to use a branch named ow_devel (and vice versa).

One way around this would be for the code to allow for branch name aliases on the command line (maybe the format --branch "b1=ow_devel/plusplus.ext").

iainnicol commented 1 year ago

Dang, that's so close. That's an unfortunate implementation detail of git.

Knowing how close this is, I think I can just hack it for my purposes. Branch aliases would be nice, don't get me wrong. But clearly your change works well even without it.

groboclown commented 1 year ago

So, git branch aliases wasn't too tricky to add. Change incoming. With this change, you should be able to run with something like:

--branch ow_devel/plusplus.ext:plusplus.ext \
--branch ow/devel/long_double:long_double \
--branch ow_devel \
--branch openwatcom

or perhaps:

--branch ow_devel/plusplus.ext:feature/plusplus.ext \
--branch ow/devel/long_double:feature/long_double \
--branch ow_devel:dev \
--branch openwatcom:main
iainnicol commented 1 year ago

Thank you. All your changes work really well. I think this is a useful feature, which should be merged when the maintainers have the time to review this.

However, I've now realised the Open Watcom depot is way too freeform. I think I should probably park this. I know that up till now I've previously said 'thanks, but please don't feel obliged to change this for me'. It's been great, that you've nonetheless made some small to medium changes. But now I really strongly feel this depot is unfortunately just fundamentally incompatible. It'd be too big a change to support it.

In essence, the problem is branch views were used in the depot. Obviously you earlier discussed and ruled out directly supporting branch views. Unfortunately, the --branch root approach (whilst helpful!) isn't quite expressive enough to manually recreate the branch view.

The fundamental incompatibility is the way some of the subfolders were branched. For example (hypothetical but it demonstrates the idea), suppose in the main branch openwatcom we had the file path //depot/openwatcom/libc/math.c. Then suppose the folder //depot/openwatcom/libc/ was branched into //depot/ow_devel/c11/ (i.e. a development branch with name c11). But that's the wrong folder level! It means //depot/openwatcom/libc/math.c became //depot/ow_devel/c11/math.c. It would have been great if it had been branched to //depot/ow_devel/c11/libc/math.c: we could have expressed that. But unfortunately, with the way it was branched, if we try to convert this, the modified files would end up in the top level, as opposed to in a libc subfolder.

Now, I could probably use git filter-branch, and move math.c etc into a libc subfolder. But I presume then the branch will still be missing the files and folders, which were originally adjacent to the libc folder, such as //depot/openwatcom/build.sh. I'm new to p4, so it's very possible I'm on a foolish endeavour, in that I'm trying to have a complete folder layout, in the branches.

peterguy commented 1 year ago

Hey @iainnicol, I apologize for my sporadic appearance here. I've been pulled in several directions at work, and am circling back around to Perforce now.

I can share a tarball of the depot directory.

I would love to get the depot! The weirder the better, really, so I can test edge cases. :-)

twarit-waikar commented 1 year ago

Reopening to trigger workflow checks

twarit-waikar commented 1 year ago

Thank you for the amazing work!