Closed groboclown closed 1 year ago
Thanks for the contribution! Unfortunately we can't verify the commit author(s): groboclown g***@g***.com. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, sign the Salesforce.com Contributor License Agreement and this Pull Request will be revalidated.
Thanks for the contribution! Before we can merge this, we need @groboclown to sign the Salesforce.com Contributor License Agreement.
@groboclown This is fantastic! Please give me some time to read through your changes and understand them. I will likely run some experiments on my side as well.
For now I have turned on CI workflows on your PR and looks like there are a few build errors here. However that's in the test app and not in the main app.
Version number bump
I am open to calling this p4-fusion 2.0 honestly. p4-fusion is radically changing its behaviour with this change, is what I see on a first glance
For now I have turned on CI workflows on your PR and looks like there are a few build errors here. However that's in the test app and not in the main app.
I discovered how to run the tests and the format validation. I fixed up the validation errors that locally showed up, and pushed those above. I'm looking through the test issues to solve those next.
Does this change need something else to run the builds?
I have identified a situation that causes this error when cloning:
[ ERROR @ main:420 ] Exception occurred: St12length_error: basic_string
I've attached a script that will reproduce that error. It uses a Perforce server running in Docker, so it doesn't interfere with anything that's already running.
To test it, download the script, rename it with ".sh", make it executable, then in Terminal, source ./perforce_stream_testing_functions.sh
followed by test true
to trigger the error or test
to run it without the error.
To cause the error, I create a new stream with main
as the parent, integrate
the parent files into the new stream, submit
that change, clone the depot, add a file to the stream and try to clone again.
EDIT: the testing script was kind of messy: it left the Docker container running, and it used the current working directory to store both Perforce files and the Git repository that p4-fusion
creates. I made it behave better - use a temp directory, clean up the Docker container, etc...
@peterguy Thanks for the awesome test script! That was really helpful getting down to discovering the bug.
The error happens on git_api.cc.
This bit of code was originally written to expect the Git commit message to end with specific generated text, but my patch can include extra branch information.
There are two bugs here.
find_last_of
, which finds the last occurrence of any of the characters. Originally, the string would always terminate with " = (number)]", so the space character (" ") would always be the last thing found. Now, with the extra message after the "]", that's no longer the case. (I thought the "+ 1" to the discovered offset was weird looking, and now after some research into the behavior of find_last_of
I see why).Specifically, in the example you brought up, the original p4-fusion commit message was:
4 - integrate main into dev1
[p4-fusion: depot-paths = "//new-stream-depot/": change = 4]
But with my patch, it's now:
4 - integrate main into dev1
[p4-fusion: depot-paths = "//new-stream-depot/": change = 4]; merged from refs/heads/main
I've put together a patch for these two bugs.
I've put together a patch for these two bugs.
Love the incredibly rapid turnaround! I had enough time to identify and confirm the bug, but had run out of time to track down the actual bug.
I'm evaluating this PR to see if we can use it in our product - we use p4-fusion
but want to add streams support. It's looking great so far; I wish my C++ skills were as rusty as yours, @groboclown. :-)
I suppose I could take this opportunity to also ask about the --branch
option - did you consider supporting something like --branch ALL
to trigger a lookup of the branches and an iteration over them, instead of needing to lookup the branches separately and feed them all to p4-fusion
using multiple --branch
options?
@peterguy Support for something like "--stream //my-stream" may make sense, to detect all the child streams and be able to create them as branches in Git. However, there isn't a real way to use "--branch all" in a general sense with Perforce. Perforce allows for integrating from anywhere to anywhere within the server (with some special cases for graph depots and a few others), which makes the concept of auto-detecting branches difficult at best. I initially explored the idea of using branchspecs to define the branches, but that created a wide variety of special case handling that was tricky at best.
A "--stream //name" option may be a good additional argument for another PR. Let's get the branch support working well first, then we can work on the quality-of-life improvements.
@groboclown (tagging for notification)
Perforce allows for integrating from anywhere to anywhere within the server
I'm considering adding admin config options to specify information about depots because so much of the usage and configuration of a Perforce server is by convention, similar to Subversion.
Let's get the branch support working well first
Definitely! I just have lots of questions about how Perforce operates and am taking any opportunity to ask and learn. :-) For example, you use the terms "branch" and "stream" separately, which caused me to go read more about branches in Perforce. I had understood them to be basically the same concept, but it looks like a "branch" is more akin to a client than to a stream - it's a view of files to support integration. Does adding --branch
options to p4-fusion
cause it to create branch views?
@peterguy (tagged for notification)
The original ticket (#54) requested support for branchspecs, so this is a good place to discuss the decisions I made in ignoring them for this PR. I buried my main point about a follow-up feature in the very last paragraph.
The "branch" concept in Perforce was a simple bit of meta-data added to the server to attempt to add a formality to describing how changes are intended to propagate between locations in the depot. Additionally they indicate a direction (and therefore a reverse direction). As you correctly identified, it reuses the concept of the depot view that clients (and labels and many other things) also use. Perforce can use a branchspec depot view mapping when performing an integrate to aid the user in handling the correct path translation.
However, because branchspecs are so flexible, some teams use them to define much more than the standard "branch" idea. I've seen one team use them in place of a library release process (shared library code was copied between projects via one branchspec per copy), which was itself a layer on top of product branching (yes, it quickly became a maintenance nightmare). They can also have a non-trivial file mapping. Additionally, when you deal with things like merging bug fixes between version branches, the meaning behind the direction part of the branchspec becomes muddled. While teams can have legitimate use cases for these complex scenarios, they're very difficult to tease out in a general way.
I made several attempts to handle the branching through branchspecs, and eventually gave up and went with a simple "list branch roots" in a way very similar to how stream depots manage branches. One of the big issues I ran into was how to name a branch, and different attempts on top of that ran into trying to figure out what the branch direction meant for the clone operation.
I've only seen projects handle their formal concept of branches through the convention "//path/to/project/branchname/...", which is how stream depots define their streams. Classic depots may have projects scattered throughout a single depot, like some under "//depot/big-project", then one under "//depot/utils/text-convert", then another under "//depot/fancy-product/servers/authentication-server". So, I think that the simple declarative approach made the most sense.
I can see arguments for discovering branches by using a command like "p4 dirs //base/branch/path/*", which would fall under the streams use case. This would cover your "all" scenario. To me, this would be a good solution to cover several use cases. I still like having the ability to explicitly declare the branches, because it means that you can have the tool ignore branches that you don't care about. Say, if one is stale by several years and you don't care about it anymore.
Hi, thanks for this impressive PR.
I'm giving it a shot, because I'm trying to do a full fidelity (i.e. including branches) conversion of Open Watcom v1.9, from Perforce to Git.
So, the main branch is //depot/openwatcom
. This converts fine. There's also a branch //depot/ow_release/1.0
.
Unfortunately, --branch ow_release/1.0
doesn't work. All the change lists come back with zero files. In fact, it deadlocks in ChangeList::WaitForDownload()
. I tried to fix that by adding cl.commitCV->notify_all()
here. This stopped the deadlock, but all the change lists were still empty, so the created git history was empty (apart from the initial commit).
So, What's weird is that --branch ow_release
sort of works with this pull request. However, that's obviously not quite right, because everything has an unwanted 1.0
top level folder.
My first question is, I am missing something simple, in how to specify the --branch ow_release/1.0
subfolder?
My second question is, if not, do you have any tips for helping debug this?
My third question is, do you think we need that extra notify_all()
call? I know it didn't solve the underlying problem, but at the moment I think you can reproduce a a deadlock by doing --branch invalidfoldername
.
@iainnicol to help with debugging, this is how I get the stream names in a depot when I'm using --branch
:
p4 -ztag -Mj streams //${DEPOT}/... | jq -r '.Stream' | sed "s|//${DEPOT}/||g"
Where ${DEPOT}
is the name of the depot; in your case, depot
, looks like.
I think that means that --branch ow_release/1.0
is the correct syntax for you, but check out what p4 streams
shows to confirm.
If there's any way to expose a connection to the Open Watcom Perforce depot, or export a clone or something, I'd love to take a look and help you debug this. I'm not as proficient as @groboclown, but I'm preparing to use @groboclown's streams support and am quite interested in more real-world examples.
Thanks for the offer @peterguy.
This depot doesn't use streams; it dates back to 1998 originally! But, TBH, I'm reasonably sure I used the correct syntax on the command line.
Sanity check: should this work, given the depot doesn't use streams? As in, are classic branches (p4 branch, p4 integrate) supported by this PR?
Besides, I don't know if you're interested, if the depot doesn't use streams. No pressure. But if you are, I can share a tarball of the depot directory. I should probably delete the password hashes. Do I just delete the db.user.* files?
@iainnicol The PR supports any path in the depot, and doesn't have explicit requirements for streams.
I just ran a very trivial test using this depot layout:
$ p4 files //...
//depot/openwatcom/b.txt#1 - add change 3 (text)
//depot/openwatcom/t.txt#1 - add change 1 (text)
//depot/ow_release/1.0/t.txt#1 - branch change 2 (text)
I used the command:
p4-fusion --client "${P4CLIENT}" --user "${P4USER}" --src /some/path --port "${P4PORT}" --lookAhead 2000 \
--path //depot/... --branch openwatcom --branch ow_release/1.0
This should be expected to work, but it looks like a limitation in the code - right now, it only allows branches to be a single directory name.
I'm looking over the code to see how to improve this to make multiple branch depths work. As a work-around, you can perform some post Git migration path rename operations on the depot. There are some guides online on how to do this.
For those curious, the limitation is in the branch_set.cc
file, in the BranchSet::splitBranchPath
function. This splits directories blindly, assuming that branches are always in the 1 directory deep format. I have a patch that adds the multiple-directory branch case. Note that it means the branch names will then have a "/" in it.
Okay @groboclown! I see you pushed a change, not long after leaving a comment.
With your change, this almost works as I hoped.
The caveat is it's still deadlocking. The Open Watcom depot needs the commitCV to be notify_all
ed here. (I don't know why your simple demo depot doesn't deadlock. But as I said to Peter, I can share this depot if you'd like. It's all open source.)
The good news is with your change, and the addition to stop the deadlock, the branch successfully converts.
Some of the revisions have a very nested revision graph, but this could very well be correct. I haven't looked into this in detail, but I will do, because it obviously caught my eye.
--
Finally, I don't want to seem entitled, but there is one thing that'd make this even better from my perspective. The Open Watcom Perforce branch layout has grown, ah, very organically over the decades.
Suppose there's a file main.c
in //depot/openwatcom
. This was first branched into the folder //depot/ow_devel
. But then later it was also branched into the folders //depot/ow_devel/plusplus.ext
and [edit:] //depot/ow_devel/long_double
. So a subset of //depot/ow_devel
is a branch, and then there are also branches under ow_devel
.
I don't even know a good way to to specify this on the command line. Perhaps --branchExclude
which is essentially an argument to the most recent specified branch
?
@iannicol I added in the missing notify_all
that you pointed out, but it looks like there was another situation where a notify_all
would be missed. I altered the code to ensure that the notify would call out in all scenarios.
As for the "organic" branching in your root project, the current code with the multi-level branch names almost help you out. You can organize the order of branch specifications to have --branch ow_devel/plusplus.ext
before --branch ow_devel
, but that runs into an issue with Git itself - if you have a branch with a "/" in the name, then that causes the stuff before the "/" to be a directory. That means if you create the branch ow_devel/plusplus.ext
, Git will fail with an error when trying to use a branch named ow_devel
(and vice versa).
One way around this would be for the code to allow for branch name aliases on the command line (maybe the format --branch "b1=ow_devel/plusplus.ext"
).
Dang, that's so close. That's an unfortunate implementation detail of git.
Knowing how close this is, I think I can just hack it for my purposes. Branch aliases would be nice, don't get me wrong. But clearly your change works well even without it.
So, git branch aliases wasn't too tricky to add. Change incoming. With this change, you should be able to run with something like:
--branch ow_devel/plusplus.ext:plusplus.ext \
--branch ow/devel/long_double:long_double \
--branch ow_devel \
--branch openwatcom
or perhaps:
--branch ow_devel/plusplus.ext:feature/plusplus.ext \
--branch ow/devel/long_double:feature/long_double \
--branch ow_devel:dev \
--branch openwatcom:main
Thank you. All your changes work really well. I think this is a useful feature, which should be merged when the maintainers have the time to review this.
However, I've now realised the Open Watcom depot is way too freeform. I think I should probably park this. I know that up till now I've previously said 'thanks, but please don't feel obliged to change this for me'. It's been great, that you've nonetheless made some small to medium changes. But now I really strongly feel this depot is unfortunately just fundamentally incompatible. It'd be too big a change to support it.
In essence, the problem is branch views were used in the depot. Obviously you earlier discussed and ruled out directly supporting branch views. Unfortunately, the --branch
root approach (whilst helpful!) isn't quite expressive enough to manually recreate the branch view.
The fundamental incompatibility is the way some of the subfolders were branched. For example (hypothetical but it demonstrates the idea), suppose in the main branch openwatcom
we had the file path //depot/openwatcom/libc/math.c
. Then suppose the folder //depot/openwatcom/libc/
was branched into //depot/ow_devel/c11/
(i.e. a development branch with name c11
). But that's the wrong folder level! It means //depot/openwatcom/libc/math.c
became //depot/ow_devel/c11/math.c
. It would have been great if it had been branched to //depot/ow_devel/c11/libc/math.c
: we could have expressed that. But unfortunately, with the way it was branched, if we try to convert this, the modified files would end up in the top level, as opposed to in a libc
subfolder.
Now, I could probably use git filter-branch
, and move math.c
etc into a libc
subfolder. But I presume then the branch will still be missing the files and folders, which were originally adjacent to the libc
folder, such as //depot/openwatcom/build.sh
. I'm new to p4, so it's very possible I'm on a foolish endeavour, in that I'm trying to have a complete folder layout, in the branches.
Hey @iainnicol, I apologize for my sporadic appearance here. I've been pulled in several directions at work, and am circling back around to Perforce now.
I can share a tarball of the depot directory.
I would love to get the depot! The weirder the better, really, so I can test edge cases. :-)
Reopening to trigger workflow checks
Thank you for the amazing work!
Adds branch support for ticket #54. This is a fairly big rewrite, and my C++ skills are pretty rusty, so it should have some eyes put on it.
Notes on the code changes:
change_list.cc
level. Code now is filtered through thebranch_set.cc
code, and that's also where the files are split into branch groups (a collection of files that share a source + target branch for a single changelist).FileData
structure how has more responsibility, and some of the file inspection code has been moved into it (which pushes Perforce specific knowledge out of P4API, but it seems contained better this way). In order to be safe with memory, since it can potentially be very heavy weight, it's a wrapper around another structure through a shared pointer. This makes copy constructor semantics more forgiving. It has some weirdness, though, due to it not being aconst
only structure, which means a lot of invocations that read it might also have side effects on the object. Not very pretty.filelog_result.cc
. It's a bit slower, so non-branch runs still usedescribe
. This implementation works with the arguments in P4API to have the change history for only the first revision of each file in the changelist.A few things that are dangling and not completed with this PR:
There's new options and behaviors that need docs.An initial pass at some documentation in the readme is present. Once this PR gets ironed out, then I can revisit them.git_api.cc
could be enhanced, especially around the depot path.TheDonefstat
code was added to have a merge be against an older revision of a branch, but that logic is extremely tricky even if done right, and there's so many ways it can go wrong (see my comments in #54). This should be removed. I'll probably do that shortly.