rvaughn / cvs-fast-export

CVSNT-to-Git conversion utility

Files missing in branches when they were never changed on the branch #9

Closed by robotroll 6 years ago

robotroll commented 6 years ago

Hi,

I found a problem with CVS files that were branched but never changed on that branch: they are missing from the Git export.

This usually shows up in the symbols section of the CVS ,v file as a branch name paired with a revision that does not exist.

I'm unsure why CVSNT uses those nonexistent revisions for the branches.
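For reference, stock CVS records a branch tag as a "magic" branch number with a 0 in the second-to-last position (for example 1.1.0.2 stands for branch 1.1.2), and no revision exists under that branch until something is committed to it. Assuming CVSNT follows the same convention, the following rough sketch lists the branches of a ,v file that have no commits. The sample file content, the author name, and all function names are made up for illustration, and the regular expressions only cover the simple header layout shown here.

```python
import re

def parse_symbols(rcs_text):
    """Map each tag or branch name in the 'symbols' header to its revision."""
    match = re.search(r"^symbols\b([^;]*);", rcs_text, re.MULTILINE)
    if match is None:
        return {}
    symbols = {}
    for entry in match.group(1).split():
        name, _, revision = entry.partition(":")
        symbols[name] = revision
    return symbols

def revisions(rcs_text):
    """Collect revision numbers that have a delta header ('<rev>' then 'date')."""
    return set(re.findall(r"^(\d+(?:\.\d+)+)\s*\ndate\s", rcs_text, re.MULTILINE))

def is_magic_branch(revision):
    """True for magic branch numbers such as 1.1.0.2."""
    parts = revision.split(".")
    return len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0"

def branches_without_commits(rcs_text):
    """Return branch names for which no revision exists on the branch."""
    existing = revisions(rcs_text)
    result = []
    for name, revision in parse_symbols(rcs_text).items():
        if not is_magic_branch(revision):
            continue
        parts = revision.split(".")
        # 1.1.0.2 -> revisions on that branch would start with "1.1.2."
        prefix = ".".join(parts[:-2] + parts[-1:]) + "."
        if not any(rev.startswith(prefix) for rev in existing):
            result.append(name)
    return sorted(result)

# Hypothetical ,v header: two branches rooted at 1.1, neither with a commit.
SAMPLE = (
    "head\t1.2;\naccess;\nsymbols\n\tBranch2:1.1.0.4\n\tBranch1:1.1.0.2;\n"
    "locks; strict;\n\n"
    "1.2\ndate\t2017.01.10.09.00.00;\tauthor alice;\tstate Exp;\nbranches;\nnext\t1.1;\n\n"
    "1.1\ndate\t2016.12.01.09.00.00;\tauthor alice;\tstate Exp;\nbranches;\nnext\t;\n\n"
    "desc\n@@\n"
)
print(branches_without_commits(SAMPLE))
```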

The attached example contains a single file that demonstrates the behavior with Branch1, Branch2, and master. When it is exported to Git, only the master branch exists. test.zip

rvaughn commented 6 years ago

Hi,

Sorry for the delay. I'm not sure what's happening here since it's very common for files to get branched and never changed. I know that's normally handled, so there has to be something weird about the way CVS is tagging some files. I'll look into it this week.

rvaughn commented 6 years ago

I see the issue now. This is by design. The problem isn't that the files weren't changed on the branch; it's that nothing was ever changed on the branch. Because the branch has no changes at all (i.e. no commits), cvs-fast-export just discards it.

I'll see what I can do about keeping that kind of branch.

robotroll commented 6 years ago

Hi,

thanks for your fix.

I detected the problem while diffing older branches. When a file never changed on that branch it was not included in the Git repository. In a small test those files are now also included.

I will now do a full export again and test if the issue is resolved.

robotroll commented 6 years ago

Hi,

Unfortunately, the issue is still present when I export the whole repository. The smallest example I could find is attached. missingFile.zip

The file casio_getlogfile.bat is missing on the branches R201701 and R4800 when all the files in the zip are converted. When the file is converted on its own, it is present on the branches as of the changes in commit c6ab2aa.

I hope you can reproduce the issue.

rvaughn commented 6 years ago

OK, I'm on it.

rvaughn commented 6 years ago

Here's what's happening with your conversion. casio_getlogfile.bat appears to have been added recently and monkey-patched into your two older branches. Its first and only commit is dated after the latest commits on both branches. cvs-fast-export relies on those commit dates to reconstruct your commit history, so from its point of view the file could not possibly be part of either branch.

This is the kind of revisionist history editing (pardon the pun) that you can get away with in CVS but not in Git. cvs-fast-export can't possibly handle these cases on its own.

What I suggest is to create a backup of casio_getlogfile.bat,v and edit your repo copy. Any good text editor will do. Change the commit date of the file from 2017 to something earlier - I suggest simply changing the year to 2014, or use the commit date of casio_install.bat - 2012-11-29 14:03:52. Then convert to Git, and the file will appear in both branches like you expect.
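If you would rather script the edit than do it by hand, here is a rough sketch. It assumes the ,v file uses the standard RCS layout, where each revision header is followed by a line like `date 2017.07.12.10.00.00;`, uses LF line endings, and has the `desc` keyword on its own line. The function name, the sample content, and the author name are made up for illustration. Work on a copy, not on the live repository.

```python
import re
from datetime import datetime

def set_revision_date(rcs_text, revision, new_date):
    """Return rcs_text with the date of one revision header replaced.

    Only the part before 'desc' is touched, so log messages and file
    contents that happen to contain a similar line are left alone.
    """
    header, sep, rest = rcs_text.partition("\ndesc\n")
    pattern = re.compile(
        r"^(" + re.escape(revision) + r"\s*\ndate\s+)\d{2,4}(?:\.\d{2}){5}(;)",
        re.MULTILINE,
    )
    stamp = new_date.strftime("%Y.%m.%d.%H.%M.%S")
    new_header, count = pattern.subn(
        lambda m: m.group(1) + stamp + m.group(2), header
    )
    if count != 1:
        raise ValueError(f"expected exactly one header for revision {revision}")
    return new_header + sep + rest

# Hypothetical single-revision file, loosely modeled on the case above.
SAMPLE = (
    "head\t1.1;\naccess;\nsymbols;\nlocks; strict;\n\n"
    "1.1\ndate\t2017.07.12.10.00.00;\tauthor bob;\tstate Exp;\nbranches;\nnext\t;\n\n"
    "desc\n@@\n\n\n1.1\nlog\n@date 2017.07.12.10.00.00;\n@\ntext\n@echo hello\n@\n"
)

# Reuse the commit date of casio_install.bat mentioned above.
updated = set_revision_date(SAMPLE, "1.1", datetime(2012, 11, 29, 14, 3, 52))
print(updated.split("\ndesc\n")[0])
```

Raising an error when the revision header is not found exactly once is deliberate: a silent no-op would be easy to miss before a long conversion run.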

You'll notice after this change that cvs-fast-export starts complaining about "no parent found for tag" for a handful of tags. This is because the tags were created during the R4800 and R201701 lifecycles, but casio_getlogfile.bat doesn't have them. You can add the missing tags to the file to clean up these warnings - and your Git output.
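To find out which tags are missing, you could compare the symbols section of the edited file with that of a file that has lived on the branches all along. This is only a sketch: the tag names and sample headers below are made up, the function names are hypothetical, and deciding which revision each added tag should point to is still up to you.

```python
import re

def symbol_names(rcs_text):
    """Return the set of tag and branch names in the 'symbols' header."""
    match = re.search(r"^symbols\b([^;]*);", rcs_text, re.MULTILINE)
    if match is None:
        return set()
    return {entry.partition(":")[0] for entry in match.group(1).split()}

def missing_tags(reference_rcs, target_rcs):
    """Names present in the reference file but absent from the target file."""
    return sorted(symbol_names(reference_rcs) - symbol_names(target_rcs))

# Hypothetical headers: the reference file carries a tag the target lacks.
REFERENCE = "head\t1.3;\naccess;\nsymbols\n\tR4800_RC1:1.3.2.1\n\tR4800:1.3.0.2;\nlocks; strict;\n"
TARGET = "head\t1.1;\naccess;\nsymbols\n\tR4800:1.1.0.2;\nlocks; strict;\n"

print(missing_tags(REFERENCE, TARGET))
```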

robotroll commented 6 years ago

Damn those monkeys!

I'm not sure I understand the problem correctly. Are you saying that no file on the branch has a commit later than the one that added the problematic file? While searching for such files, I found files that were changed on the branch after the problematic file was first added. I added the file lec.dll,v to the test archive. It has a commit on the branch R4800 after casio_getlogfile.bat was added.

missingFile2.zip

rvaughn commented 6 years ago

In the files you gave me, that's correct. The problem file was committed on July 12, but the last commit to R4800 was July 4. The only commit to R201701 was Feb 22.

The first commit on a branch is more important though. As far as cvs-fast-export can tell, that's the date a branch was created, so all files should already be present in the mainline by then. There should be a commit in the branch to add any new files after that. This is where casio_getlogfile.bat really fails - the first commit to R4800 was Sept 30, 2014, much earlier than the file's commit date. I'll look at your new zip, but this issue still remains - casio_getlogfile.bat is newer than the start date of both branches.
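To make the rule concrete, here is a simplified model of the date check described above. It is not cvs-fast-export's actual code, and the function name is made up. The R4800 start date and the file's July 12 commit come from this thread; the year for the R201701 commit is an assumption.

```python
from datetime import date

def branches_excluding_file(file_first_commit, branch_first_commits):
    """Return branches whose first commit predates the file's first commit.

    Under the rule described above, a file that first appears after a
    branch started cannot be placed on that branch from dates alone.
    """
    return sorted(
        name
        for name, started in branch_first_commits.items()
        if file_first_commit > started
    )

branch_starts = {
    "R4800": date(2014, 9, 30),    # first commit, as stated above
    "R201701": date(2017, 2, 22),  # year is an assumption
}
file_added = date(2017, 7, 12)     # casio_getlogfile.bat's only commit

print(branches_excluding_file(file_added, branch_starts))
# With the date edited back to 2012-11-29, no branch excludes the file.
print(branches_excluding_file(date(2012, 11, 29), branch_starts))
```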

robotroll commented 6 years ago

Have you found time to look into the new zip? Would it be possible to create a fixup commit for such files, similar to the tag fixup commits? I tried to implement it myself, without success.

rvaughn commented 6 years ago

I may be able to do something about it. It's going to be a "warning" situation, since there is nothing in the CVS repo to indicate when those files joined the branch. I'll get to it as soon as I can, but this week is pretty busy.

robotroll commented 6 years ago

Hey, any news on the problem?

rvaughn commented 6 years ago

I apologize; I know this has been open a long time now. The past couple of months have been extremely busy, but I'll try to knock this out this weekend.

rvaughn commented 6 years ago

Done. This fixes your three missing file cases. (casio_install.bat was also missing from R4500.) I still say these files are in an improper state in CVS, but I've handled them as best I can. You should get similar results from CVS and Git now.

robotroll commented 6 years ago

Finally had time to test the changes. Working as expected. Thank you so much for taking the time to fix this issue!