trevorr / vss2git

Migrate Visual SourceSafe repositories to Git
Apache License 2.0

Extremely slow performance on large binary files with many revisions #11

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a VSS database
2. Add a binary file of 10MB in it
3. Create 300 revisions of the file in VSS using check out, change the file a 
bit, check in.
4. Convert this database using vss2git

What is the expected output? What do you see instead?

The output of vss2git is correct as far as I can see. However, the process takes 
extremely long (weeks). Scanning the database is fast (<1 minute), but each 
revision takes on the order of hours.

What version of the product are you using? On what operating system?

1.0.10.0, Windows 8.1

Please provide any additional information below.

This is a special case, because usually a version control database will contain 
many smaller (textual) files with a small amount of changes. In this case the 
VSS database I work with contains MS Access databases (MDB files) which can 
only be saved monolithically. Hence the large set of revisions for a large 
binary file.

I debugged the process in Visual Studio 2013. The long time for processing each 
revision is spent, on a high level, like this:

* Process a revision for MyBinaryFile.bin, for example revision 1
  *  VssFileRevision.GetContents() - for revision 1
    * Get the last revision for MyBinaryFile.bin, for example 300
      * Loop until we're at revision 1
        * Get the delta operations for revision 300
        * Merge them with revision 1
        * Get the previous revision, 299
        * .. etc, until we're at the revision we want (revision 1)
    * Return the contents of the result of all the merges

The Merge operation is very expensive: it takes multiple minutes. And because 
of the large number of revisions and the way this loop is set up, it is 
executed many times.
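
To give a feel for how fast the merge count grows, here is a minimal Python sketch of the loop described above. It is not vss2git code. `ToyReverseDeltaFile`, `build_toy_file` and the `(offset, bytes)` delta format are hypothetical stand-ins, and the sketch assumes that stepping back one revision costs exactly one merge. It compares rebuilding every revision from the latest one against a single backward walk that hands out each intermediate result.

```python
class ToyReverseDeltaFile:
    """Hypothetical model: full latest content plus one reverse delta per older revision."""

    def __init__(self, latest, reverse_deltas):
        # reverse_deltas[k - 2] turns revision k into revision k - 1,
        # so a file with N revisions carries N - 1 deltas.
        self.latest = latest
        self.reverse_deltas = reverse_deltas
        self.merge_count = 0

    @property
    def revision_count(self):
        return len(self.reverse_deltas) + 1

    def _apply(self, content, delta):
        # Toy delta: overwrite bytes at an offset. Real VSS deltas differ.
        self.merge_count += 1
        offset, data = delta
        return content[:offset] + data + content[offset + len(data):]

    def get_contents(self, revision):
        """Rebuild one revision by walking back from the latest one."""
        content = self.latest
        for current in range(self.revision_count, revision, -1):
            content = self._apply(content, self.reverse_deltas[current - 2])
        return content

    def iter_backward(self):
        """Yield (revision, content) from newest to oldest in a single pass."""
        content = self.latest
        yield self.revision_count, content
        for current in range(self.revision_count, 1, -1):
            content = self._apply(content, self.reverse_deltas[current - 2])
            yield current - 1, content


def build_toy_file(revision_count):
    """Create a toy file whose revision r starts with the byte r % 256."""
    latest = bytes([revision_count % 256]) + b"payload"
    deltas = [(0, bytes([r % 256])) for r in range(1, revision_count)]
    return ToyReverseDeltaFile(latest, deltas)


toy = build_toy_file(300)

# Rebuild each of the 300 revisions independently, starting from the latest.
naive = {r: toy.get_contents(r) for r in range(1, 301)}
naive_merges = toy.merge_count

# Walk back once and collect every revision along the way.
toy.merge_count = 0
single_pass = dict(toy.iter_backward())
single_pass_merges = toy.merge_count

print(naive_merges)        # 44850
print(single_pass_merges)  # 299
```

Under these assumptions, 300 revisions need 44850 merges when each one is rebuilt from scratch, against 299 for a single pass. Whether vss2git could reuse intermediate results this way is an open question: the conversion replays changesets oldest first, so a backward pass would probably have to keep or spill the intermediate contents somewhere.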

I don't understand enough of the structure of VSS databases and of vss2git to 
be able to see whether all of these steps are essential and how they could be 
optimized. I do know that converting the database I'm working with is not 
practical because it takes far too long.

Original issue reported on code.google.com by aron.mul...@gmail.com on 12 Feb 2015 at 12:45

GoogleCodeExporter commented 9 years ago
Note that I never completed a full run as described; I stopped the process 
after several days and checked the intermediate results. The result of the 
complete process taking "weeks" is an estimate based on this.

Original comment by aron.mul...@gmail.com on 12 Feb 2015 at 12:49

GoogleCodeExporter commented 9 years ago
We have about 252k revisions in 56k files, and after 2 days of running we are at 
20% progress (about 7Gb). 1 of 4 CPUs is at 100%. So we also think it's 
running slow...

Original comment by pascal.f...@gmail.com on 16 Apr 2015 at 2:22

beppler commented 3 years ago

Hi, I have an issue like this one, but it is for a small binary file (a COM type library).

Every time there is a revision, it gets stuck on "xxxx: Edit revision yyy" for many seconds, sometimes even minutes.

Like the example below:

Replaying changeset 39 from 07/20/2004 19:36:37
D:\Projects\Local\vss-folha-pub\TJRJ\Fontes\Classes\Folha.dpr: Edit revision 7

For other file types it is very fast.

The conversion is running on an SSD, with an exclusion set in the antivirus software.