rlktradewright / vss2git

Automatically exported from code.google.com/p/vss2git
Apache License 2.0

Extremely slow performance on large binary files with many revisions #11

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a VSS database
2. Add a binary file of 10MB in it
3. Create 300 revisions of the file in VSS by repeatedly checking it out, 
changing it slightly, and checking it back in.
4. Convert this database using vss2git
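A minimal Python sketch of the file-generation part of steps 2 and 3, assuming Python 3.9+ (for `random.Random.randbytes`). It only creates the 10 MB file and applies small in-place changes. The VSS operations are delegated to two caller-supplied hooks, `checkout` and `checkin`, which are hypothetical placeholders (for example, wrappers around the VSS command-line client) and are not shown; adding the file to VSS in step 2 is not shown either. All function names here are illustrative and not part of vss2git.

```python
import os
import random
from typing import Callable

FILE_SIZE = 10 * 1024 * 1024  # 10 MB, as in step 2
REVISIONS = 300               # as in step 3


def create_binary_file(path: str, rng: random.Random, size: int) -> None:
    """Write `size` random bytes to `path`."""
    with open(path, "wb") as f:
        f.write(rng.randbytes(size))


def mutate_file(path: str, rng: random.Random, n_bytes: int = 16) -> None:
    """Change `n_bytes` distinct bytes in place; the file size stays the same."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for pos in rng.sample(range(size), n_bytes):
            f.seek(pos)
            old = f.read(1)[0]
            f.seek(pos)
            # XOR with a non-zero value guarantees the byte actually changes.
            f.write(bytes([old ^ rng.randrange(1, 256)]))


def make_revisions(path: str,
                   checkout: Callable[[str], None],
                   checkin: Callable[[str], None],
                   revisions: int = REVISIONS,
                   size: int = FILE_SIZE,
                   seed: int = 0) -> None:
    """Create the file, then run `revisions` check-out/change/check-in cycles.

    `checkout` and `checkin` are hypothetical hooks supplied by the caller.
    """
    rng = random.Random(seed)
    create_binary_file(path, rng, size)
    for _ in range(revisions):
        checkout(path)          # hook: check the file out of VSS
        mutate_file(path, rng)  # "change the file a bit"
        checkin(path)           # hook: check the changed file back in
```

Because every cycle changes only a handful of bytes, each VSS revision should differ from the previous one by a small delta, while the file as a whole stays large.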

What is the expected output? What do you see instead?

The output of vss2git appears to be correct. However, the process takes 
extremely long (weeks). Scanning the database is fast (under one minute), but 
each revision takes on the order of hours.

What version of the product are you using? On what operating system?

1.0.10.0, Windows 8.1

Please provide any additional information below.

This is a special case, because a version control database usually contains 
many smaller (textual) files with small changes between revisions. In this case, 
the VSS database I work with contains MS Access databases (MDB files), which can 
only be saved monolithically. Hence the large number of revisions of a large 
binary file.

I debugged the process in Visual Studio 2013. At a high level, the time spent 
processing each revision breaks down as follows:

* Process a revision for MyBinaryFile.bin, for example revision 1
  *  VssFileRevision.GetContents() - for revision 1
    * Get the last revision for MyBinaryFile.bin, for example 300
      * Loop until we're at revision 1
        * Get the delta operations for revision 300
        * Merge them with revision 1
        * Get the previous revision, 299
        * .. etc, until we're at the revision we want (revision 1)
    * Return the contents of the result of all the merges
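To make the trace above concrete, here is a minimal Python sketch of the same loop. It assumes a toy delta format (copy-from-source and insert-literal operations) rather than the real VSS on-disk format, and the names `Copy`, `Insert`, `merge` and `get_contents` are hypothetical stand-ins, not vss2git's actual (C#) types. The latest revision is stored in full, `reverse_deltas[v]` turns revision v into v - 1, and the loop merges these deltas one by one before applying the combined delta once.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Copy:
    """Copy `length` bytes starting at `offset` of the source content."""
    offset: int
    length: int


@dataclass(frozen=True)
class Insert:
    """Emit literal bytes that are not taken from the source content."""
    data: bytes


Op = Union[Copy, Insert]


def apply_delta(source: bytes, ops: List[Op]) -> bytes:
    """Build new content from `source` by running the delta operations in order."""
    out = bytearray()
    for op in ops:
        if isinstance(op, Copy):
            out += source[op.offset:op.offset + op.length]
        else:
            out += op.data
    return bytes(out)


def slice_ops(ops: List[Op], start: int, length: int) -> List[Op]:
    """Return ops producing bytes [start, start + length) of apply_delta(_, ops)."""
    result: List[Op] = []
    pos = 0  # output offset where the current op's bytes begin
    end = start + length
    for op in ops:
        op_len = op.length if isinstance(op, Copy) else len(op.data)
        lo, hi = max(start, pos), min(end, pos + op_len)
        if lo < hi:
            skip = lo - pos
            if isinstance(op, Copy):
                result.append(Copy(op.offset + skip, hi - lo))
            else:
                result.append(Insert(op.data[skip:skip + hi - lo]))
        pos += op_len
        if pos >= end:
            break
    return result


def merge(first: List[Op], second: List[Op]) -> List[Op]:
    """Compose two deltas: applying the result equals applying `first`, then
    `second` (the Copy ops in `second` refer to the output of `first`)."""
    merged: List[Op] = []
    for op in second:
        if isinstance(op, Copy):
            merged.extend(slice_ops(first, op.offset, op.length))
        else:
            merged.append(op)
    return merged


def get_contents(latest: bytes, latest_version: int,
                 reverse_deltas: Dict[int, List[Op]], wanted: int) -> bytes:
    """Rebuild revision `wanted` by walking back from the full latest revision."""
    ops: Optional[List[Op]] = None
    version = latest_version
    while version > wanted:
        current = reverse_deltas[version]
        ops = current if ops is None else merge(ops, current)
        version -= 1
    return latest if ops is None else apply_delta(latest, ops)
```

Each call starts again from the latest revision, so rebuilding an early revision re-merges almost the entire delta chain, even if a later revision was rebuilt just before.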

The Merge operation is very expensive; a single call takes several minutes. 
Because of the large number of revisions and the way this loop is structured, 
it is executed many times.
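Assuming, as the trace above suggests, that each revision is rebuilt independently starting from the latest one, the number of Merge calls grows quadratically with the number of revisions. Rebuilding revision k out of n fetches n - k deltas and merges all but the first, so the total over all revisions is (n - 1)(n - 2) / 2. A short Python sketch of that count (function names are illustrative):

```python
def merges_for_revision(latest: int, wanted: int) -> int:
    """Merge calls needed to rebuild `wanted` by walking back from `latest`.

    One delta is fetched per version from `latest` down to `wanted + 1`; the
    first delta is used as-is and every subsequent one is merged into it.
    """
    deltas = max(0, latest - wanted)
    return max(0, deltas - 1)


def total_merges(revisions: int) -> int:
    """Total merge calls if every revision 1..revisions is rebuilt independently."""
    return sum(merges_for_revision(revisions, k) for k in range(1, revisions + 1))


if __name__ == "__main__":
    n = 300
    print(total_merges(n))  # 44551
    # Closed form of the same sum.
    assert total_merges(n) == (n - 1) * (n - 2) // 2
```

For the 300 revisions in this report that is 44,551 merges. Purely as an illustration, at one minute per merge this would be 44,551 minutes, roughly 31 days, which is consistent with the "weeks" estimate.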

I don't understand the structure of VSS databases or of vss2git well enough to 
tell whether all of these steps are essential or how they could be optimized. 
What I do know is that converting the database I'm working with is not 
practical, because it takes far too long.

Original issue reported on code.google.com by aron.mul...@gmail.com on 12 Feb 2015 at 12:45

GoogleCodeExporter commented 9 years ago
Note that I never completed a full run as described; I stopped the process 
after several days and checked the intermediate results. The figure of "weeks" 
for the complete process is an estimate based on this.

Original comment by aron.mul...@gmail.com on 12 Feb 2015 at 12:49