hpi-swa / Squot

Squeak Object Tracker - Version control for arbitrary objects, currently with Git storage
Other
61 stars 29 forks source link

Speed up commits and comparisons by exploiting the SystemChangeNotifier #362

Open LinqLover opened 2 years ago

LinqLover commented 2 years ago

Creating a commit (preview) may take a very long time for larger packages, we might be able to significantly reduce this overhead by using a SystemChangeNotifier to track changes differentially rather than scanning every method prior to check-in. Proposed by @tom95.

tom95 commented 2 years ago

Specifically, profiling on a linux system revealed that most time is spent looking for overwritten methods. Batching this operation for all tracked artifacts may yield around an order of magnitude faster commits in large images for projects with many packages.

In the below proof of concept, we traverse the entire system only once and collect the overriden methods for all involved packages.

[SBGitPackageGroup new packageNames: #('Sandblocks-Representation' 'Sandblocks-Ohm'
'Sandblocks-Debugger' 'Sandblocks-Plugin-StLivePreview' 'Sandblocks-Squeak6Fixes'
'Sandblocks-CustomJavascript' 'Sandblocks-Utils' 'Sandblocks-Tutorial' 'BaselineOfSandblocks'
'Sandblocks-Simulation' 'Sandblocks-Layout' 'Sandblocks-Morphs' 'Sandblocks-Core'
'Sandblocks-Watch' 'Sandblocks-Explorer' 'Sandblocks-Compiler' 'Sandblocks-Smalltalk'
'Sandblocks-Scheme' 'Sandblocks-Drawing' 'Sandblocks-RatPack' 'Sandblocks-Graph'
'Sandblocks-Babylonian')] timeToRun.

SBGitPackageGroup>>packageNames: aCollection
    | infos overrides |
    infos := aCollection collect: [:name | PackageInfo named: name].
    overrides := Dictionary new.
    infos first allOverriddenMethodsDo: [:method | | category changeRecord |
        category := method category.
        changeRecord := infos first changeRecordForOverriddenMethod: method.
        infos do: [:info |
                    ((info isYourClassExtension: category) not and: [changeRecord notNil]) ifTrue:
                         [(overrides at: info packageName ifAbsentPut: [OrderedCollection new]) add: changeRecord]]].

    ^ infos collect: [:info |
        MCSnapshot fromDefinitions: (Array streamContents: [:stream |
            info systemCategories ifNotEmpty: [:categories |
                              stream nextPut: (MCOrganizationDefinition categories: categories)].
            info methods do: [:ea | stream nextPut: ea asMethodDefinition].
            (overrides at: info packageName ifAbsent: [#()]) do: [:ea | stream nextPut: ea asMethodDefinition].
            info classes do: [:ea | stream nextPut: ea classDefinitions]])]

As Christoph said, the "ultimate" performance boost would be for Squot to be aware of live changes to tracked artifacts, so that the incredibly expensive system traversal could be dropped entirely. For the desired performance gain, it would only have to take note of methods existing outside the artifact's own category (so extension + override methods).

Note that all my findings are limited to Linux. Christoph confirmed some performance improvements on Windows as well.

j4yk commented 2 years ago

I have noticed and thought about the same thing in the past.

What are your ideas for contingency measures if a change goes unnoticed through the notification mechanism? Then what you can commit would be different what is in the image.

LinqLover commented 2 years ago

Just to add to @tom95's findings:

What are your ideas for contingency measures if a change goes unnoticed through the notification mechanism? Then what you can commit would be different what is in the image.

I have made promising experiences with the default change-tracking mechanism in Monticello (the stars displayed next to packages), so I would not deem the risk too high. Apart from that, similar to the Monticello Browser, the Squit Browser could offer an optional "check for changes" item in the repository menu if automatic change tracking does not work.

By the way, is there an easy answer to the question of what makes preparing a commit for Squit so much more expensive than preparing a Monticello version? Or is this just my subjective expression? :)

j4yk commented 2 years ago

By the way, is there an easy answer to the question of what makes preparing a commit for Squit so much more expensive than preparing a Monticello version? Or is this just my subjective expression? :)

It depends on the circumstances. If you have multiple packages in the project, it will do the slow scan for each of them, whereas in Monticello it is always at most once (because you can save only one package). If the latest commit was not read yet, it has to be read and the files parsed first (Monticello does not do such things). Other reasons I don't know from memory or may be due to inefficient implementations in Squot and/or FileSystem-Git, so not easy. ;-)