spotify / XCRemoteCache

Other
835 stars 52 forks source link

Improved DependenciesReader performance (~10x faster) #139

Closed samuelsainz closed 2 years ago

samuelsainz commented 2 years ago

The problem

The producer mode was increasing the build time a lot for large projects.

Here are some of the numbers I got in our project:

Build time without XCRC Build time in Producer mode Extra time added
On CI ~30 minutes ~80 minutes ~50 minutes
10-core M1 Pro ~10 minutes ~39 minutes ~29 minutes

Since we want to run the producer mode for every commit in our main branch, this would make our CI builds for the main branch occupy the CI agents for too much time.

After adding some logs I found out that the delay was produced by the XCRC PostBuild command for the app main target. Particularly, it was happening when executing this line inside the generatingDependencies function.

The solution

XCRemoteCache PostBuild script was iterating over the String char by char to parse all the dependencies of a target. If the app is big enough, this could end up on parsing hundreds of megabytes in order to get all the dependencies. The String type comes with a lot of features that we don't actually need for this particular problem, so we can use a lower abstraction level to solve this problem and gain in performance.

Using the UTF8View of the string to parse the dependencies, we got a time improvement of 10x times faster than before.

The difference between Strings, Substrings, Unicode Scalars and UTF-8 **Substrings** Instead of using String, we could use Substrings, which allow us to get a view of a portion of the String without creating needless copies of the String, improving performance and memory usage. But there are other forms of string abstractions out there that are even lower level than Substring, and they offer other possible performance benefits. String and Substring are doing a ton of work to hide a bunch of complexity from us. **Unicode Scalars** There are many ways to represent what visually appears to be the same string. For example, the string `é` can be created with the Unicode Scalar `[233]`, or with the combination of a plain “e” character followed by a “combining acute accent”: `[101, 769]`. > Swift defaults to giving you the friendliest API up front, and then gives you the ability to drop to a lower level if you need more raw access to the underlying representation. But what do these representations mean as far as performance is concerned? Well, turns out a lot! **UTF-8** It’s possible to drop down to an even lower abstraction than unicode scalars. Each unicode scalar is made up of multiple UTF8 code units, which you can get access to through the UTF8View on substrings. Unicode scalars can be up to 21 bits in length, whereas UTF8 code units can only be 8 bits, and so it may take a lot more code units to express a single unicode scalar. Sources: * Apple docs ([Unicode Scalars](https://developer.apple.com/documentation/swift/string/unicodescalarview), [UTF-8](https://developer.apple.com/documentation/swift/string/utf8view)) * [pointfreeco](https://www.pointfree.co/) (Totally recommended!)

Benchmarks

I have added a few performance test to compare the previous method (which I have deprecated) with the new method.

Here are the results:

Before

'-[XCRemoteCacheTests.DependenciesReaderPerformanceTest testParseDependenciesFilesList]' measured [Time, seconds] average: 0.089, relative standard deviation: 2.327%, values: [0.093534, 0.086403, 0.086735, 0.087261, 0.086792, 0.086614, 0.087342, 0.088445, 0.086417, 0.087023]

After

'-[XCRemoteCacheTests.DependenciesReaderPerformanceTest testParseDependencyFileListWithUTF8View]' measured [Time, seconds] average: 0.009, relative standard deviation: 4.615%, values: [0.010778, 0.009464, 0.009534, 0.009421, 0.009320, 0.009291, 0.009275, 0.009281, 0.009268, 0.009279]

Screen Shot 2022-05-20 at 21 04 01

Final results

PostBuild Script duration After the improvement Improvement (%)
On CI ~50 minutes ~6 minutes 88 %
10-core M1 Pro ~29 minutes ~3 minutes 90%
samuelsainz commented 2 years ago

@polac24 If you folks don't mind I'd rather leave some of the performance tests. I think there are still some improvements that we can make in the future—for example regarding the different Set copies and iterations over those Sets. I will take a look at it later and performance tests would be useful for that