shorebirdtech / shorebird

Code Push for Flutter and other tools for Flutter businesses.
https://shorebird.dev
Other
2.21k stars 133 forks source link

feat: Make patch sizes smaller on iOS #811

Open eseidel opened 1 year ago

eseidel commented 1 year ago

Currently patch sizes are often 10s of kb. Which is plenty small for many applications. However we're using a binary diffing algorithm which knows nothing about what it's diffing and could probably be another 10x smaller if it did.

eseidel commented 3 months ago

We should, yes. But we'll do this later as part of optimizing our costs.

eseidel commented 1 month ago

I looked into this briefly (for iOS) today. I had forgotten that we use ELF as a format for patch distribution, even on iOS. I believe that's the main reason the patches end up larger on iOS is we're diffing an ELF file against a MachO file. We could get around this by instead using our own custom format, which just contained the 4 dart blobs and did not bother to package in elf since we don't need any of the features of elf, just the 4 raw blobs which we load read-only anyways.

eseidel commented 1 month ago

I also looked at reducing the LinkTable size and found that moving from 64bit ints to 32bit ints had less savings than I was expecting?

64bit ints: LinkTable size: 196608 bytes

32bit ints: LinkTable size: 131072 bytes

Moving to LEB128: LinkTable size: 131072 bytes

I guess it's padding that's making these so even?

eseidel commented 1 month ago

LEB128 code:

diff --git a/pkg/aot_tools/lib/src/linker/linker.dart b/pkg/aot_tools/lib/src/linker/linker.dart
index 279d461bc2c..2307bc0fa3b 100644
--- a/pkg/aot_tools/lib/src/linker/linker.dart
+++ b/pkg/aot_tools/lib/src/linker/linker.dart
@@ -1,5 +1,3 @@
-import 'dart:typed_data';
-
 import 'package:aot_tools/src/linker/link_reporter.dart';
 import 'package:aot_tools/src/linker/shorebird_linker_overrides.dart';
 import 'package:aot_tools/src/snapshot_analysis.dart';
@@ -12,10 +10,51 @@ class ByteWriter {
   final bytes = <int>[];

   /// Write a 64-bit integer in little endian to the byte stream.
-  void addInt64(int value) {
-    /// Arm64 is little endian.
-    final byteData = ByteData(8)..setInt64(0, value, Endian.little);
-    bytes.addAll(byteData.buffer.asUint8List());
+  // TODO(eseidel): Use 32-bit integers or LEB-128 instead to save space.
+  // void addInt64(int value) {
+  //   assert(value >= 0 && value <= 0xFFFFFFFFFFFFFFFF, 'value: $value');
+
+  //   /// Arm64 is little endian.
+  //   final byteData = ByteData(8)..setInt64(0, value, Endian.little);
+  //   bytes.addAll(byteData.buffer.asUint8List());
+  // }
+
+  void _writeByte(int value) {
+    assert(value >= 0 && value <= 0xFF, value);
+    bytes.add(value);
+  }
+
+  /// Write one ULEB128 number.
+  void addInt64(int unsignedInt) {
+    // First bit in every byte is a signal if there is more data.
+    // So, we split the number into 7-bit parts and write them smaller to larger,
+    // prefixing with the signal.
+    var value = unsignedInt;
+    assert(value >= 0 && value <= 0xFFFFFFFFFFFFFFFF, 'value: $value');
+    var bytes = 0;
+    for (;;) {
+      var part = 0x7F & value;
+      value >>= 7;
+
+      if (value != 0) {
+        // Signal that there is more data.
+        part |= 0x80;
+      }
+      bytes++;
+      _writeByte(part);
+
+      if (value == 0) {
+        break;
+      }
+
+      if (value == -1) {
+        for (var i = 0; i < 9 - bytes; i++) {
+          _writeByte(0xFF);
+        }
+        _writeByte(1);
+        break;
+      }
+    }
   }
 }
eseidel commented 1 month ago

I think the next thing to test here is what the order of symbols between macho and elf are. If they're not in the same order, that's the easy thing to fix.

We could trivially make our own non-elf replacement format for patches, we would just want to make sure that the "symbol order" in the package matched the order in mach-o for maximum similarity?

eseidel commented 1 month ago

macho order:

0000000000004080 T _kDartVmSnapshotInstructions
0000000000011280 T _kDartIsolateSnapshotInstructions
00000000000c4880 S _kDartVmSnapshotData
00000000000cd8c0 S _kDartIsolateSnapshotData
00000000001a4000 b _kDartVmSnapshotBss
00000000001a4018 b _kDartIsolateSnapshotBss

elf order:

00000000000001c8 R _kDartSnapshotBuildId
0000000000000200 R _kDartVmSnapshotData
0000000000009240 R _kDartIsolateSnapshotData
00000000000e0000 T _kDartVmSnapshotInstructions
00000000000ed200 T _kDartIsolateSnapshotInstructions
eseidel commented 1 month ago

Yeah, so it looks like mach-o puts instructions before data and elf the opposite way around. That's presumably why the diffs end up extra large. I'm not sure how well zstd handles reordering sections within binary files.

eseidel commented 1 month ago

Another test we could do is to write a tiny tool that read in a mach-o snapshot and wrote out an elf snapshot and then ran patch against the two to see.