Animating more than 30 elements per frame causes frame drops

sergeymorkovkin commented 1 year ago

Description

On the lower-end Android devices (worth $100-150) Reanimated 2/3 can only handle maximum 30 simultaneous elements per frame. Animating more than that results in serious frame drops. In my tests I was using OPPO A16.

It's been tested with or without new architecture and Fabric. It's been tested with or without @3.0.0-rc8 shared variables performance update. It's been tested with or without requestAnimationFrame optimization.

I'm still researching this issue and it boils down to a few problems:

Inside JSI runtime the requestAnimationFrame function calls are done synchronously via JSI, while every useAnimatedStyle instance does it's own call to rAF. It's been addressed by @kmagiera here.
Every updateProps call is too heavy and takes ~2ms in debug build and ~0.3-0.5ms in release build (fabric/paper respectively). The larger generated style object - the longer it takes, since JSI does argument conversion on a CPU.
THIS IS AN ASSUMPTION. I've noticed react-navigation mappers somehow get involved into a user-defined animation. Need more time to dig into this.

VIDEO CRASHES.md

Steps to reproduce

Take OPPO A16 (or any other phone with Mediatek Helio G35)
Take bokeh demo sample app
Make sure you're rendering exactly 100 elements
Make sure you're using react-native-reanimated@3.0.0-rc8 (this includes better SharedVariables)
Build for release and run demo app (try with/without newArchEnabled - doesn't matter)
Wait for at least 5 minutes

Actual result: FPS is somewhere between 30-40 FPS for 100 elements and app crashes quickly.

Expected result: should be able to animate 100-150 elements at 60 FPS with no crashes.

Snack or a link to a repository

https://github.com/sergeymorkovkin/test-reanimated

Reanimated version

3.0.0-rc8

React Native version

0.70.6

Platforms

Android

JavaScript runtime

Hermes

Workflow

React Native (without Expo)

Architecture

None

Build type

Release mode

Device

Real device

Device model

OPPO A16

Acknowledgements

Yes

kmagiera commented 1 year ago

Can you confirm the crashes still happen after #3852 – this crash was something we also notices during testing and #3852 is resolving that problem for us. We've actually been getting that crash much sooner than after 5 mins and with the increase number of circles it'd happen within few seconds, and after that patch it'd no longer occur.

Regarding BokehExample what needs to be mentioned is that it isn't really implemented optimally, it's never meant to be and in its current form it serves well as a benchmark. It is important for us to know what type of devices can run at what framerate. The most time consuming part in that example is due to the fact that circles manipulate layout properties. Because of that we need to recalculate layout on every single frame and in addition to that, this actually happen as many times as there are circles – reanimated isn't optimized for this type of scenario (yet?) and we don't do any batching of updates that require layouting (note that with other type of updates batching isn't really necessary)

kmagiera commented 1 year ago

The below diff does a relatively simple optimization that changes that example to use transforms instead of layout props:

diff --git a/FabricExample/src/BokehExample.tsx b/FabricExample/src/BokehExample.tsx
index b544f65cc..18d2385f0 100644
--- a/FabricExample/src/BokehExample.tsx
+++ b/FabricExample/src/BokehExample.tsx
@@ -51,19 +51,28 @@ function Circle() {
     return () => clearInterval(id);
   });

+  const size = 100 + power * 250;
+
   const animatedStyle = useAnimatedStyle(() => {
-    const size = 100 + power * 250;
     return {
       backgroundColor: `hsl(${hue.value}, 100%, 50%)`,
-      width: size,
-      height: size,
-      top: top.value - size / 2,
-      left: left.value - size / 2,
+      transform: [
+        { translateY: top.value - size / 2 },
+        { translateX: left.value - size / 2 },
+      ],
       opacity: 0.1 + (1 - power) * 0.1,
     };
   }, []);

-  return <Animated.View style={[styles.bokeh, animatedStyle]} />;
+  return (
+    <Animated.View
+      style={[
+        styles.bokeh,
+        { width: size, height: size, top: 0, left: 0 },
+        animatedStyle,
+      ]}
+    />
+  );
 }

 interface BokehProps {

sergeymorkovkin commented 1 year ago

@kmagiera Kindly find my replies below.

Can you confirm the crashes still happen after https://github.com/software-mansion/react-native-reanimated/pull/3852

Cannot confirm your patch is working. There is one thing regarding this. Looks like calling requestanimationFrame doesn't call our new function and we need to call it like global.requestAnimationFrame. Looks like monkey-patching requestAnimationFrame doesn't fix all captured references to an original JSI-based requestAnimationFrame. Have you checked this? I'll still try to make it work, though.

The below diff does a relatively simple optimization that changes that example to use transforms instead of layout props.

I've tried this here. Rendering performance is not an issue. Reanimated is. We should optimize better.

It is important for us to know what type of devices can run at what framerate.

I've got some information for OPPO A16. These are the measurements of how long updateProps executes on average (in ms) + FPS. As you can see, I've done 5 measurements in a row after a fresh restart for every build type.

NewArch + Fabric + Debug = 2.14 2.05 2.35 2.20 2.35 (9.0 FPS, 1..6)
NewArch + Fabric + Release = 0.49 0.35 0.43 0.43 0.44 (50+ FPS, 0..2)
Release is (2.14+2.05+2.35+2.20+2.35)/(0.49+0.35+0.43+0.43+0.44) = 5.19 times faster
OldArch + Debug = 2.72 2.75 2.28 2.33 2.34 (15.4 FPS, 1..5)
OldArch + Release = 0.48 0.43 0.40 0.42 0.41 (50 FPS, 0..2)
Release is (2.72+2.75+2.28+2.33+2.34)/(0.48+0.43+0.40+0.42+0.41) = 5.80 times faster

All these measurements were done with 25 elements updating translateX/Y only.

kmagiera commented 1 year ago

Regarding the crash we've identified the actual root cause and the fix is here https://github.com/software-mansion/react-native-reanimated/pull/3859

kmagiera commented 1 year ago

As for monkey patching rAF I did check this but forgot to mention that it requires https://github.com/software-mansion/react-native-reanimated/pull/3838 to work that was merged recently and you perhaps don't have that patch included

sergeymorkovkin commented 1 year ago

Confirm it's no longer crashing. But batching requestAnimationFrame didn't help to increase FPS.

sergeymorkovkin commented 1 year ago

@kmagiera Do you think batching JSI calls to updateProps could be efficient? Or maybe it's about data conversion?

tomekzaw commented 1 year ago

I think we could batch all prop updates in JS and call JSI-based function _updateProps once (analogously to the requestAnimationFrame optimization). This would eliminate JSI overhead (between JS and C++).

Another idea for performance improvement on Android might be doing exactly the same but in the native code. When non-layout props like color or opacity are updated, we synchronously call mUIImplementation.synchronouslyUpdateViewOnUIThread on Paper or fabricUIManager.synchronouslyUpdateViewOnUIThread on Fabric, separately for each component. Note that each of this call originates from the C++ code so it uses JNI which might be the bottleneck.

Good news is that on Fabric we already apply all ShadowTree updates at once (done in #3369) in performOperations which is called once per frame (thus layout is calculated also once per frame), and I think the same guarantee holds for Paper as well.

At this point I think it might consider adding SystraceSection in our codebase so that we can use Android Profiler to see what's going on.

sergeymorkovkin commented 1 year ago

@tomekzaw Can we discuss this on a Zoom call, possibly?

kmagiera commented 1 year ago

I'm not sure about batching updateProps calls, this isn't as straightforward as with rAF calls, where we can save a number of JSI roundtrips. With updateProps we still need to pass the additional data so we'd have fewer calls but still more data to be unmarshalled using JSI. I wouldn't consider adding this unless we have a clear indication that it improves performance.

sergeymorkovkin commented 1 year ago

With updateProps we still need to pass the additional data so we'd have fewer calls but still more data to be unmarshalled using JSI.

Same concern here. Our gains might be insignificant, since marshaling will have to process roughly same amount of data. I'll try this anyways. Keep digging.

tomekzaw commented 1 year ago

Hey @sergeymorkovkin, there's one more thing which @kmagiera has reminded me of. When @j-piasecki was profiling flash-list on Android, he noticed that the performance depends on the NDK version used. In particular, Hermes built from sources with NDK 24 is way slower than the one build using NDK 21. If you use RN 0.70 app, NDK 24 is used by default on M1 Macs, see here. I know this is not really related to poor performance of updateProps but just wanted to let you know.

Edit: I added some systraces and here are the first results. I'm using a Samsung M23 (not really low-end), app in release build, using profileable instead of debuggable, 100% battery, not plugged in. Using BokehExample with 100 views with the transforms patch applied (i.e. no layout props). First, onAnimationFrame takes ~24 ms which is unacceptable. Light green represents NodesManager#updateProps, light blue below is mUIImplementation.synchronouslyUpdateViewOnUIThread. We suspect that the empty room between each updateProps is calling the useAnimatedStyle worklet itself.

tomekzaw commented 1 year ago

Here's my update on profiling Reanimated on Android. I added some more systraces in Java, C++ and JS code (as JSI binding). First, we noticed that one chunk of code related to color processing was left after shareable rewrite is no longer necessary, see #3872. Further ideas include optimizing JSI/JNI conversions and prop filtering (JNIHelper, init, forLoop) and perhaps batching prop updates on JS side.

sergeymorkovkin commented 1 year ago

I added some more systraces in Java, C++ and JS code (as JSI binding)

@tomekzaw Can you please confirm you're purely using Java-based Systrace and exposing same to both C++ and JS?

tomekzaw commented 1 year ago

Can you please confirm you're purely using Java-based Systrace and exposing same to both C++ and JS?

@sergeymorkovkin Here's how I added systraces:

Java

import com.facebook.systrace.Systrace;

Systrace.beginSection(Systrace.TRACE_TAG_REACT_JAVA_BRIDGE, "performOperations");
// ...
Systrace.endSection(Systrace.TRACE_TAG_REACT_JAVA_BRIDGE);

C++

First I've filled the implementation of DummySystraceSection in node_modules/react-native/ReactCommon/react/renderer/debug/SystraceSection.h:

#include <android/trace.h>

struct DummySystraceSection {
 public:
  template <typename... ConvertsToStringPiece>
  explicit DummySystraceSection(
      const char *name,
      ConvertsToStringPiece &&...args) {
    ATrace_beginSection(name);
  }

  ~DummySystraceSection() {
    ATrace_endSection();
  }
};
using SystraceSection = DummySystraceSection;

Since ATrace_beginSection and ATrace_endSection were introduced in API 23, I also needed to bump minSdkVersion to 23 (both in android/build.gradle and Example/android/build.gradle)

They are meant to be used in RAII-style:

{
  SystraceSection s("JNIHelper::ConvertToPropsMap");
  // ...
} // ~DummySystraceSection() is automatically called when the scope ends

JavaScript

In RuntimeDecorator::decorateUIRuntime I've implemented two JSI bindings:

auto clb10 = [](
              jsi::Runtime &rt,
              const jsi::Value &thisValue,
              const jsi::Value *args,
              const size_t count) -> jsi::Value {
  ATrace_beginSection(args[0].asString(rt).utf8(rt).c_str());
  return jsi::Value::undefined();
};
jsi::Value fun10 = jsi::Function::createFromHostFunction(
    rt, jsi::PropNameID::forAscii(rt, "_beginSection"), 1, clb10);
rt.global().setProperty(rt, "_beginSection", fun10);

auto clb11 = [](
              jsi::Runtime &rt,
              const jsi::Value &thisValue,
              const jsi::Value *args,
              const size_t count) -> jsi::Value {
  ATrace_endSection();
  return jsi::Value::undefined();
};
jsi::Value fun11 = jsi::Function::createFromHostFunction(
    rt, jsi::PropNameID::forAscii(rt, "_endSection"), 0, clb11);
rt.global().setProperty(rt, "_endSection", fun11);

I've also needed to whitelist these two names in plugin.js so these functions work properly in worklets:

const globals = new Set([
  // ...
  '_beginSection',
  '_endSection',
]);

Because some of the worklets are also run on the main JS context, here's how to use them:

if (_WORKLET) _beginSection('UpdateProps processColor');
// ...
if (_WORKLET) _endSection();

I did some initial benchmarks and I think I have one idea how to optimize execution time of single _updateProps call for non-layout-only props as well as a mix of layout and non-layout props without JS props.

tomekzaw commented 1 year ago

I've pushed my changes to @tomekzaw/android-systrace. It looks like 2b54074f1e5d8673f6ded19e9745981dfe4ca180 optimizes the execution time of a single styleUpdater from ~250 μs down to ~150 μs. cc @sergeymorkovkin

tomekzaw commented 1 year ago

I'm back with the numbers. TL;DR updateProps is 1.5-2x faster on 2b54074f1e5d8673f6ded19e9745981dfe4ca180 (tested on Samsung M23 as described above, 200 circles in BokehExample)

Version	Test case	`styleUpdater` avg	`animationFrame` avg	`styleUpdater` speedup	`animationFrame` speedup
before	only UI props	270 μs	61,97 ms	-	-
after	only UI props	147 μs	34,68 ms	1.84x	1.79x
before	UI & native props	296 μs	67,97 ms	-
after	UI & native props	181 μs	46,31 ms	1.64x	1.47x

Note: I'm still using the old terminology for consistency with the code, "UI props" means non-layout props while "native props" are props that require layout recalculation when changed.

sergeymorkovkin commented 1 year ago

@tomekzaw I'll try to replicate these results on my device. Will also look into weird mappers behavior with react-navigation.

tomekzaw commented 1 year ago

@sergeymorkovkin The number of requestAnimationFrame calls is 3x the number of circles in BokehExample because there are three independent timing animations running. This is the expected behaviour.

  const update = () => {
    left.value = withTiming(left.value + getRandomPositionDiff(), config);
    top.value = withTiming(top.value + getRandomPositionDiff(), config);
    hue.value = withTiming(hue.value + getRandomHueDiff(), config);
  };

When it comes to the performance of updateProps, one problem with using translateX/translateY over left/top that we found with @kmagiera is the way how React Native handles transforms on Android. First, React Native converts the list of transforms (in this case only transformX and transformY) into a transform matrix, which is then decomposed into single values of x-axis translation, y-axis translation, rotation etc. (see here). We are thinking of exposing additional props like translateX which directly call view.setTranslationX and avoid expensive calculations.

Finally, I couldn't find a place in our code where we explicitly invoke layout recalculation after we update layout props in performOperations. It turns out that this is done in mUIImplementation.dispatchViewUpdates (see here). We are calling this function conditionally because if React Native is currently processing a batch (i.e. queue isn't empty) then it will call dispatchViewUpdates on its own and Reanimated shouldn't do it. However, if there is no batch being processed, we need to call dispatchViewUpdates with an artificial batch id of -1.

sergeymorkovkin commented 1 year ago

@tomekzaw

You're totally right, I confirm your observation. The thing is in my other example I'm not using style-nested animations, and that's why I've missed it completely.

As for the transformation matrices, I suppose, those should normally be sent to the GPU, right? Why RN does this double calculation, is not very clear to me. It's clearly calling MatrixMathHelper.decomposeMatrix(). Looks like the bottleneck we were looking for.

As for calling dispatchViewUpdates(), I suppose, it's good, right?

tomekzaw commented 1 year ago

I suppose, those should normally be sent to the GPU, right?

Well, currently React Native runs these matrix calculations on the CPU because it needs to call setTranslationX, setScaleX, setRotationX etc. individually. I think that Android uses GPU at some later stage when drawing these views onto the screen. (I also found some Android API for composing transforms here but it's under android.view.animation so not sure if RN could use it instead of setTranslationX etc.)

Looks like the bottleneck we were looking for.

Yeah, synchronouslyUpdateViewOnUIThread itself takes almost half of the execution time of the whole _updateProps so this is totally worth optimizing. I think we can implement another variant of _updateUiPropsPaper specifically for View component that supports simple props like translateX, scaleX, rotateX individually and directly, i.e. without transform composition. One thing to remember is that other third-party libraries (such as react-native-svg) might whitelist its own non-layout props (like fill) so we still need to support those.

As for calling dispatchViewUpdates(), I suppose, it's good, right?

Yes, it's correct. The only problem with this chunk of code is that sometimes, in a specific scenario, it can deadlock, see #3879, but this is another story.

sergeymorkovkin commented 1 year ago

My another observation is that working with WritableMap's inside updateProps is also expensive. Possibly, this is why your approach of splitting it into three separate methods works. Let me try it...

tomekzaw commented 1 year ago

Yeah, basically ReadableMap and WritableMap are implemented in C++ so each read or write is a separate JNI call. Also, it's expensive to allocate them, so that's exactly what I'm trying to avoid.

Edit: I think that the average execution time of _updateProps also depends on the number of animated circles, so please keep this constant throughout the experiments. This might be also related to the slow warm-up issue you've mentioned on a call.

sergeymorkovkin commented 1 year ago

@tomekzaw Confirm that it works notably better with splitting updateProps on the JS side. It's animating 100 circles at roughly 45-50 FPS. We're close to resolving it. My further steps are:

Bypass transform matrices where possible (I plan implementing this in a separate method)
Debug/improve redundant reactivity in mappers when used with react-navigation
Implement batching updateProps on a per-frame basis
Combine everything via a custom useAnimatedStyle() supporting multiple elements

Also, there is a reason they calculate transformation matrix. The thing is it's allowed to perform same transformation many times, e.g.:

transform: [ {translateX: 10}, {translateX: -5}, {translateX: 15}, ]

tomekzaw commented 1 year ago

Confirm that it works notably better with splitting updateProps on the JS side. It's animating 100 circles at roughly 45-50 FPS. We're close to resolving it.

That's great! I will submit a PR with 2b54074f1e5d8673f6ded19e9745981dfe4ca180 then. These changes also need to be adapted for iOS but the speedup probably won't be this significant.

My further steps are:

Bypass transform matrices where possible (I plan implementing this in a separate method)

I was actually thinking of optimizing transforms on my own but honestly I don't have a clear idea yet on how to handle this (should we introduce another group of props, like let's say "direct props"?) so feel free to share your ideas. For the reference, here's a complete list of all non-layout props. I noticed that the list already contains translateX, translateY, scaleX and scaleY so it might be worth checking to use these like regular props instead of nesting them inside translate prop.

Debug/improve redundant reactivity in mappers when used with react-navigation

Implement batching updateProps on a per-frame basis

Combine everything via a custom useAnimatedStyle() supporting multiple elements

These are all valid ideas for further research. In particular, it would be great if you could debug mappers when used with react-navigation or even come up with some test cases that we can use for unit tests for mappers.ts.

The thing is it's allowed to perform same transformation many times

Yes also the order of transforms matters (i.e. translate and then rotate gives another transformation matrix than if rotation is first).

sergeymorkovkin commented 1 year ago

@tomekzaw You can surely submit a PR, but did you think about batching calls to updateProps? Now we need to batch 3 of them separately. I'd possibly wait. What do you think?

I was actually thinking of optimizing transforms on my own but honestly I don't have a clear idea yet on how to handle this

What we could do is allow passing an object for a transform prop:

transform: { translateX: 123, scaleY: 1.2, }

Of course, in such a case developer code will need to work around specifying multiple same-type transforms. Given this feature is more a convenience mechanism, I think this might be a fair compromise (to avoid matrix calculation). Still, the order of transforms is a concern...

tomekzaw commented 1 year ago

@sergeymorkovkin After some time spent on profiling, at this point I no longer think that batching _updateProps will speed things up significantly. Obviously, we could save some time on JSI calls but there would be more JNI calls. We know that 2b54074f1e5d8673f6ded19e9745981dfe4ca180 gives us ~1.6x speedup and we can save another ~40% on optimizing transforms. I think we can try out batching prop updates at some later stage.

Still, the order of transforms is a concern...

Yeah, first I would check if using transformX and scaleX props in top-level object works at least on Android.

sergeymorkovkin commented 1 year ago

@tomekzaw I've tested with direct props and we only get 10% of an improvement, not 40%, unfortunately. Looks, like all the CPU losses we experience are coming from JSI/JNI. And yes, I'm using NDK 21.4.7075529, not 24.

Using transform prop:

Using translateX/Y props:

At least, now we know how profitable avoiding matrix calculation is.

sergeymorkovkin commented 1 year ago

@tomekzaw Looks like our memory is leaking like crazy. Will look into this tomorrow. It's too early for a PR.

https://user-images.githubusercontent.com/1918742/208756897-efc24cb7-bdd6-4c9d-bc68-8bdf80c409c5.mov

Also, after observing an app for ~10 minutes, we start seeing stutters and frame drops again.

tomekzaw commented 1 year ago

Looks like our memory is leaking like crazy.

Might be, but eventually memory usage substantially drops down. For the record, there were some memory leaks in the old implementation of shared values but after shareable rewrite they all should be gone. It all depends on when GC decides to kick in. Note that you can trigger GC maunally by calling gc(); function on both runtimes (although we don't recommend doing so on each frame). Also, by default Hermes only cleans up the last generation of objects. So theoretically the app can consume as much memory as possible until the OS starts to complain.

Also, after observing an app for ~10 minutes, we start seeing stutters and frame drops again.

There might be some problem with long-running animations (which I also observed on BokehExample left running while I was making tea) but @piaskowyk is doing his best fixing it.

sergeymorkovkin commented 1 year ago

@tomekzaw There is no clarity on what exactly is leaking (JS? Java?). GC is running every 10 seconds, but not clearing much. Likely, this is happening because of...

... by default Hermes only cleans up the last generation of objects.

App starts at around 72 Mb RAM and gradually reaches 512 Mb and even more. Yes, there is a significant GC drop in the end of this video I attached. The problem is we do not control this and this is not guaranteed.

As for stutters and frame drops after certain period, this might or might not be related to GC.

There might be some problem with long-running animations (which I also observed on BokehExample left running while I was making tea) but @piaskowyk is doing his best fixing it.

Can you please provide a reference? Is there an issue?

tomekzaw commented 1 year ago

Can you please provide a reference? Is there an issue?

I don't have any code pointers or commits for you but @piaskowyk says that it might be related to the GC.

As for stutters and frame drops after certain period, this might or might not be related to GC.

Exactly. Can you please check if calling gc(); on both runtimes every 10 or 15 seconds resolves the issue with lagging long-running animations? We suspect these two issues might be related. We are also considering adding a system event listener that triggers GC on each memory warning (React Native already does it).

sergeymorkovkin commented 1 year ago

@piaskowyk @tomekzaw I think it'll be better to split this into a separate issue, so I created it here. I don't think this is related to GC, since triggering GC manually doesn't remove stutters. Also, if I trigger GC manually, it drops RAM consumption down to 300 Mb only, while application has started at around 70 Mb. Looks like we're not cleaning some lookup tables...

tomekzaw commented 1 year ago

Turns out that translateX and translateY props (as well as scaleX and scaleY) are still supported on Android (although they don't work on iOS), here's the example code:

App.tsx

```tsx import { StyleSheet, View } from 'react-native'; import React, { useEffect } from 'react'; import Animated, { useAnimatedStyle, useSharedValue, withRepeat, withTiming, } from 'react-native-reanimated'; export default function App() { const sv = useSharedValue(0); useEffect(() => { sv.value = 0; sv.value = withRepeat(withTiming(1), -1, true); }, []); const animatedStyle = useAnimatedStyle(() => { return { translateX: sv.value * 50, translateY: sv.value * 50, }; }); return ( ); } const styles = StyleSheet.create({ container: { flex: 1, alignItems: 'center', justifyContent: 'center', }, box: { width: 150, height: 150, backgroundColor: 'red', }, }); ```

It looks like MatrixMathHelper.decomposeMatrix is no longer called:

So as a workaround for your case, you can have two worklets (one returning an object with translateX and translateY props, for Android, and the other one with transform property, for iOS) and pass one of them to useAnimatedStyle depending on Platform.OS.

A more general solution would be to add support for translateX and translateY props on iOS. Luckily, this is pretty straight-forward and shouldn't degrade the performance too much.

sergeymorkovkin commented 1 year ago

@tomekzaw I've made this test a couple of days ago. Kindly find my message below:

@tomekzaw I've tested with direct props and we only get 10% of an improvement, not 40%, unfortunately. Looks, like all the CPU losses we experience are coming from JSI/JNI. And yes, I'm using NDK 21.4.7075529, not 24.

Using transform prop:

Using translateX/Y props:

At least, now we know how profitable avoiding matrix calculation is.

sergeymorkovkin commented 1 year ago

@tomekzaw I'm wondering: do we really need to use ReadableMap/WritableMap and depend on JSI? Why not simply use JavaOnlyMap, for instance?

As for this solution you're offering, unfortunately, we cannot replace this in all places. E.g. we also use element rotation and anchor point shifting. So, there are multiple trabslateXY used inside a transform prop.

tomekzaw commented 1 year ago

@sergeymorkovkin Here's the commit I talked about yesterday: a6a13678ec42cdc9af5b0cced8fc6970b9e94e35. It adds batching UI prop updates on the JS side and consequently optimizes the number of JNI calls thanks to using only a single ReadableMap for all component updates. Please check if it helps in your case (note that the patch is still a PoC so probably it needs some polishing before submitting a PR).

Thanks to batching, the average execution time of animationFrame went down from 20,74 ms to 14,22 ms (both measurements taken on OPPO A16, release mode + profileable, 50 bokeh circles, using translateX/translateY instead of transform, without JS systraces in useAnimatedStyle etc.).

Before:

After:

When it comes to MapBuffer, I still don't think this is a viable alternative. Obviously, it saves some CPU time and memory as it uses jni::JByteBuffer::wrapBytes to create a direct byte buffer without making a copy of the underlying data (see here). However, the JNI documentation states that:

It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations.

In our case, the buffer is neither large nor long-lived (perhaps a better alternative would be to allocate a single DirectByteBuffer, re-use it between animation frames and use BSON instead). There are also some major limitations. First, MapBuffer only supports integer keys. React Native keeps a hard-coded list of constants that represent each property, see here. In our use-case, we need more flexibility. Finally, when we update props, we call methods from React Native that accepts ReadableMap/WritableMap, so even if we used MapBuffer, we would still need to convert it again to e.g. JavaOnlyMap which also takes some time.

With the batching optimization, the conversion itself takes only about 15% of the execution time which is approximately two times faster than what we've started with. I've discussed it with @lukmccall (who is an Android & JNI expert) and honestly at this point we don't think it's worth trying to optimize it further, especially taking into account that the other 50% of the execution time originates from the JS part of Reanimated (useAnimatedStyle updater, valueSetter step) and most likely this is what we should focus on in the first place. We also observed an increased activity of GC due to a high number of allocations and deallocations which also needs to be taken care of.

My next step is to profile the JS part of Reanimated using JSC or Hermes profiler. I really hope to find something interesting there that can be optimized.

tomekzaw commented 1 year ago

Here are the results of profiling Reanimated runtime using Hermes profiler via Chrome DevTools. Here are the results are from OPPO A16 running Example app in debug mode:

Thanks to this, I was able to identify the following bottlenecks:

_updatePropsPaper – as mentioned above, at this point I don't think this can be easily optimized further, so let's skip it as for now and focus on the JS-only code
isAnimated – this simple utility function takes almost 7% of the execution time, most likely because it involves array operations, this definitely needs to be investigated
GC Young Gen – it looks like we perform a lot of allocations and deallocations in a single frame, we need to find some way to re-use existing instances of shareables
arrayPrototypeIndexOf – in some places we use Array.indexOf where key in obj would also suffice, optimized in 6ce102d8bf839cdfdc51bd925fa99e53707f6275 which leads to ~5% speedup in debug mode (98 ms vs. 93 ms on average), however it shows no real improvement in release mode
styleDiff – still not sure what's the purpose of calculating difference
_makeShareableClone and _updateDataSynchronously – these two calls can be avoided by using one-way-reads-only shared values. However, I was strongly discouraged from using one-way-reads-only shared value at this stage since they have not been thoroughly tested yet as well as they are a subject to change in near future due to different ideas for optimizations (that's why the flag is not a part of the public API yet)

sergeymorkovkin commented 1 year ago

@tomekzaw styleDiff is required to avoid calling _updateProps when there are no actual changes.

tomekzaw commented 1 year ago

Hey @sergeymorkovkin, just out of curiosity I've implemented another optimization, see a13f33dd6e3dea8db11aef856db5a306d47d32d2. This time, along with batching prop updates and calling _updateProps only once, I'm also flattening the whole map of updated props. Previously, there was a separate ReadableMap of updated props for each component, for instance:

{
  "3": {
    "translateX": 100,
    "translateY": 200,
  },
  "5": {
    "translateX": 300,
    "translateY": 400,
  },
}

The optimization is to have just one map where the keys consist of the view tag and the prop name (like 3_translateX).

{
  "3_translateX": 100,
  "3_translateY": 200,
  "5_translateX": 300,
  "5_translateY": 400,
}

This significantly optimizes the number of JNI calls since previously there were 2 calls (importKeys and importValues called within getLocalMap in ReadableNativeMap.java) per each nested ReadableMap (this makes 2n+2) and after the change there's just 2 (regardless of the number of animated components).

Note that currently the PoC implementation only supports values of double type (adding support for integers or strings should be easy) and also doesn't optimize for nested fields (like transform) since we flatten only one level of the map (although I'm thinking on how to generalize this idea).

The results are quite promising, see the attached table. Also, the more animated components, the better speedup.	Implementation	Number of circles
Batching	50	13,7 ms
Batching + flattening	50	12,8 ms (~6.5% faster)
Batching	100	26,2 ms
Batching + flattening	100	24,3 ms (~7.3% faster)

Before:

After:

bolan9999 commented 1 year ago

Any update for this ?

tomekzaw commented 1 year ago

@bolan9999 Yep, first we need to finish #4503 which adds JSI-based batching and then we'll be able to go deeper with JNI-level batching to make Android faster.

bolan9999 commented 1 year ago

@bolan9999 Yep, first we need to finish #4503 which adds JSI-based batching and then we'll be able to go deeper with JNI-level batching to make Android faster.

Do you have a Benchmarks for this update totally? In my scene,I'm trying to rewrite a ScrollView and Recyclable list with Reanimated and GestureHandler. But when everything is done, the Benchmarks of it is a little bad. What's worse more is frame drops randomly on Android when scrolling 3 min or longer. But it works well when I stoped scrolling and wait for a few seconds.

tomekzaw commented 1 year ago

@bolan9999 Do you memoize gestures?

bolan9999 commented 1 year ago

@tomekzaw What do you mean memoize gestures?

it is experimental. And I will open source and take some Benchmarks latter. But muti-items animations with Reanimated causes frame drops is real. The only thing I can do is reducing the count of items and worklet calls.

tomekzaw commented 1 year ago

@bolan9999 If you use Gesture Handler v2 API (GestureDetector), you can memoize gestures by wrapping the Gesture objects inside useMemo hooks, e.g.:

const tap = useMemo(() => {
  return Gesture.Tap.onStart(/* ... */);
}, []);

<GestureDetector gesture={tap}>

bolan9999 commented 1 year ago

@tomekzaw It is not re-rendered in my scene at all. And I don't use GestureDetector cause of runOnJS will crash in worklets. I use useAnimatedGestureHandler instead.

bolan9999 commented 1 year ago

I'm pretty sure worklet calls or transform performance cause frame drops .

BalogunofAfrica commented 11 months ago

Getting the same issue animating transform using withRepeat, frame drops to as low as 10 fps on UI and JS side on Android (Pixel 4a), but runs at buttery smooth 60 fps on iOS.

  const itemSize = 560
  const offset = itemSize / 2
  const confettiTranslateY = useSharedValue(0);
  const imageTranslateY = useSharedValue(0);

  ...

  useEffect(() => {
    confettiTranslateY.value = withRepeat(
      withSpring(-50, {
        damping: 8,
      }),
      -1,
      true,
    );
    imageTranslateY.value = withRepeat(
      withTiming(-10 * itemSize + offset, {
        duration: 10_000,
        easing: Easing.inOut(Easing.exp),
      }),
      -1,
      true,
    );
  }, [])

...

  const confettiStyle = useAnimatedStyle(
    () => ({ transform: [{ translateY: confettiTranslateY.value }] }),
    [confettiTranslateY],
  );

  const imageStyle = useAnimatedStyle(
    () => ({ transform: [{ translateY: imageTranslateY.value }] }),
    [imageTranslateY],
  );

 ...

software-mansion / react-native-reanimated