Open danslobodan opened 4 months ago
@danslobodan Thanks for the info 😊 I don't know much about Unity, but this should be helpful to Unity users.
It seems to create significant overhead even on the PC. Would you perhaps consider sharing your intent with the MemoryMarshal implementation - perhaps we could come up with an alternative solution? I'd be happy to contribute.
The MemoryMarshal implementation is definitely faster in my environment. Does this issue occur exclusively with IL2CPP? If so, adding an #ifdef to revert to a simple for loop could be a solution.
This was previously mentioned by https://github.com/homy-game-studio/hgs-unity-tone/issues/4 as well.
I use meltysynth with Unity/IL2CPP with the following alteration:

    public static void MultiplyAdd(float a, float[] x, float[] destination)
    {
        // use older implementation with il2cpp
        // See also: https://github.com/homy-game-studio/hgs-unity-tone/issues/4
    #if ENABLE_IL2CPP
        for (var i = 0; i < destination.Length; i++)
        {
            destination[i] += a * x[i];
        }
        return;
    #endif
        var vx = MemoryMarshal.Cast<float, Vector<float>>(x);
        ...

@nickgal Thanks for the info 😊
@danslobodan
If you're curious about the speed of the MemoryMarshal.Cast implementation, I have a benchmark result from when I updated ArrayMath to the current code. The new version speeds up the rendering process by 10% without increasing GC overhead.
https://github.com/sinshu/meltysynth-benchmark/commit/2d603e7dd30acd4ea95f34a8c8d4f2b0b6ec9278
@sinshu
On the PC, when running the game inside the Unity Editor, my results show:
- MemoryMarshal with Vector: Total Audio CPU: 28.0%
- Simple for loop: Total Audio CPU: 18.8%
So it's a really significant difference. It's much, much worse on Android, where it pretty much doesn't work at all.
What I saw from the call stack is that nearly all of the load comes down to the garbage collector. I can't say I understand what's happening behind the scenes, but something is apparently being allocated and collected, despite this seemingly being an allocation-free operation.
We could dive deeper if you'd like.
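One way to check the garbage collector observation outside the Unity profiler is to count gen-0 collections around a tight loop. The following is a minimal sketch, not part of meltysynth: `AllocationProbe` and `Measure` are hypothetical names, and it relies only on `System.Diagnostics.Stopwatch` and `GC.CollectionCount`. An allocation-free operation would be expected to report zero collections here, though other threads in the process can also trigger them.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of meltysynth or Unity.
public static class AllocationProbe
{
    // Runs `work` `iterations` times and returns the elapsed time in
    // milliseconds plus the number of gen-0 collections seen meanwhile.
    public static (double ElapsedMs, int Gen0Collections) Measure(Action work, int iterations)
    {
        work(); // warm-up so one-time JIT and setup costs are not measured

        // Start from a clean heap so earlier garbage does not skew the count.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var before = GC.CollectionCount(0);
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            work();
        }
        stopwatch.Stop();

        return (stopwatch.Elapsed.TotalMilliseconds, GC.CollectionCount(0) - before);
    }
}
```

It could be called as `AllocationProbe.Measure(() => ArrayMath.MultiplyAdd(0.5f, x, destination), 100000)` once for each implementation, with `x` and `destination` allocated up front so that only the call itself is measured.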
Note that these are the results in Unity, on PC (not using IL2CPP) and Android (using IL2CPP). When not using IL2CPP on Android it actually runs better, but it still struggles.
Here are the call stacks for both versions:
[Screenshot: Vector] [Screenshot: For loop]
So you can see that the load from the MultiplyAdd function almost vanishes when using the for loop, while with the vector version it accounts for most of the load.
Edit: Note that this is on PC, not Android. On Android the difference is much more drastic than this.
Thanks for the detailed info 😊
To summarize the current situation:
- The MemoryMarshal.Cast implementation is slow in Unity (even slower in IL2CPP).
- The MemoryMarshal.Cast implementation is fast in MS's .NET runtime.

What I want:
- The MemoryMarshal.Cast implementation, as it is faster for my use case.

I've done a bit of research on Unity. I found that there are several ways to add libraries to Unity, not only by copying source code into the project but also by adding compiled DLLs or directly using NuGet packages (right?). This means that a simple #if code switch at compile time might not be suitable.
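For reference, a runtime switch is one alternative to a compile-time #if, since it would also apply to a precompiled DLL or NuGet package. The following is a minimal sketch under stated assumptions, not meltysynth's actual code: `ArrayMathSketch` and `UseVectorPath` are hypothetical names, and whether `Vector.IsHardwareAccelerated` reports false under Unity's Mono or IL2CPP backends has not been verified here, which is why the flag is left settable by the host application.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical sketch; names are illustrative and not part of meltysynth.
public static class ArrayMathSketch
{
    // Assumption: defaults to the vectorized path only when the runtime
    // reports hardware-accelerated Vector<T>. A Unity project could set
    // this to false at startup to force the simple loop.
    public static bool UseVectorPath = Vector.IsHardwareAccelerated;

    public static void MultiplyAdd(float a, float[] x, float[] destination)
    {
        if (!UseVectorPath)
        {
            // Plain scalar loop, as in the older implementation.
            for (var i = 0; i < destination.Length; i++)
            {
                destination[i] += a * x[i];
            }
            return;
        }

        // Reinterpret both arrays as spans of Vector<float> without copying.
        var vx = MemoryMarshal.Cast<float, Vector<float>>(x.AsSpan(0, destination.Length));
        var vd = MemoryMarshal.Cast<float, Vector<float>>(destination.AsSpan());
        for (var i = 0; i < vd.Length; i++)
        {
            vd[i] += a * vx[i];
        }

        // Handle the elements that did not fill a whole Vector<float>.
        for (var i = vd.Length * Vector<float>.Count; i < destination.Length; i++)
        {
            destination[i] += a * x[i];
        }
    }
}
```

The branch costs one predictable check per call, so the vectorized path should stay close to its current speed on the .NET runtime, but that would need to be confirmed with the existing benchmark.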
The problem here is that I have no way to work with Unity's build and runtime environment. Since I cannot verify code changes intended for Unity, I should not add them to this repository.
For example, how about creating a fork optimized for Unity? I'm thinking of putting a link to that fork in a prominent position in the README for when Unity users discover this repository.
What do you think?
Do you think this change is a valid enough reason to fork? I'm not really experienced in the ways of open source, so I'd rather go with your opinion on the matter.
On the other hand, granted, it's probably going to be hard to make an implementation that works well on both .NET and Unity.
Hi and thanks for making this great tool!
I'm using meltysynth for real-time audio rendering in Unity, through the OnAudioFilterRead callback function.
When I used IL2CPP to compile the project for Android and ran it on multiple devices, the audio stuttered horribly.
Upon running some profiling sessions I found that almost all of the CPU load was actually from the Garbage Collector, specifically in this function:
Perhaps this is premature optimization, since changing the code to:
seems to have eliminated the load from this function entirely.
Perhaps this is not relevant to your intended use case, but I'd still like to let you know.