rikimaru0345 / Ceras

Universal binary serializer for a wide variety of scenarios https://discord.gg/FGaCX4c
MIT License

[Feature Request] LZ4 compression #44

Closed Hurri04 closed 5 years ago

Hurri04 commented 5 years ago

Hi,

I'm developing (as a hobby) a Tile-based 2D platformer game in Unity. At the moment I'm still creating some parts for a custom level editor. One of the more important issues in this project is data (de-)serialization:

I have a Level class which derives from ScriptableObject (in order to be able to edit certain values in the Unity Inspector). This class holds a Tile[,] array which gets serialized to a byte[] array / deserialized from it.

Currently I'm using neuecc's MessagePack serializer but if your benchmarks are still up to date I'd like to try out Ceras since I'm looking for maximum performance.

However, one thing MessagePack has is LZ4 compression. Since the byte[] array in my Level class needs to be serialized again using the default [SerializeField] attribute for the Unity serializer, reducing its size by applying LZ4 compression improves editor responsiveness immensely. Without compression I've even had the Unity editor freeze up/crash on me when the ScriptableObject of a large level is selected in the Project view (since apparently the Inspector deserializes the SO every frame to draw the fields, possibly applies changes and then serializes it again).

I saw that LZ4 is already on your list of planned features. Since I didn't find a schedule for these planned features my hope is that opening a feature request for this will help increase its priority ;)

My proposal would be to have an overload of your Serialize/Deserialize method with an optional parameter for the compression type. This way existing users can keep their syntax as is while adding something like "CompressionType.LZ4" as additional parameter applies the compression.

This would also allow changing it to "CompressionType.GZip" later (although GZip is not a priority for me since even with the smaller size resulting from it the compression would usually take longer than LZ4).

The default value could be "CompressionType.None" so that anyone who would need to could keep calling the same method overload and simply pass a variable with the appropriate CompressionType for their purposes.
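As a sketch, the call site for this proposal might look like the following. Note that CompressionType and this overload are purely hypothetical, they do not exist in the current Ceras API:

```csharp
// Hypothetical API sketch -- CompressionType and this overload are
// part of the feature request, not of Ceras today.
public enum CompressionType
{
    None,  // default: behavior identical to the current API
    LZ4,   // fast compression, slightly larger output
    GZip,  // smaller output, slower
}

// Existing callers keep their syntax unchanged:
//   byte[] data = serializer.Serialize(level);
// Opting in to compression would just add one argument:
//   byte[] data = serializer.Serialize(level, CompressionType.LZ4);
```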

If you need any more info let me know.

Cheers!

rikimaru0345 commented 5 years ago

Hi, should be relatively straightforward to add this.

My proposal would be to have an overload of your Serialize/Deserialize method with an optional parameter for the compression type. This way existing users can keep their syntax as is while adding something like "CompressionType.LZ4" as additional parameter applies the compression.

Hmm, normally I'd agree, but I think a normal setting would be better. Parameters can change between calls; settings cannot. That enables potential optimizations, because every feature can rely on the settings never changing (which has come up multiple times during development already). Also, adding more parameters/overloads makes the API slightly harder to use for beginners (even if they're optional), and with the recent addition of "Advanced.SerializeStatic()" I'd have to replicate every API change into yet another set of methods.

Currently I'm using neuecc's MessagePack serializer but if your benchmarks are still up to date I'd like to try out Ceras since I'm looking for maximum performance.

Since I'm currently pretty busy, may I ask you to first test Ceras (with a custom compression layer on top if needed) to make sure that it even works for you at all? (in terms of features, speed, output size...)

I'd like to avoid spending (probably a considerable amount of) time if it doesn't help anyone in the end.

That being said, if I'll add it, then I'll probably implement the full feature all at once, so not just LZ4 but also GZip, and maybe also using ZLib directly to make use of sync points (which would be awesome for networking scenarios).

Hurri04 commented 5 years ago

Oh, of course, setting the variable for the compression in a Settings class would also work just fine.

I'll need to tie up a few loose ends in my code before I'll be able to replace the MessagePack serializer with Ceras but I'm pretty confident that it should work without any problems. If you'd like me to confirm this by testing it I might be able to do this within the next few days depending on my own schedule.

Since this is a hobby project switching out the serializer is not a time-critical factor for me. So if you are busy I completely understand if implementing compression might take a while. I should have some other parts in my code to which I could tend in the meantime ;)

rikimaru0345 commented 5 years ago

If you'd like me to confirm this by testing it I might be able to do this within the next few days depending on my own schedule.

Yes, that'd be great. While doing preliminary testing you'll probably want to use some external LZ4 library (or maybe MessagePack's built-in LZ4 encoder) to ensure that the total time of serialization+compression / deserialization+decompression is within your acceptable range.

If/when you can see that switching to Ceras would actually be beneficial for your project, let me know and I'll make time to get this feature implemented asap. It will probably take just 1~2 days from then on.

Btw: there are a lot of settings and things that have a noticeable impact on Ceras' serialization performance. Most notably that's using the right overload (the one where you pass a buffer by reference and get the number of bytes written as a return value, instead of the one that only returns a result byte array) and turning off reference serialization (so Ceras works like almost every other serializer, like MsgPack or Newtonsoft Json; namely not taking care of circular references and de-duplication). There are a few more things that have a minor impact on performance - like using KnownTypes - but those should be obvious from reading the IntelliSense documentation (if you ever want to investigate that deeply).
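For illustration, a minimal setup combining those two suggestions (the buffer-by-ref overload plus disabled reference tracking) might look like this; `level` stands in for whatever root object is being serialized:

```csharp
var config = new SerializerConfig();
// Skip circular-reference tracking and de-duplication, so Ceras
// behaves like MsgPack or Newtonsoft Json.
config.PreserveReferences = false;
var serializer = new CerasSerializer(config);

// Buffer-by-ref overload: Ceras grows the buffer as needed and
// returns how many bytes were actually written.
byte[] buffer = null;
int written = serializer.Serialize(level, ref buffer);
// Only buffer[0 .. written-1] is valid data; the rest is spare capacity.
```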

If you have any questions or problems don't hesitate to ask! (open a separate issue, or join the discord, or whatever :P) If there are any issues, we can most likely fix them pretty easily. :+1:

Hurri04 commented 5 years ago

Okay, I've fixed a few things now in the last few days in preparation for testing.

I've downloaded Ceras from here: https://ci.appveyor.com/project/rikimaru0345/ceras/build/artifacts ("ceras_net45.zip" version), extracted it and put the contents into "Assets/Plugins/Ceras/" in my Unity project.

Then in my Level class I changed my "Save" method to look like this:

public void Save()
{
    SerializerConfig config = new SerializerConfig();
    config.DefaultTargets = TargetMember.AllFields;
    CerasSerializer serializer = new CerasSerializer(config);

    using(new StopwatchScope("Saving Level"))
    {
        int count = serializer.Serialize<Tile[,]>(tiles, ref serializedTiles);
        Debug.Log("serialized: " + count);
    }
}

Unfortunately calling this method results in this error message:

ArgumentException: Expression of type 'Tile[,]' cannot be used for parameter of type 'Tile[]' of method 'Void Serialize(Byte[] ByRef, Int32 ByRef, Tile[])' Parameter name: arg2

I think this is trying to indicate that a 2D array gets wrapped as a 1D array internally, but there's some problem occurring?

rikimaru0345 commented 5 years ago

Hey, currently Ceras doesn't support multi-dimensional arrays, but I think I can easily add that :) I'll take a closer look at this this evening or tomorrow evening.

rikimaru0345 commented 5 years ago

@Hurri04 Can you take a look at the latest commit? 7a3573cdb4174f1ed543f5dbf1c272ce8a44e867

I've added support for multidimensional arrays. :+1: :smile:

By the way: the timing in your Save method will probably report a slower than expected time because most of the initialization happens in the first serialization. Ceras generates code for your types then (and they're only known once you actually use them in a Serialization/Deserialization). Skip the first measurement to get a more accurate result.
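A minimal sketch of that, reusing the Save method posted above:

```csharp
// Warm-up call: the first (de)serialization of a type triggers Ceras'
// code generation for it, so it is much slower than later calls.
serializer.Serialize<Tile[,]>(tiles, ref serializedTiles);

// Only measure subsequent calls for a representative result.
using (new StopwatchScope("Saving Level"))
{
    int count = serializer.Serialize<Tile[,]>(tiles, ref serializedTiles);
}
```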

Hurri04 commented 5 years ago

I'm having a bit of trouble compiling Ceras myself.

After using [Build]->[Build Ceras] in Visual Studio I now have these files in this folder: "...\Workspace\C#\Ceras\src\Ceras\obj\Release\net45"

However, copying them into "Assets/Plugins/Ceras" folder in my project results in Unity showing this error message:

Assembly 'Assets/Plugins/Ceras/Ceras.dll' will not be loaded due to errors: Unable to resolve reference 'System.Buffers'. Is the assembly missing or incompatible with the current platform? Unable to resolve reference 'System.Runtime.CompilerServices.Unsafe'. Is the assembly missing or incompatible with the current platform?

rikimaru0345 commented 5 years ago

For Unity, try using the output in \bin\Release\netstandard2.0. I will try to add LZ4 compression in the next few days and then publish a compiled version that you can download from AppVeyor as well :+1:

Not sure yet if I'll have compression as a core feature, or if I'll put it into a separate .dll like with the Unity extensions or Immutable Collections package.

Hurri04 commented 5 years ago

\bin\Release\netstandard2.0 contains the same set of files, but using these results in the same error.

Are there any settings I'd need to set / packages I'd need to install for this to compile properly?

rikimaru0345 commented 5 years ago

Since Unity supports .net standard 2.0 it is strange that it has trouble loading Buffers and Unsafe.

Take a look at all the debug/release and different platforms, you should be able to find System.Buffers.dll and System.Runtime.CompilerServices.Unsafe.dll. Copy them to your Unity folder along with the Ceras.dll.

I think the issue Unity has is that those are packages that are downloaded through NuGet. Visual Studio does that for you automatically, but Unity does not respect the Ceras.deps.json and doesn't download the needed packages. So it seems that's why you need to do that step manually.

From what I've heard Unity is aware of the issue and there are things planned to at least make this easier / more obvious in future Unity versions.

Hurri04 commented 5 years ago

I might be going slightly mad... Either I've been looking in the wrong folders yesterday or their contents have been modified since then.

\bin\Release\net45 now contains the expected files, and they work.

I've done a few test runs of saving and loading an (empty) Level of 700x400 Tiles (which is the biggest Level I expect to have, at least for the base game). These are the durations in seconds: 10 runs each, bottom numbers are averages. One tiny oversight I fixed was moving the Debug.Log call out of the StopwatchScope in order to not have it influence the measurement.

I changed the calls to the MessagePack API to use the normal "MessagePackSerializer" class instead of "LZ4MessagePackSerializer" which I used originally. This way the comparison with Ceras is more fair. The only settings I changed for Ceras were config.DefaultTargets = TargetMember.AllFields; and config.PreserveReferences = false;. In my experience (de-)compression algorithms only take a fraction of the time (de-)serialization algorithms take anyway so adding LZ4 on top should not have much impact for both.

In both cases I created a new instance of the respective serializer just before (de-)serializing. I'll later replace this by a getter property backed by a private field in my custom LevelEditorWindow class in order to reuse the instance. Can an instance of the CerasSerializer class be serialized by Unity's JsonUtility class btw? This would help when working in the editor since otherwise the backing field would still lose its value whenever play mode is entered or left or scripts are recompiled.

Hurri04 commented 5 years ago

Strike those first 2 sentences; I've definitely been looking in the wrong folder the first time: obj instead of bin. Face -> desk :roll_eyes:

rikimaru0345 commented 5 years ago

Can an instance of the CerasSerializer class be serialized by Unity's JsonUtility class btw? This would help when working in the editor since otherwise the backing field would still lose its value whenever play mode is entered or left or scripts are recompiled.

I think I understand what you intend to do. You want to minimize the "startup time" as much as possible, so your iteration time is faster (not having to wait as long for your editor thing to load the game/level stuff), right?

Unfortunately there's really not much that can realistically be done there I think.

As for your actual question: There's huge amount of stuff cached inside a CerasSerializer instance, most of which is pretty complex (like static Type-caches, references to dynamically compiled code...) and would be really complicated to preserve/restore.

Anyway, when you enter/leave play mode or recompile scripts many of those things would be invalid anyway and restoring them from a serialization (if that were even doable easily) would not save much time.

For example you obviously can't "serialize" dynamic code; what you'd do is serialize the construct/"source" from which it was created, and to deserialize it you'd read this sort of description back and then compile it again... All in all, even if someone were to put in the time to somehow make this happen, I highly doubt it would be worth the effort.

What can be done, however, is further increasing the performance of the serialization/deserialization, I think. But that depends on what exactly your tile object looks like. I assume it's a struct? Can you post the code for it? Also, if your goal is performance I'd suggest you try normal nested arrays instead of multi-dimensional arrays as well. At least in the past, multidimensional arrays have always had quite a few performance issues (not 100% sure whether that is still the case today).

If your tile object is a struct it could also be a good idea to think about the layout of the data members to ensure there is as little padding as possible (and ensuring that the internal memory layout will not change).
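As a sketch of that idea (a hypothetical layout, not the actual Tile code): with LayoutKind.Sequential the field order, and therefore the padding, is pinned down explicitly.

```csharp
using System.Runtime.InteropServices;

// Hypothetical example struct: fields ordered largest-to-smallest and
// the layout pinned, so there is no hidden padding and the in-memory
// layout (and thus the serialized format) cannot change silently.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct ExampleTile
{
    public int TilesetIndexBackground;  // 4 bytes
    public int TilesetIndexForeground;  // 4 bytes
    public ushort CoordX;               // 2 bytes
    public ushort CoordY;               // 2 bytes
    public byte Flags;                  // 1 byte -> 13 bytes, no padding
}
```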

rikimaru0345 commented 5 years ago

Also let me know if you intend for your game to be cross-platform. Or at least if you intend to, load a save-game on a different platform than it was created. That could be relevant as well (because it would force you to not use any sort of reinterpret formatter, but that affects every serializer, not just Ceras).

Hurri04 commented 5 years ago

I see. Part of the reason for testing with a new instance each time was for this case; if the instance is lost too often it wouldn't make much sense to skip the first measurement, right? It shouldn't be too much of a problem though, since creating a new instance seems to be fast enough. And since it only happens when the play/edit mode changes or scripts are recompiled it's at least possible to keep the same instance around and (potentially) reuse it multiple times until then. Interestingly the deserialization time is a bit faster while in play mode where it goes down to ~0.3 seconds compared to the ~0.4 seconds in edit mode.

The Tile class contains these constants and fields:

#region Constants
public const int GraphicsLayersCount = 2;
private static readonly TileTypeComparer tileTypeComparer = new TileTypeComparer();
public static readonly Vector2 Scale = new Vector2(0.5f, 0.5f);
public static readonly Vector2Int PixelSize = new Vector2Int(20, 20);
private static readonly string[] vertexColorNames = { "_Color0", "_Color1", "_Color2", "_Color3" };
private static Mesh mesh;
private static Material typesMaterial;
#endregion

#region Fields
private GraphicsLayer[] graphicsLayers = new GraphicsLayer[GraphicsLayersCount];
private Color color = Color.white;
private Color[] vertexColors = new Color[4];
private HashSet<Type> types = new HashSet<Type>(tileTypeComparer);
#endregion

The TileTypeComparer class derives from IEqualityComparer and is used to prevent boxing when using the Type enum which is used to indicate which Tiles are walls, platforms, water, etc. The GraphicsLayer struct only contains these fields:

private int tilesetIndex;
private Vector2Int tilesetCoordinate;

This way each Tile contains 2 GraphicsLayers (background and foreground sprites). It also has a color and 0 to a few Types.

Instead of the 2D Tile array I've tried both using a 1D array with a wrapper property and nested arrays before, but both were a bit cumbersome, e.g. when resizing the level.

I'll probably be able to remove the vertexColors array by putting it into the Level class itself. It's used to hold precalculated values from averaged colors of surrounding Tiles to a give a nice smooth look by interpolating these like vertex colors in a custom shader. This is also part of the reason why I'm using this custom Tile class instead of the one in Unity's Tilemap system.

Basically I'm porting an existing game which was written in C++ to Unity by rebuilding everything myself. I mainly do it for fun and learning in order to improve my coding and pick up a few new things along the way. I've thought about cross-platform before. While the original game was only available for pc it would certainly be rad to build an Android version to be able to play it anywhere. Although then I'd either need to add a custom touch input scheme (difficult due to the amount of buttons needed) or get myself a controller which can be connected to my smartphone :P

Btw: I noticed that the field into which the Tiles get serialized defaults to a size of 33554432 bytes. While I understand that the reason for this is normally to prevent repeated allocations/GC runs, in the case of my Level class the serializedTiles byte array needs to get serialized again using Unity's JsonUtility class, where a smaller size would result in better performance.

Would it be possible to instead use the List approach of doubling the size of the array when needed? Although I'm not sure how well this would integrate with compression, where a larger array would be needed for serialization beforehand anyway. Could the CerasSerializer instance maybe reserve a byte array as an internal buffer for this and, after compression, copy the result to a second buffer (the one passed by ref in the Serialize method)? Or should I just use the overload which returns a byte array instead?

rikimaru0345 commented 5 years ago

Not sure why you'd serialize to Json again. Can you explain?

For the byte array, you are supposed to use the return value (how many bytes are actually written). For example you'd use that to know how many bytes of the array to write to a binary file. But since you're doing some sort of double serialization with Json as a second step that won't really work. You will have to copy it into a new buffer that has exactly the right size (this is what the other overload does for you. But it prevents you from reusing the buffer then).
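A sketch of that trimming step, assuming the buffer-by-ref overload and the return value mentioned above:

```csharp
byte[] buffer = null;
int written = serializer.Serialize(tiles, ref buffer);

// The reusable buffer is usually larger than the payload; copy exactly
// the written bytes into a right-sized array before handing it to
// Unity's own serialization.
byte[] exact = new byte[written];
System.Buffer.BlockCopy(buffer, 0, exact, 0, written);
serializedTiles = exact;
```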

You can also customize Ceras' buffer behaviour.

BTW, if you can change your types set into just a single bit-flag enum, remove all managed types from your Tile class, and then convert it to a struct, you will be able to gain some insane speed-ups really easily (like 10-100x, or even more...).

Because then Ceras can use its reinterpret formatter on your tile array, which turns the serialization into what is essentially a cpblk call (aka memcpy).
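A hypothetical sketch of what such a blittable Tile could look like (field names invented for illustration; the flag enum replaces the HashSet&lt;Type&gt;, and Color32 replaces the four-float Color):

```csharp
using UnityEngine;

[System.Flags]
public enum TileType : byte
{
    None     = 0,
    Wall     = 1 << 0,
    Platform = 1 << 1,
    Water    = 1 << 2,
}

// No managed references at all, so an array of these can be copied
// as one contiguous memory block (the "reinterpret" fast path).
public struct Tile
{
    public GraphicsLayer Background;  // already a plain struct of ints
    public GraphicsLayer Foreground;
    public Color32 Color;             // 4 bytes instead of 4 floats
    public TileType Types;            // one flag byte instead of HashSet<Type>
}
```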

rikimaru0345 commented 5 years ago

Ah, missed a question: Yes creating an instance of a ceras serializer is not that expensive, but the first time this new instance encounters a new type will be expensive. That means the first serialize or deserialize call (either one populates the caches for both cases) will be noticeable slower than all subsequent calls.

It might be possible to mitigate that a bit if you know exactly what types to expect. For example you could use the aot generator to generate a .cs file of the code that would get generated dynamically. So you'd save quite bit of time there (but every change to the targeted type would require you running the aot generator again as the code would be outdated of course). That can be done on a per type basis even.

Hurri04 commented 5 years ago

It doesn't necessarily have to be json. It's just that the [SerializeField] attribute which is used to automatically serialize the annotated field internally uses the JsonUtility class. Which is horribly slow and very limited in its functionality. Although it's also possible to additionally use the PreferBinarySerialization attribute on a class derived from ScriptableObject which will result in instances of the class being serialized to a binary format instead of json. But while this is a little bit faster it's still too slow for my purposes which is why I'm doing it manually using a third-party serializer (MessagePack / Ceras) with much better performance results.

My plan is to have all the data for one Level in a single file. For this the class derives from Unity's ScriptableObject class. The Level class then holds the deserialized but volatile Tiles[,] and the binary serializedTiles[] which is saved using the [SerializeField] attribute which this time is fast enough since there are no more custom classes which need to be handled, just a byte array.

The main performance increase however kicks in in combination with compression. This is simply due to the file sizes which have to be hauled around, since Unity deserializes, possibly applies changes, and re-serializes a selected asset every single time the editor is repainted in order to draw the Inspector panel. Here's a comparison with file sizes from JsonUtility:

- Normal Json asset: 17.5 MB (way too much)
- Normal binary asset: 4.7 MB (still too much)
- Zipped Json asset: 195 KB (much better)
- Zipped binary asset: 15.4 KB (best)

And from a bit of research I did LZ4 might be up to 10 times faster than GZip at which point it becomes more interesting despite the slightly larger output size.
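For reference, GZip is available directly in the .NET base class library; an LZ4 library would be wired up the same way, i.e. compressing the serialized bytes before storing them. A minimal helper, taking the byte count returned by Ceras' Serialize so only the valid portion of the buffer is compressed:

```csharp
using System.IO;
using System.IO.Compression;

static byte[] Compress(byte[] input, int count)
{
    using (var output = new MemoryStream())
    {
        // CompressionLevel.Fastest trades some ratio for speed,
        // which matches the editor-responsiveness goal here.
        using (var gz = new GZipStream(output, CompressionLevel.Fastest))
            gz.Write(input, 0, count);
        return output.ToArray();
    }
}
```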

Regarding turning the Tile class into a struct: The Tile.Type enum actually used to be a byte flag enum. At one point I changed it in order to be able to make it more extendable so that the LevelEditor and the game could be separated enough to make the LevelEditor reusable for other projects. However, lately I'm thinking of changing this back to how it was. Certain parts of the code simply don't allow for enough separation anymore so they are effectively linked together anyway.

Hurri04 commented 5 years ago

I gave it some more thought during the last few days and I think I might be able to change my Level class to not include the serializedTiles[] data (where it would get serialized again) but instead just save it in an extra file, e.g. as a TextAsset with the *.bytes extension which Unity handles as a special case to provide access via a bytes property.

Or I might use a custom file format, e.g. *.map and handle reading and writing manually depending on access speed and possible further functionality; one of my goals is to create the Level / LevelEditor system in such a way that users could potentially create their own maps and share them easily. Having a Scene file which contains a GameObject with a Level component which holds a reference to a Level.bytes data file might not be an optimal solution though either.

I'll need to do some more research and testing on AssetBundles in the next few days to learn whether/how I can create a new AssetBundle which includes a Scene and the data file and distribute it in a way that it can be downloaded at runtime later. The reason why having a Scene per Level would be nice is that I'd be able to keep the default system to place prefabs (e.g. enemies, items, etc.) and drag them around using the Transform component.

However, I'm a bit sceptical about the resulting file size of the Scene. It might be possible to reduce it drastically by setting up a custom system where only 1 Scene is used at all times and all enemies, items and such are just created at runtime from a serialized version which would be placed in the data file beforehand. This approach would however require a custom editor for prefabs placed in the Scene View to be serialized into the data file as well (possibly using custom classes to hold the data) and a custom editor to edit possible values in the Enemy script which would then not be a MonoBehaviour but instead a normal class... Obviously the first approach would be easier. Maybe the Scene file will be small enough. After all, Tiles vastly outnumber all other LevelObjects so it might be fine.

Would it be possible for the Ceras (de-)serialize methods to get an overload which takes a file path from which to read / to which to write? Would it be possible/beneficial to use a Stream as well (either internally or provided explicitly) in order to be able to work with very large files where a normal `ReadAllBytes` would reach its limits?

rikimaru0345 commented 5 years ago

Would it be possible for the Ceras (de-)serialize methods to get an overload which takes a file path from which to read/ to which to write?

That'd just be writing using (var fs = File.OpenRead("level.dat")) { ... } and then reading the part you're interested in, right? I'm not sure this is something a serializer should do. Unless I'm misunderstanding (?)
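Spelled out, that manual approach might look like this (assuming a Deserialize&lt;T&gt;(byte[]) overload; the exact overload set may differ):

```csharp
using System.IO;

// Save: write only the bytes that Serialize actually produced.
byte[] buffer = null;
int written = serializer.Serialize(tiles, ref buffer);
using (var fs = File.Create("level.dat"))
    fs.Write(buffer, 0, written);

// Load: read the file back and deserialize.
byte[] data = File.ReadAllBytes("level.dat");
Tile[,] loaded = serializer.Deserialize<Tile[,]>(data);
```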

Would it be possible/beneficial to use a Stream as well (either internally or provided explicitly) in order to be able to work with very large files where a normal `ReadAllBytes' would reach its limits?

That is a good question - giving a good answer to that requires a bit more context. Initially it might seem like a good idea because Streams make things easy / easier (depending on what you're doing). However, with a Stream you always end up with a call to Read(...) (in the deserialize case) which in the end also just reads into a byte array (and Write using a byte array in the serialize case). That's why Ceras (and many other high-performance serializers) intentionally do not provide a Stream-based API: it only adds overhead. Serializing objects requires a lot of tiny reads/writes, so this would be pretty much the worst-case scenario for performance.

To deal with large amounts of data that you can't load into memory all at once for any reason, you can (and should) split out individual segments.

With other serializers you'd manually split the object into parts (I think you already suggested something like that, right?). For example you would split your level data into object definitions (what objects there are, where they are and all their other values/properties) and then maybe your tile set into a separate part...

So you'd have two Serialize calls, one for the tiles, one for all the other objects (and maybe generic level/map information like time of day, skybox name, ...)
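As a sketch (the Tiles and Objects properties are hypothetical names for the two parts):

```csharp
// Two independent Serialize calls instead of one huge object graph.
byte[] tileData = null;
int tileBytes = serializer.Serialize(level.Tiles, ref tileData);

byte[] objectData = null;
int objectBytes = serializer.Serialize(level.Objects, ref objectData);

// Each part can now be written, compressed, or loaded on its own,
// so nothing forces the whole level into memory at once.
```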

For more complicated scenarios Ceras also supports something called IExternalRootObject that allows you to automatically deconstruct an object graph into individual parts.

A perfect example for that would be an entire game database. I've written an example for that already to fully showcase the feature.

To give you a quick idea: Imagine you have a game that has many Unit definitions (Archer, Skeleton, Mage, Orc, Elf, ...) and many Spells and Items and whatnot.

And all those things refer to each other in various ways. For example a summoning-scroll item would have a reference to the specific Unit that it summons/creates when used.

Now when you want to edit anything you'd normally be forced to load everything (or almost everything) because all those objects have all sorts of references to each other, and you can't really load one without the other...

The idea now is that every "root object" (Unit, Spell, Item) has an ID, and you can serialize/deserialize those individually, without having to turn your direct references into IDs. For example in the summoning-scroll item you'd keep public Unit SummonedUnit; exactly as it is.

When serializing the item Ceras will write the ID of all the root objects instead of the object itself. That way every object (every unit, spell, and item) will result in its own little byte-array.

When loading one of those again, you can (if you want to) react to the "OnExternalObject" callback, where Ceras gives you the ID and wants you to provide the object for it. (Or you can just return null for a partial load...)

All of this works recursively as well...
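A rough sketch of the shape this takes (member names written from memory and possibly inexact; the tutorial linked below is authoritative):

```csharp
// Sketch only -- see the Ceras tutorial for the real interface.
class Unit : Ceras.IExternalRootObject
{
    public int Id;
    public string Name;
    public int GetReferenceId() => Id;
}

class SummoningScroll : Ceras.IExternalRootObject
{
    public int Id;
    public Unit SummonedUnit;  // serialized as the Unit's ID, not inline
    public int GetReferenceId() => Id;
}
```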

Take a look here: https://github.com/rikimaru0345/Ceras/blob/master/samples/LiveTesting/Tutorial.cs#L327

rikimaru0345 commented 5 years ago

I gave it some more thought during the last few days and I think I might be able to change my Level class to not include the serializedTiles[] data (where it would get serialized again) but instead just save it in an extra file, e.g. as a TextAsset with the *.bytes extension which Unity handles as a special case to provide access via a bytes property.

Or I might use a custom file format, e.g. *.map, and handle reading and writing manually depending on access speed and possible further functionality; one of my goals is to create the Level / LevelEditor system in such a way that users could potentially create their own maps and share them easily. Having a Scene file which contains a GameObject with a Level component which holds a reference to a Level.bytes data file might not be an optimal solution though either.

I'll need to do some more research and testing on AssetBundles in the next few days to learn whether/how I can create a new AssetBundle which includes a Scene and the data file and distribute it in a way that it can be downloaded at runtime later. The reason why having a Scene per Level would be nice is that I'd be able to keep the default system to place prefabs (e.g. enemies, items, etc.) and drag them around using the Transform component.

However, I'm a bit sceptical about the resulting file size of the Scene. It might be possible to reduce it drastically by setting up a custom system where only 1 Scene is used at all times and all enemies, items and such are just created at runtime from a serialized version which would be placed in the data file beforehand.

I think a custom file format is a good idea. One thing I've learned when dealing with Unity over the years, is that taking more control over your data is always good. Personally I only use Unitys built-in serialization mechanisms (scenes, scriptable objects, ...) for extremely simple stuff (if at all).

Whenever you want to maintain references to scene objects you'll very quickly run into issues. In my experience at least it's worth the effort of simply biting the bullet and doing everything manually.

The way you described it here is already the first (and biggest) step: "...setting up a custom system where only 1 Scene is used at all times and all enemies, items and such are just created at runtime from a serialized version which would be placed in the data file...". If you're already willing to do that, then the remaining steps of completely decoupling everything from Unity/Scenes are relatively straightforward.

This approach would however require a custom editor for prefabs placed in the Scene View to be serialized into the data file as well (possibly using custom classes to hold the data) and a custom editor to edit possible values in the Enemy script which would then not be a MonoBehaviour but instead a normal class... Obviously the first approach would be easier. Maybe the Scene file will be small enough. After all, Tiles vastly outnumber all other LevelObjects so it might be fine.

I get where you're going with this, but it sounds like a huge amount of work. Wouldn't you have to write a completely custom unity-inspector for almost every object then? Also making a custom inspector window that allows you to not only change simple stuff like strings and numbers, but also change references to other objects sounds pretty complicated...

But maybe all of this isn't needed. From what I understand, what you're actually looking for is just a way to customize how things are saved (serialized). How about something like this then: initially you just have normal GameObjects with all sorts of MonoBehaviours in your Scene. Then, instead of saving the scene as usual, you'd have a custom script that collects all objects in the scene and writes them to a custom serialized format. You could find each GameObject, from there each component (just call GetComponents<Component>() to get literally every component on the GameObject), and from there you'd just call GetType() on each component to get the actual type; once you have the type you can iterate through all its public fields and properties.

Then saving would be easy.

In its most simple form (just to illustrate my point) you could then just serialize everything as a Dictionary<string, object>. Maybe like this:

```csharp
class MySerializedGameObject
{
    public string Name;
    public Matrix4x4 Transform;
    public List<MySerializedComponent> Components;
}

class MySerializedComponent
{
    public string ComponentTypeName;
    public Dictionary<string, object> FieldAndPropertyValues;
}
```

This will work for simple stuff, where components only contain simple types. But extending it so references from one GameObject to another are correctly maintained shouldn't be too hard either. (just add some sort of class MyGameObjectId : MonoBehaviour ... thing to every gameobject so you can refer to gameobjects...).
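To make the "collect everything in the scene" idea concrete, a minimal sketch of that pass might look like this. It assumes the MySerializedGameObject/MySerializedComponent classes from above; the helper name SceneCollector is made up, and the Unity calls (SceneManager, GetComponents) are from memory, so treat it as an illustration rather than a finished tool:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using UnityEngine;
using UnityEngine.SceneManagement;

static class SceneCollector
{
    // Walks every root GameObject in the active scene and captures each
    // component's public fields/properties into the plain data classes.
    public static List<MySerializedGameObject> CollectScene()
    {
        var result = new List<MySerializedGameObject>();

        foreach (var go in SceneManager.GetActiveScene().GetRootGameObjects())
        {
            var sgo = new MySerializedGameObject
            {
                Name = go.name,
                Transform = go.transform.localToWorldMatrix,
                Components = new List<MySerializedComponent>(),
            };

            foreach (var comp in go.GetComponents<Component>())
            {
                var type = comp.GetType();
                var sc = new MySerializedComponent
                {
                    ComponentTypeName = type.AssemblyQualifiedName,
                    FieldAndPropertyValues = new Dictionary<string, object>(),
                };

                foreach (var f in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
                    sc.FieldAndPropertyValues[f.Name] = f.GetValue(comp);

                // Only read/write properties can be restored later.
                foreach (var p in type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                                      .Where(p => p.CanRead && p.CanWrite))
                    sc.FieldAndPropertyValues[p.Name] = p.GetValue(comp);

                sgo.Components.Add(sc);
            }
            result.Add(sgo);
        }
        return result;
    }
}
```

The resulting list could then be fed directly into Ceras; child objects and cross-references would still need the ID scheme mentioned above.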

Actually... now that I think about it... I think Ceras can do most of the more complicated work for you. I haven't fully thought this through yet, but if someone were to write a custom IFormatter<> for UnityEngine.GameObject and UnityEngine.Component then all of this could probably be solved really easily and much quicker than I initially thought. Ceras would handle pretty much everything; mapping GameObjects and components to IDs and back, and also taking care of any other data types (even the ones that neither Unitys internal binary serializer, YAML/Json serializer supports).

You'd be completely side-stepping Unitys serialization. You wouldn't have to write any custom editor tools, everything stays GameObject+MonoBehaviour(Component). Only saving / loading would be custom.

And you could even mix things. Simply excluding certain GameObjects from getting serialized in your custom serialization would be very easy. You could create a scene/level with terrain and trees and whatever, and then have all the actual "entities" on the map (dynamic stuff like monsters, items, ...) serialized into the custom format...

Anyway, I'm not entirely sure if all of this would really work as I imagine it, but I think it'd be relatively little work to find out / make a prototype. A prototype will definitely reveal any edge-cases or things I didn't consider. Maybe I'll even try it myself when I have some time :P

one of my goals is to create the Level / LevelEditor system in such a way that users could potentially create their own maps and share them easily. Having a Scene file which contains a GameObject with a Level component which holds a reference to a Level.bytes data file might not be an optimal solution though either

At runtime? Like, having an ingame level editor? Creating new assets like an 'AssetBundle' file at runtime is impossible anyway (so unless you roll your own thing, you'd definitely not be able to have a runtime editor). Same thing goes for 'Scene': you can create a new empty scene, but you can't save scenes at runtime.

The idea I outlined above would make that possible though. Or maybe I misunderstood and your plan is to have people download Unity and install some sort of editor-addons you made, so they can make levels for your game (?).

Hurri04 commented 5 years ago

That'd be just writing using (var fs = File.OpenRead("level.dat")) { ... } and then read the part you're interested in, right? I'm not sure this is something a serializer should do.

Yeah, you're probably right. Especially if a Stream API overload also doesn't provide any benefit. It seems saving the serialized data to a file comes at the trade-off of not being able to use a buffer then, at least for infrequent events such as loading the Tile data of a Level.

I've had a look at the example to split large reference structures. It's pretty cool tech but probably not needed for my project ;)

Instead I did a fair amount of refactoring in the last few days to optimize my types as you suggested. Hence the delayed reply, sorry about that. I wanted to see it in a working state first in order to be able to make a new performance test:

[benchmark screenshot]

So saving now is 10+ times faster while loading is 23+ times faster, which is pretty nice. And even with the usual editor overhead. At this point I think I might have to set an artificial minimum time for the loading screen :D

Something I'm looking out for though is the size of the struct. I've read that the recommended size is 16 bytes or only slightly more since at some point copying a struct would get more expensive than using a class object. Having ultra fast loading/saving speeds at the cost of reduced performance during runtime when objects get passed around all the time would not be ideal...

Currently the Tile struct holds a GraphicsLayerCollection struct (in order to not use a List or Array since those would introduce references into the struct, leading to it being placed on the heap instead of the stack - or am I mistaken?) which uses an indexer property to grant access to the 2 GraphicsLayer structs it contains, which in turn have an int for the Material index and a Vector2Int for the coordinate. The Tile also has a Color and a Tile.Type flags enum (instead of a HashSet now). This results in 2 × (1 + 2) × 4 [bytes per int] + 4 + 1 = 29 bytes, which might be a bit much, no? From my understanding, passing Tiles via the ref keyword would mean that they wouldn't have to be copied, but I'm not sure yet whether this will be possible everywhere in my code due to the way I access them at certain points.
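For reference, the 29-byte count above can be written down and checked with stand-in types. The Vector2Int/Color32 definitions here are just placeholders for the Unity ones (the count assumes the 4-byte Color32 rather than Unity's 16-byte, 4-float Color), and Pack = 1 is used so the marshalled size matches the hand arithmetic exactly:

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-ins for the Unity types, only to verify the arithmetic.
[StructLayout(LayoutKind.Sequential, Pack = 1)] struct Vector2Int { public int x, y; }
[StructLayout(LayoutKind.Sequential, Pack = 1)] struct Color32 { public byte r, g, b, a; }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct GraphicsLayer { public int Material; public Vector2Int Coordinate; } // 4 + 8 = 12 bytes

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct GraphicsLayerCollection { public GraphicsLayer Layer0, Layer1; }     // 2 × 12 = 24 bytes

enum TileType : byte { None = 0 }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Tile
{
    public GraphicsLayerCollection Layers; // 24
    public Color32 Color;                  // + 4
    public TileType Type;                  // + 1
}

class Program
{
    // 2 × (1 + 2) × 4 + 4 + 1 = 29
    static void Main() => Console.WriteLine(Marshal.SizeOf<Tile>());
}
```

Without Pack = 1 the runtime is free to pad the struct (typically up to 32 bytes here), which is another reason not to obsess over the exact number.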

One more thing I did was changing the 2D Tile[,] array to a jagged Tile[][] array which should also help with performance. Since changing Tile from a class to a struct broke my Size() extension method which worked on object[,] arrays I figured I might as well. Still, not having this extension method is a bit of a PITA since I've had to replace quite a few lines by writing new Vector2Int(Tiles.Length, Tiles[0].Length) manually instead of just calling this:

```csharp
public static Vector2Int Size(this object[,] array)
{
    return new Vector2Int(array.GetLength(0), array.GetLength(1));
}
```

I also had a look at the resulting file size: Using a test Level with 650x256 Tiles results in 6.98MB, whereas 650x256 x29 would only result in 4.6MB. I guess the difference comes from additional type data? Zipping the file results in a size of 255KB, which is larger than the 15.4KB from the zipped binary file I was testing before. But then again that particular map was just 700x400 empty Tiles, whereas this 650x256 map actually contains a real level with varying values in the GraphicsLayers. I'm guessing this results in less-compressible data because certain values/blocks aren't repeated endlessly anymore?

I think a custom file format is a good idea.

I tested saving as .bytes and .map which both resulted in the same file size. So I'll probably test whether using the bytes property Unity provides for the .bytes file is better or worse than loading the .map file myself, performance-wise.

Whenever you want to maintain references to scene objects you'll very quickly run into issues.

Yep, I noticed this when I tried to turn my Level class from a ScriptableObject into a MonoBehaviour in order to see whether I could just save it in a scene which I would then load additively to the main scene which contains the game manager class. However, this resulted in lost references and the editor not even allowing me to reassign them (at least without using EditorSceneManager.preventCrossSceneReferences and some lazy initialization properties in order to find the references at runtime...). It puzzles me a bit that Unity doesn't allow extending the Scene class in order to save additional custom data in a "global" sense. Instead you always have to place such data in a MonoBehaviour component on a GameObject, which you then have to search for and hope it exists. Which doesn't exactly look like type-safety...

The way you described it here is already the first (and biggest) step: [...] If you're already willing to do that, then the remaining steps of completely decoupling everything from Unity/Scenes is relatively straightforward.

To be honest, I'd like to try and avoid piling extra work on top of everything that's already still ahead of me (a custom sprite animation system, a custom physics system with a custom collision system, a player controller, porting all enemy behaviors, ...) :P While my goal is more to write good code instead of getting done faster (so I don't need to search for a new hobby again soon ;)) I'd like to see some progress at some point.

How about something like this then: [...] Maybe like this: [...]

This is pretty much what I had in mind, yes. There are some additional values to save for GameObjects (parent-reference, name, tag, layer) but apart from that it seems good.

Actually... now that I think about it... I think Ceras can do most of the more complicated work for you. I haven't fully thought this through yet, but if someone were to write a custom IFormatter<> for UnityEngine.GameObject and UnityEngine.Component then all of this could probably be solved really easily and much quicker than I initially thought. Ceras would handle pretty much everything; mapping GameObjects and components to IDs and back, and also taking care of any other data types (even the ones that neither Unitys internal binary serializer, YAML/Json serializer supports). You'd be completely side-stepping Unitys serialization. You wouldn't have to write any custom editor tools, everything stays GameObject+MonoBehaviour(Component). Only saving / loading would be custom.

This sounds absolutely amazing to the point where you should contact the Unity devs to hire you to throw out their old serializer and replace it with Ceras :D Apparently, a lot of stuff (especially editor-side) was built on top of the old serializer over the years and either there's no one left who knows how to improve it or they don't dare touch it anymore because everything is held together by duct tape and prayers :P

One thing I'm not sure about however is deserialization; while it's possible to call new GameObject(), calling AddComponent<T>() would then just add a new instance with default values. Would it be possible to overwrite those values even if they're not accessible (private/protected)?

At runtime? Like, having an ingame-level editor?

Nah, nothing that fancy. Building the EditorWindow for my LevelEditor is enough work without having to think about getting actual graphics for an ingame UI and creating an even more simplified layout than I already have in order for it to be usable with any ingame controls.

Or maybe I misunderstood and your plan is to have people download Unity and install some sort of editor-addons you made, so they can make levels for your game (?).

This would work better for me. I guess by "create their own maps and share them easily" I meant relatively easy ;) But I think it should be possible to set up a system to check an online resource (e.g. a file on github) to get a list of verified maps/map packs (name, version, download link, hash code for security check, etc.). This way map creators could upload their map and create a pull request on my master file which I could then check and approve. Players would then see an ingame menu with a button to check for custom maps which downloads the master file and displays the names of the available maps with a download button next to them (or a play button if it's already downloaded). Maybe with an option to load a map from a local folder for people without local internet connection and who are willing to take the risk of installing maps with penis drawings as level design :P

rikimaru0345 commented 5 years ago

Yeah, you're probably right. Especially if a Stream API overload also doesn't provide any benefit. It seems saving the serialized data to a file comes at the trade-off of not being able to use a buffer then, at least for infrequent events such as loading the Tile data of a Level.

If it ever becomes a problem that can be fixed relatively easily as well.

Btw, I was experimenting with all sorts of compression stuff. Like how various compression libraries (like zlib, or LZ4) could be integrated with Ceras. It turns out that there isn't anything that could be done that would make a big difference.

After you're done with the serialization, you'd just do something like LZ4Codec.Encode(source, target);, or use a LZ4CompressionStream and then write to that if you want to (doesn't make much of a difference either way)...

It's pretty much the same thing as with writing to files / reading from them. Integrating it with the serializer doesn't offer any potential benefits.

So saving now is 10+ times faster while loading is 23+ times faster which is pretty nice. And even with the usual editor overhead. At this point I think I might have to set an artificial minimum time for the loading screen :D

Looks great! Those are some insane improvements!

Something I'm looking out for though is the size of the struct. I've read that the recommended size is 16 bytes or only slightly more since at some point copying a struct would get more expensive than using a class object. Having ultra fast loading/saving speeds at the cost of reduced performance during runtime when objects get passed around all the time would not be ideal...

My advice: don't waste any time thinking about the struct size at all. That microsoft article (pretty sure that's where that idea is from) is pretty old and, by now, irrelevant. Just pass structs by reference and you'll be fine.

Currently the Tile struct holds a GraphicsLayerCollection struct (in order to not use a List or Array since those would introduce references into the struct, leading to it being placed in the heap instead of the stack - or am I mistaken?)

Yeah, basically the performance increase comes from the fact that Ceras detects that the struct is "blittable" (a word with many conflicting definitions; there's not really a universally agreed-upon one), which allows it to use its ReinterpretFormatter. I could go into great detail about how all this works, but to make it short, there's only one thing to keep in mind: the target type must be a struct, and the struct must not contain any non-blittable types. Examples of non-blittable types are: string, any sort of array T[], anything that is defined as a class, ... (The array part might be a bit confusing: the contents can be reinterpreted as long as the element type is blittable, but the array object itself is not blittable. That's why having any sort of array inside a struct makes the whole struct non-blittable. If something is unclear, don't hesitate to ask.)
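A quick way to check a type against roughly this rule yourself is RuntimeHelpers.IsReferenceOrContainsReferences<T>() (available on newer runtimes; it is a reasonable proxy for "can be reinterpreted as raw bytes", not necessarily Ceras's exact check):

```csharp
using System;
using System.Runtime.CompilerServices;

struct PlainTile { public int Material; public int X, Y; }          // only unmanaged fields
struct TileWithArray { public int Material; public int[] Layers; }  // array field is a reference

class Program
{
    static void Main()
    {
        // A struct containing any reference (string, array, class) cannot
        // be reinterpreted as raw bytes; a reference-free struct can.
        Console.WriteLine(RuntimeHelpers.IsReferenceOrContainsReferences<PlainTile>());
        Console.WriteLine(RuntimeHelpers.IsReferenceOrContainsReferences<TileWithArray>());
    }
}
```

Note the asymmetry described above: a PlainTile[] array is still fine as a container (its elements are reference-free), even though an int[] as a field poisons the struct that holds it.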

This results in 2 (1 + 2) 4[number of bytes for an int] + 4 + 1 = 29 bytes which might be a bit much, no? From my understanding passing Tiles via ref keyword would mean that they wouldn't have to be copied but I'm not sure yet whether this will be possible everywhere in my code due to the way I access them at certain points.

The size doesn't matter at all, this is not the place to optimize. Just pass them by ref. Alternatively you can pass the tile array + the x,y coordinates as well (which is like passing by ref, just "manually" in a way :P)
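A small sketch of what "just pass them by ref" looks like in practice, using a hypothetical one-field Tile; the same pattern works for the real 29-byte struct:

```csharp
using System;

struct Tile { public int Material; }

class Program
{
    static void Main()
    {
        var tiles = new Tile[4][];
        for (int i = 0; i < 4; i++) tiles[i] = new Tile[4];

        Tile copy = tiles[1][2];      // value copy: writes never reach the array
        copy.Material = 7;

        ref Tile t = ref tiles[1][2]; // ref local: aliases the array element
        t.Material = 42;
        Console.WriteLine(tiles[1][2].Material);

        Paint(ref tiles[1][2], 99);   // passing by ref across a call, no copy
        Console.WriteLine(tiles[1][2].Material);
    }

    static void Paint(ref Tile tile, int material) => tile.Material = material;
}
```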

due to the way I access them at certain points.

We'd need a concrete/specific example here. It should always be possible.

One more thing I did was changing the 2D Tile[,] array to a jagged Tile[][] array which should also help with performance.

I tried to make a reinterpret formatter for multidimensional arrays work. But it turned out way too brittle. But I thought of some new ideas that could make it work. However implementing those will probably take quite a bit of time (that I don't really have :P). Also since you've already refactored to jagged arrays it's not really needed anymore anyway.

Since changing Tile from a class to a struct broke my Size() extension method which worked on object[,] arrays I figured I might as well.

Wouldn't the following work?

```csharp
public static Vector2Int Size(this Tile[][] array)
{
    return new Vector2Int(array.Length, array[0].Length);
}
```

I also had a look at the resulting file size: Using a test Level with 650x256 Tiles results in 6.98MB whereas 650x256 x29 would only result in 4.6MB. I guess the diffence comes from additional type data?

I have no idea. I very much doubt that any type data would cause this though. Ceras is very efficient with its type encoding system. What does the "x29" mean?

I'm guesing this results in less-compressable data because certain values/blocks aren't repeated endlessly anymore?

Depends on the compression algorithm. But yeah, probably.

I tested saving as .bytes and .map which both resulted in the same file size.

The file extension doesn't do anything. I probably should have said "serialized format" instead of "file format". By using Ceras (or any serializer) you're already essentially using a custom format.

To be honest, I'd like to try and avoid piling extra work on top of everything that's already still ahead of me (a custom sprite animation system, a custom physics system with a custom collision system, a player controller, porting all enemy behaviors, ...) :P While my goal is more to write good code instead of getting done faster (so I don't need to search for a new hobby again soon ;)) I'd like to see some progress at some point.

Haha yeah I can totally understand that.

This sounds absolutely amazing to the point where you should contact the Unity devs to hire you to throw out their old serializer and replace it with Ceras :D Apparently, a lot of stuff (especially editor-side) was built on top of the old serializer over the years and either there's noone left who knows how to improve it or they don't dare touch it anymore because everything is held together by duct tape and prayers :P

I don't think it's that easy (for Unity at least), but I think you're right. It could definitely be something to replace GameObject/Scene serialization. But then again Unity is pushing their ECS stuff pretty heavily. One of its advantages is the ability to do pretty much what my ReinterpretFormatter does. So serialization/deserialization/save/load will be much faster for ECS stuff anyway.

One thing I'm not sure about however is deserialization; while it's possible to call new GameObject(), calling AddComponent<T>() would then just add a new instance with default values. Would it be possible to overwrite those values even if they're not accessible (private/protected)?

Yeah, Ceras can overwrite members/values of existing instances, including private members, and even readonly fields! :D
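If I remember the Ceras API correctly, populating an existing instance is done with the ref overload of Deserialize; treat the exact signature as an assumption, and the ComponentLoader helper is made up for illustration:

```csharp
using Ceras;
using UnityEngine;

static class ComponentLoader
{
    // Adds a fresh, default-constructed component via Unity, then lets Ceras
    // overwrite its members (private ones included) from the serialized data.
    public static T Load<T>(GameObject go, CerasSerializer ceras, byte[] data) where T : Component
    {
        var component = go.AddComponent<T>();
        ceras.Deserialize(ref component, data); // assumed ref overload: repopulates in place
        return component;
    }
}
```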

Hurri04 commented 5 years ago

Btw, I was experimenting with all sorts of compression stuff. Like how various compression libraries (like zlib, or LZ4) could be integrated with Ceras. It turns out that there isn't anything that could be done that would make a big difference. After you're done with the serialization, you'd just do something like LZ4Codec.Encode(source, target);, or use a LZ4CompressionStream and then write to that if you want to (doesn't make much of a difference either way)... It's pretty much the same thing as with writing to files / reading from them. Integrating it with the serializer doesn't offer any potential benefits.

You mean in terms of further performance improvements when passing the data and using/not using Streams, right? Not in the sense that applying LZ4 does not result in a smaller byte array/file size? The latter would surprise me, seeing how e.g. a 7152 KB file containing a serialized Level can be packed into a 457 KB zip file (in the Windows Explorer).

Wouldn't the following work?

```csharp
public static Vector2Int Size(this Tile[][] array)
{
    return new Vector2Int(array.Length, array[0].Length);
}
```

Yep, this works. It doesn't have the flexibility to be used on any object anymore, but oh well. I went through my code and changed the lines back where I used the explicit calculations each time. Although I'm thinking of wrapping the jagged array and these extension methods into a TileGrid class with a Vector2Int indexer property. This would allow me to change the extension methods to getter properties as well.
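That TileGrid wrapper might look roughly like this; the Tile struct here is just a one-field stand-in, and a ref-returning indexer keeps the by-ref access discussed earlier:

```csharp
using UnityEngine;

public struct Tile { public int Material; } // stand-in for the real struct

// Wraps the jagged array so callers use grid[pos] and grid.Size
// instead of raw indexing and manual length math.
public class TileGrid
{
    readonly Tile[][] _tiles;

    public TileGrid(int width, int height)
    {
        _tiles = new Tile[width][];
        for (int x = 0; x < width; x++)
            _tiles[x] = new Tile[height];
    }

    // Replaces the Size() extension method with a property.
    public Vector2Int Size => new Vector2Int(_tiles.Length, _tiles[0].Length);

    // ref return: grid[pos].Material = 3; writes straight into the array.
    public ref Tile this[Vector2Int pos] => ref _tiles[pos.x][pos.y];
}
```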

I have no idea. I very much doubt that any type data would cause this though. Ceras is very efficient with its type encoding system. What does the "x29" mean?

That's the 29 bytes per Tile.

The file extension doesn't do anything. I probably should have said "serialized format" instead of "file format". By using Ceras (or any serializer) you're already essentially using a custom format.

Yeah, this was just to test whether Unity would do anything with it after it recognized that there was a new file. One thing I still need to test is AssetBundles which can have no compression, LZMA or LZ4 compression applied when building them. I'm still getting a few compiler errors though which I'll need to fix in order to build one.

I don't think it's that easy (for Unity at least), but I think you're right. It could definitely be something to replace GameObject/Scene serialization. But then again Unity is pushing their ECS stuff pretty heavily. One of its advantages is the ability to do pretty much what my ReinterpretFormatter does. So serialization/deserialization/save/load will be much faster for ECS stuff anyway.

I'd need to have another look at ECS and what its current status is. However, from what I remember when I had a look at it in the past it didn't seem trivial to use (or at least not as trivial as Ceras) to the point where wanting to use it would require a serious justification (i.e. performance increase) in order to make it worth the hassle.

rikimaru0345 commented 5 years ago

You mean in terms of further performance improvements when passing the data and using/not using Streams, right? Not in the sense that applying LZ4 does not result in a smaller byte array/file size? The latter would surprise me, seeing how e.g. a 7152 KB file containing a serialized Level can be packed into a 457 KB zip file (in the Windows Explorer).

Yeah exactly :D Sorry for the confusion! I was just talking about how I'm having a hard time "combining" Ceras and LZ4 more tightly. It seems to me that the current approach (where you'd just call Serialize() and then pass the data to LZ4) is already as optimal as it gets. Like there's no overhead there that could be fixed by a more "in-depth" integration between the two libraries.

I'd need to have another look at ECS and what its current status is. However, from what I remember when I had a look at it in the past it didn't seem trivial to use (or at least not as trivial as Ceras) to the point where wanting to use it would require a serious justification (i.e. performance increase) in order to make it worth the hassle.

Oh yes indeed, I completely agree! It's a lot more complicated than the classical way of programming for sure. Maybe it will get a little bit better in the future, but I don't expect any huge improvements in terms of usability / comfort.

Hurri04 commented 5 years ago

It seems to me that the current approach (where you'd just call Serialize() and then pass the data to LZ4) is already as optimal as it gets. Like there's no overhead there that could fixed by a more "in-depth" integration between the two libraries.

Would you still build these compression libraries into Ceras then, for comfort of use? Or would it make more sense to maybe just have some links here on github to lead to other repositories where the respective compression libraries can be downloaded, in order to keep Ceras a bit smaller? I'm not sure how much of a size difference it would make. And if I understand this correctly Unity will strip away unused libraries when building with IL2CPP?

rikimaru0345 commented 5 years ago

Ok so help me with one thing here please. I'm having a hard time coming up with any ideas of what to do with any compression libraries even.

Like, lets say I'd create a new nuget package named Ceras.LZ4CompressionAddon. What would that package even do?? :laughing: :sweat_smile:

I think an example could help in showing what I mean. Lets say we are serializing some data and then saving it to a file. But now we want to also add LZ4 compression ontop of that, right?

But then doing that would be just:

Before

```csharp
byte[] buffer = null;
int dataLength = ceras.Serialize(myLevel, ref buffer);

using (var file = File.Create("someData.bin"))
    file.Write(buffer, 0, dataLength);
```

After

```csharp
byte[] buffer = null;
int dataLength = ceras.Serialize(myLevel, ref buffer);

using (var file = File.Create("someData.bin"))
using (var lz4Stream = LZ4Stream.Encode(file))
    file.Write(lz4Stream, 0, dataLength);
```

The difference in the code is literally just a single line. And LZ4Stream would be available after two clicks by just installing K4os.Compression.LZ4.Streams from NuGet.

So now what I don't get is this: What could I possibly write in my little hypothetical Ceras.LZ4CompressionAddon to make that any easier? Or more "integrated"?

Would I just make a wrapper for LZ4Stream? But what for? My addon library would have a dependency on K4os.Compression.LZ4.Streams anyway, and most likely my wrapper wouldn't be as feature complete as using the original thing. So what's the point :joy:

Is there anything obvious I'm missing?

Don't get me wrong! If there's some comfort-of-use-thing I can do to make it easier, I'd consider it! But what is the difference between me adding some useless custom library/integration vs just using a compression library directly? I mean it's not like you'd save even a single key-stroke here, right?

Hurri04 commented 5 years ago

Hmm, I guess if this can be done in one line there's not much point in including it. It seems in neuecc's MessagePack serializer, which I was using before, this was a bit more work, which is why he added an extra class?

I should have some free time during the next few days so I could try adding the LZ4 library to my project. Would it be enough to just add these C# classes or do I need to download and build the project via NuGet?

rikimaru0345 commented 5 years ago

It seems in neuecc's MessagePack serializer which I was using before this was a bit more work which is why he a added an extra class?

I think explaining the real reason why that is necessary for MessagePack-CSharp requires a bit of an explanation:

"MessagePack" is actually a format definition while and "MessagePack-CSharp" is an implementation of that format.

That means the MessagePack-CSharp library has to - in order to stay compatible to the standard - mark the compressed data as being an "extension" type (extension of the msgpack format). So other libraries know that the data that follows is not something defined in the standard, and they need to have special handling to understand it (or just error out, which is what actually happens in most cases).

So that code basically just first compresses the data, and then secondly embeds it marked as an "extension" block. (Also explained in the first lines of the file you linked)

Ceras on the other hand does not define a public format, resulting in a different set of advantages and disadvantages. Most importantly:


I should have some free time during the next few days so I could try adding the LZ4 library to my project. Would it be enough to just add these C# classes or ...

You can't copy just those classes you linked. They just define the stream API, but there's much more to LZ4 than that.

...do I need to download and build the project via NuGet?

You could download K4os.Compression.LZ4 as source code and then compile it yourself if you want to, but there's not really any reason to do that.

Or you can just install the already compiled (dll-)version by using NuGet (what almost everyone does). In which case you'd do no building/compiling of the LZ4 library on your own. You'd just add the using K4os.... thing to the top of your cs file and then use the library like in my example.

I hope that helps :smile: :+1:

Hurri04 commented 5 years ago

That means the MessagePack-CSharp library has to - in order to stay compatible to the standard - mark the compressed data as being an "extension" type (extension of the msgpack format). So other libraries know that the data that follows is not something defined in the standard, and they need to have special handling to understand it (or just error out, which is what actually happens in most cases).

Okay, that would explain it.

Regarding the K4os library: I downloaded the normal and the Stream package, unzipped them and copied the dll from the lib folder into the Plugins folder in my project. But either I was doing something wrong again or they are incompatible with Unity; just having the using K4os... line in a script results in Unity throwing this error message:

Assembly 'Assets/Plugins/K4os.Compression.LZ4.dll' will not be loaded due to errors: Unable to resolve reference 'System.Memory'. Is the assembly missing or incompatible with the current platform? Reference validation can be disabled in the Plugin Inspector.

Also the lines from the "After" code you posted above gave me some errors, e.g. the File.Write method not being able to accept an LZ4Stream as first parameter... And also the buffer doesn't get used there anymore?

However, I've found a different solution: Since the data gets saved to a file now, I don't need to worry about performance as much anymore because the majority of the data (~7MB for the Tiles of the current Level) doesn't need to get serialized again by Unity every time the Inspector gets repainted, but instead only at single events (e.g. when pressing a "Save" button in the LevelEditor UI). Same for loading.

I also did some refactoring in order to get my project to build so I could test AssetBundles a bit more, and it turns out that I should be able to use the compression formats they provide (LZ4 or LZMA). Basically I turned the Level class from a ScriptableObject into a MonoBehaviour and added it to an empty GameObject in my Scene. The Level also contains a TextAsset variable now which holds a reference to the file which is created when saving the Tiles. This way I'm able to build everything relevant (the Scene, in order to be able to manually place enemy prefabs etc.; the Level, to act as a controller; and the TextAsset for the Tile data) into a single AssetBundle. (At the moment some referenced Materials and Textures also get pulled into the same AssetBundle automatically, but I'll do some more research to be able to separate them into their own AssetBundle and find a dependency structure which works best for me.)

I guess this should solve it then and we could close this issue? If someone has any more questions in the future they could just reopen it.

And thank you for your help with the optimizations again! I'll look into a few more things (e.g. KnownTypes) and if anything else comes up I'll just open another issue ;)

rikimaru0345 commented 5 years ago

But either I was doing something wrong again or they are incompatible with Unity; just having the using K4os... line in a script results in Unity throwing this error message:

That's probably because System.Memory depends on something else as well (and that dependency is assumed to be part of the runtime, but when used in Unity it's not there). I'd suggest just opening an issue about that in the github page of the K4os.Compression.LZ4 lib.

Also the lines from the "After" code you posted above gave me some errors, e.g. the File.Write method not being able to accept an LZ4Stream as first parameter... And also the buffer doesn't get used there anymore?

Yeah that was just an example I typed out directly (not in visual studio :laughing: ). I think the last line should just be


```csharp
lz4Stream.Write(buffer, 0, dataLength);
```
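With that fix applied, the full save/load pair would be something like the following sketch. The LZ4Stream.Encode/Decode factory names are from the K4os package; the Level/myLevel names are placeholders from the earlier examples:

```csharp
using System.IO;
using Ceras;
using K4os.Compression.LZ4.Streams;

// Save: serialize with Ceras, then stream the buffer through LZ4 into the file.
byte[] buffer = null;
int dataLength = ceras.Serialize(myLevel, ref buffer);

using (var file = File.Create("someData.bin"))
using (var lz4Stream = LZ4Stream.Encode(file))
    lz4Stream.Write(buffer, 0, dataLength);

// Load: decompress the whole file into memory, then deserialize.
byte[] data;
using (var file = File.OpenRead("someData.bin"))
using (var lz4Stream = LZ4Stream.Decode(file))
using (var ms = new MemoryStream())
{
    lz4Stream.CopyTo(ms);
    data = ms.ToArray();
}
Level level = ceras.Deserialize<Level>(data);
```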

I guess this should solve it then and we could close this issue? If someone has any more questions in the future they could just reopen it.

And thank you for your help with the optimizations again! I'll look into a few more things (e.g. KnownTypes) and if anything else comes up I'll just open another issue ;)

Sure, close it :) No problem, if you have any questions about KnownTypes make sure to take a look at the new wiki page about type encoding here: https://github.com/rikimaru0345/Ceras/wiki/Type-Encoding Maybe it's helpful for what you're trying to do.