T: struct only serialization

dzmitry-lahoda commented 5 years ago

Structs are GC-free and can be manipulated through Unsafe with no reflection. There is no need to generate code or use dynamic invocation, and I guess it works with AOT. C# is making leaps with structs: ref, in, readonly, interfaces with implementations, Unsafe. So would you consider changing direction to a struct-only approach? Alternatively, Unsafe-cast a class to a struct and reflect on that.

As of now the code looks GC-heavy and reflection-based (generating IL will not work with AOT; creating an MSBuild task would be OK, but that is time-consuming work).

Right now I am trying to sketch stuff: https://github.com/dzmitry-lahoda/ObjectLayoutInspector/blob/feature/unsafelayout/src/ObjectLayoutInspector/UnsafeLayout.cs

invertedtomato commented 5 years ago

Hi @dzmitry-lahoda! I wouldn't want to only support structs. It would be too limiting for the use case. But I'm certainly open to supporting and even recommending structs if the performance was suitably better. Do you have any data on the performance difference?

dzmitry-lahoda commented 5 years ago

On some platforms classes have a sequential memory layout and can be used as structs. I have no data yet, maybe within 4 months. But this is a new design opened up by the Unsafe class (previously possible only at the IL level).

dzmitry-lahoda commented 5 years ago

Here is the reflection part: https://github.com/SergeyTeplyakov/ObjectLayoutInspector/issues/15

I guess I can support classes if these are known containers, fixed arrays in structs, or classes for which constructors of default non-null instances (parameterless constructors or factories) are provided as arguments and field-only serialization is acceptable.

After parsing is done, Unsafe is used to do the serialization. It is very fast. With all the work done on structs in csharplang and roslyn, structs do get superpowers. So I will see how far I can go.

I have tested Unsafe and it is very fast: https://gitlab.com/dzmitry-lahoda/dotnet-system-numerics-algebra/tree/master/benchmarks (and it seems it can support custom types: fixed points, quaternions, vectors)

invertedtomato commented 5 years ago

I'm having difficulty seeing where this fits in. I think the key to this idea is performance. Do you have some numbers for comparison?

dzmitry-lahoda commented 5 years ago

I can iterate over each field of an unmanaged structure of any size and treat it as raw memory. Like here (I use a diff here, but it could be delta or zigzag or whatever):

    // Reinterpret the readonly (in) arguments as refs so they can be offset.
    ref var a = ref Unsafe.AsRef<T>(in baseline);
    ref var b = ref Unsafe.AsRef<T>(in update);
    for (var i = 0; i < fields.Count; i++)
    {
        var field = fields[i];
        var fieldType = field.Type;
        // Move both refs to the start of the current field.
        ref var moveA = ref Unsafe.AddByteOffset(ref a, new IntPtr(field.Offset));
        ref var moveB = ref Unsafe.AddByteOffset(ref b, new IntPtr(field.Offset));
        DiffPrimitive(field.Size, fieldType, ref moveA, ref moveB);
    }
...
    private void DiffPrimitive<T>(ushort size, Type fieldType, ref T moveA, ref T moveB) where T : unmanaged
    {
        if (fieldType == typeof(int))
        {
            // Reinterpret the field location as an int without copying.
            ref var af = ref Unsafe.As<T, int>(ref moveA);
            ref var bf = ref Unsafe.As<T, int>(ref moveB);
            DiffPrimitive(af, bf);
        }
...
    private void DiffPrimitive(int a, int b)
    {
        if (a != b)
        {
            // Changed: write a flag followed by the new value.
            stream.Write(true);
            stream.Write(in b);
        }
        else
        {
            // Unchanged: write only the flag.
            stream.Write(false);
        }
    }

Performance will be much better than reflection, but slightly (if at all) worse than a code generator. The approach is a kind of memcpy, so it is fast. As of now I have parsed only unmanaged structs, but I guess I can do classes too.

In short: field-only serialization via memcpy, with an API that is not so well known in the .NET ecosystem.
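To make the memcpy idea concrete, here is a minimal sketch that copies an unmanaged struct to bytes and back with MemoryMarshal. The Position type and the BlitSerializer name are hypothetical and not from any of the linked repositories, and the bytes come out in host byte order, so a real wire format would still need to settle endianness.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sample type, only for illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Position
{
    public int X;
    public int Y;
    public short Layer;
}

// Hypothetical helper name; a sketch, not the API of any linked library.
public static class BlitSerializer
{
    // Copies the raw memory of an unmanaged struct into a new byte array.
    public static byte[] ToBytes<T>(in T value) where T : unmanaged
    {
        T copy = value; // local copy so we can take a ref to it
        ReadOnlySpan<byte> raw =
            MemoryMarshal.AsBytes(MemoryMarshal.CreateReadOnlySpan(ref copy, 1));
        return raw.ToArray();
    }

    // Reinterprets the leading bytes as a T (throws if the span is too short).
    public static T FromBytes<T>(ReadOnlySpan<byte> bytes) where T : unmanaged
    {
        return MemoryMarshal.Read<T>(bytes);
    }
}
```

With Pack = 1 the sample struct has no padding, so it occupies exactly 10 bytes; without an explicit layout the size and padding are up to the runtime.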

Like

registry.RegisterStruct<MyStruct1>();
registry.RegisterStruct<MyStruct2>();
registry.RegisterObject<MyObject>(new DefaultFactory(()=> new MyObject())); // we need baseline object creation to compare for objects.

So we can lay out objects manually if they are composed of structs, or do custom registration of objects. Everything goes through registration. I guess registration could be made more conventional once https://github.com/dotnet/csharplang/issues/124 is available.

I can even make a registry for custom comparers, limiters, ranges, quantizers, and primitives that is almost zero-cost via Unsafe.

dzmitry-lahoda commented 5 years ago

As of now I am building my solution on:

https://github.com/dzmitry-lahoda/NetStack/tree/master/Source/NetStack.Serialization
https://github.com/SergeyTeplyakov/ObjectLayoutInspector/pull/19

I guess I may support a limited form of object references in structs if they are pinned before serialization (including containers).

invertedtomato commented 5 years ago

Hey Dzmitry,

That link seems to be dead now. Could you resend?

dzmitry-lahoda commented 5 years ago

Here is the code I use to scan structures, now merged: https://github.com/SergeyTeplyakov/ObjectLayoutInspector/blob/master/src/ObjectLayoutInspector/UnsafeLayout.cs. As we have complex structures, it works well for us. I am not sure how to get the layout of a class the same way, but that could be an extension (could be very cool). It works on big-endian AOT ARM for sure.

After the structures are parsed, we store each structure's layout in a static field of a generic static class, one per parsed structure. So the cache is resolved by the JIT per type rather than looked up in a dictionary (there are also interesting fast dictionaries in the wild). Not sure if it is much faster. We convert Type into an enum, KnownType + Custom, so that the JIT can translate the switch into an assembly jump table.
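The caching pattern described above could look roughly like the following sketch. All names here (KnownTypes, LayoutCache, Sample, and the particular enum members) are assumptions for illustration; the real scanning code in UnsafeLayout also records offsets and sizes, which this sketch leaves out. Note that the order of fields returned by reflection is not guaranteed to match declaration order.

```csharp
using System;
using System.Reflection;

// Assumed set of members; only the "KnownType + Custom" idea is from the thread.
public enum KnownType : byte { Custom = 0, Int32 = 1, Single = 2, Boolean = 3 }

// Hypothetical sample struct.
public struct Sample { public int Id; public float Weight; }

public static class KnownTypes
{
    // Map a System.Type to a small enum once, at scan time.
    public static KnownType Classify(Type type)
    {
        if (type == typeof(int)) return KnownType.Int32;
        if (type == typeof(float)) return KnownType.Single;
        if (type == typeof(bool)) return KnownType.Boolean;
        return KnownType.Custom;
    }

    // A switch over a dense enum is a candidate for a jump table.
    public static int SizeOf(KnownType kind)
    {
        switch (kind)
        {
            case KnownType.Int32: return 4;
            case KnownType.Single: return 4;
            case KnownType.Boolean: return 1;
            default: return -1; // Custom: size unknown in this sketch
        }
    }
}

// One static field per closed generic type: LayoutCache<Sample>.FieldKinds
// is computed once and then read without any dictionary lookup.
public static class LayoutCache<T> where T : unmanaged
{
    public static readonly KnownType[] FieldKinds = Scan();

    private static KnownType[] Scan()
    {
        FieldInfo[] fields = typeof(T).GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        var kinds = new KnownType[fields.Length];
        for (int i = 0; i < fields.Length; i++)
        {
            kinds[i] = KnownTypes.Classify(fields[i].FieldType);
        }
        return kinds;
    }
}
```

Reflection runs only once per struct type here; the hot path reads the static array and switches on the enum.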

Then we write the data into https://github.com/dzmitry-lahoda/NetStack, with limits defined to reduce the size of the data (lossless: min/max range; lossy: precision quantization).

Right now I am thinking of patterns to improve the flexibility of the solution without loss of performance (as of now the solution performs almost the same as what you would get by writing everything manually). I target .NET Core 3 and whatever C# 8 gives.
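As a rough illustration of the two kinds of limits, the sketch below computes how many bits a min/max-bounded integer needs and quantizes a float to a fixed precision. The Limits class and its method names are hypothetical, not the NetStack API, and the code assumes values already lie inside the declared range.

```csharp
using System;

// Hypothetical helper; a sketch of range and precision limits.
public static class Limits
{
    // Number of bits needed to store any integer in [min, max].
    public static int BitsForRange(int min, int max)
    {
        uint span = (uint)((long)max - min);
        int bits = 0;
        while (span != 0)
        {
            bits++;
            span >>= 1;
        }
        return bits;
    }

    // Lossless: store the offset from min instead of the value itself.
    public static uint EncodeRanged(int value, int min)
    {
        return (uint)((long)value - min);
    }

    public static int DecodeRanged(uint encoded, int min)
    {
        return (int)(encoded + (long)min);
    }

    // Lossy: keep only multiples of `precision` above `min`.
    public static uint Quantize(float value, float min, float precision)
    {
        return (uint)Math.Round((value - min) / precision);
    }

    public static float Dequantize(uint quantized, float min, float precision)
    {
        return min + quantized * precision;
    }
}
```

For example, a field limited to [0, 100] needs 7 bits instead of 32, and a field limited to [-10, 10] needs 5.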

My goal is smallest size first, as declared in this library.

So I am going to try to learn a dictionary of small codes as transmission continues, so that in production we deploy with the best dictionary. For example, if 111 is more frequent than 11 for field Bar in Foo, we then send 11 for 111 and 111 for 11 for that field. https://github.com/Unity-Technologies/FPSSample/issues/63

Another interesting option, though it seems entirely oriented to games or remote robot control: client and server both predict values (via linear or nonlinear prediction, or via learned fixed-point regression) and learn a dictionary for these values, or more specifically for the deltas of these values. After learning has happened, the data needs to be stored and the learned dictionary used in production, or resent as it goes online. This approach is also from the Unity FPSSample (it seems AAA shooters do that). So the data as-is is 111111, the delta is 111, and the delta from prediction could end up as 11.

And what I came here for: I will try https://github.com/invertedtomato/integer-compression
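One way to picture the learned dictionary is a frequency ranking: the most frequent value for a field gets the smallest code, which a variable-length integer encoding can then store in the fewest bits. The RankDictionary class below is a hypothetical sketch of that idea, not the FPSSample scheme or any linked library.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: rank observed values by frequency, send the rank.
public sealed class RankDictionary
{
    private readonly Dictionary<int, int> valueToRank = new Dictionary<int, int>();
    private readonly int[] rankToValue;

    // `observed` is the training sample collected for one field.
    public RankDictionary(IEnumerable<int> observed)
    {
        rankToValue = observed
            .GroupBy(v => v)
            .OrderByDescending(g => g.Count()) // most frequent first
            .ThenBy(g => g.Key)                // deterministic tie-break
            .Select(g => g.Key)
            .ToArray();
        for (int i = 0; i < rankToValue.Length; i++)
        {
            valueToRank[rankToValue[i]] = i;
        }
    }

    // Returns false for values never seen during learning; the caller
    // would then need an escape path to send the raw value.
    public bool TryEncode(int value, out int rank)
    {
        return valueToRank.TryGetValue(value, out rank);
    }

    public int Decode(int rank)
    {
        return rankToValue[rank];
    }
}
```

Both sides must hold the same dictionary, which is why it would be learned ahead of time and deployed, or resent when it changes.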
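The prediction step can be sketched with the simplest predictor, linear extrapolation from the last two values, with the residual zigzag-encoded so that small positive and negative errors both become small unsigned numbers. The Prediction class is a hypothetical illustration; a learned or nonlinear predictor would replace PredictLinear while the rest stays the same.

```csharp
// Hypothetical sketch of delta-from-prediction with a linear predictor.
public static class Prediction
{
    // Both client and server compute this from values they already share.
    public static int PredictLinear(int previous, int last)
    {
        return last + (last - previous);
    }

    // Zigzag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    public static uint ZigZag(int value)
    {
        return (uint)((value << 1) ^ (value >> 31));
    }

    public static int UnZigZag(uint value)
    {
        return (int)(value >> 1) ^ -(int)(value & 1);
    }

    // Sender: transmit only the error of the prediction.
    public static uint EncodeResidual(int previous, int last, int actual)
    {
        return ZigZag(actual - PredictLinear(previous, last));
    }

    // Receiver: rebuild the value from the same prediction plus the error.
    public static int DecodeResidual(int previous, int last, uint residual)
    {
        return PredictLinear(previous, last) + UnZigZag(residual);
    }
}
```

With previous = 100, last = 110 and actual = 121, the prediction is 120, so only a residual of 1 (zigzag code 2) has to be sent instead of the plain delta of 11.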

invertedtomato / lightweight-serialization

T: struct only serialization #2