mcintyre321 / OneOf

Easy to use F#-like ~discriminated~ unions for C# with exhaustive compile time matching
MIT License

Use StructLayout for reduced memory usage #54

Closed daiplusplus closed 3 years ago

daiplusplus commented 4 years ago

When OneOf is used with many value-type arguments the value itself can get quite large.

For example, the Integer type below is 34 bytes in size.

using Integer = OneOf<SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64>; 
// 1 + 1 + 2 + 2 + 4 + 4 + 8 + 8 + 4 for _index
sizeof( Integer ) == 34

This can be improved by using struct packing.

For example:

[StructLayout(LayoutKind.Explicit)]
public struct OneOf<T0, T1, T2, T3, T4, T5, T6, T7> : IOneOf
{
    // Discriminator: records which of the overlapping fields is in use.
    [FieldOffset( offset: 0 )]
    readonly Byte _index;

    // All value fields share the same starting offset.
    [FieldOffset( offset: 1 )]
    readonly T0 _value0;
    [FieldOffset( offset: 1 )]
    readonly T1 _value1;
    [FieldOffset( offset: 1 )]
    readonly T2 _value2;
    [FieldOffset( offset: 1 )]
    readonly T3 _value3;
    [FieldOffset( offset: 1 )]
    readonly T4 _value4;
    [FieldOffset( offset: 1 )]
    readonly T5 _value5;
    [FieldOffset( offset: 1 )]
    readonly T6 _value6;
    [FieldOffset( offset: 1 )]
    readonly T7 _value7;

    // ...
}

And now, sizeof(Integer) == 9.

Using a Byte for _index helps - but may harm performance owing to not being native-word-aligned anymore.
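To make the size difference measurable, here is a minimal sketch that sidesteps generics entirely. IntegerSequential, IntegerExplicit and LayoutSizes are hypothetical names, not part of OneOf, and the value fields are placed at offset 8 rather than 1 so that they stay naturally aligned (an assumption, made to avoid the alignment concern). The exact numbers depend on the runtime's padding rules, so the code reports them instead of assuming them:

```csharp
using System;
using System.Runtime.InteropServices;

// Default (sequential) layout: every candidate value gets its own slot.
struct IntegerSequential
{
    public sbyte V0; public byte V1; public short V2; public ushort V3;
    public int V4; public uint V5; public long V6; public ulong V7;
    public int Index;
}

// Explicit layout: all candidate values share the bytes starting at offset 8.
[StructLayout(LayoutKind.Explicit)]
struct IntegerExplicit
{
    [FieldOffset(0)] public int Index;
    [FieldOffset(8)] public sbyte V0;
    [FieldOffset(8)] public byte V1;
    [FieldOffset(8)] public short V2;
    [FieldOffset(8)] public ushort V3;
    [FieldOffset(8)] public int V4;
    [FieldOffset(8)] public uint V5;
    [FieldOffset(8)] public long V6;
    [FieldOffset(8)] public ulong V7;
}

static class LayoutSizes
{
    // Marshal.SizeOf<T>() works here because both structs are non-generic
    // and contain only blittable fields.
    public static int Sequential() => Marshal.SizeOf<IntegerSequential>();
    public static int Explicit() => Marshal.SizeOf<IntegerExplicit>();
}
```

Printing LayoutSizes.Sequential() and LayoutSizes.Explicit() side by side shows how much the overlapping layout saves on a given runtime.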

Thoughts?

mcintyre321 commented 4 years ago

I haven't used StructLayout before, so I don't know what all the ramifications of this would be... It's basically saying reuse the location for each of the variables?

Does this dynamically size depending on the largest type used? e.g. if you use a decimal it will reserve 12 bytes?

The Byte performance for _index thing sounds a bit worrying - if you are working in such a memory-challenged, CPU rich situation, it might be worth having a fork of OneOf specifically for that...

daiplusplus commented 4 years ago

> I haven't used StructLayout before, so I don't know what all the ramifications of this would be... It's basically saying reuse the location for each of the variables?

Yep.

> Does this dynamically size depending on the largest type used? e.g. if you use a decimal it will reserve 12 bytes?

Yep.

It's how you can generate true C/C++-style union { } types and perform a limited form of type-punning in .NET.
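As a small illustration of that kind of type-punning, here is a sketch with a hypothetical FloatBits struct. It is non-generic and holds only value types, so the restrictions on explicit layout do not come into play:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union: a float and a uint occupying the same four bytes.
[StructLayout(LayoutKind.Explicit)]
struct FloatBits
{
    [FieldOffset(0)] public float Value;
    [FieldOffset(0)] public uint Bits;
}

static class Punning
{
    // Writes through the float field and reads the same bytes back as a uint.
    public static uint BitsOf(float value)
    {
        var union = new FloatBits { Value = value };
        return union.Bits;
    }
}
```

Punning.BitsOf(1.0f) returns 0x3F800000, the IEEE 754 single-precision encoding of 1.0.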

> The Byte performance for _index thing sounds a bit worrying - if you are working in such a memory-challenged, CPU rich situation, it might be worth having a fork of OneOf specifically for that...

Feel free to disregard my changing of _index to Byte :)

mcintyre321 commented 4 years ago

Well if you're happy to submit a pull request, we can give it a go (not much spare time here unfortunately).

It needs to be done via the LINQPad script - apologies if you're not on Windows (I need to update that to a dotnet script at some point).

JoshSchreuder commented 3 years ago

It appears that this is not currently possible in the runtime (see https://github.com/dotnet/runtime/issues/43486).

Trying it out throws an exception:

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.TypeLoadException: Could not load type 'OneOf.OneOfNew`2' from assembly 'OneOf, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' because generic types cannot have explicit layout.
   at OneOf.Benchmarks.SwitchBenchmark.StructPacking()
   at BenchmarkDotNet.Autogenerated.Runnable_1.WorkloadActionNoUnroll(Int64 invokeCount) in C:\Work\OneOf\OneOf.Benchmarks\bin\Release\net461\e7305b5e-88b5-40bf-91bd-e1d3b2161c88\e7305b5e-88b5-40bf-91bd-e1d3b2161c88.notcs:line 1570
   at BenchmarkDotNet.Engines.Engine.RunIteration(IterationData data)
   at BenchmarkDotNet.Engines.EngineFactory.Jit(Engine engine, Int32 jitIndex, Int32 invokeCount, Int32 unrollFactor)
   at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(EngineParameters engineParameters)
   at BenchmarkDotNet.Autogenerated.Runnable_1.Run(IHost host, String benchmarkName) in C:\Work\OneOf\OneOf.Benchmarks\bin\Release\net461\e7305b5e-88b5-40bf-91bd-e1d3b2161c88\e7305b5e-88b5-40bf-91bd-e1d3b2161c88.notcs:line 896
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)

Even if it were possible, I don't think it would be making its way back to the versions of the framework that this library supports.

Jure-BB commented 3 years ago

I'm wondering if this could be done without generics, using only source generators.

For example:

[OneOf(typeof(string), "Text")]
[OneOf(typeof(int), "Number")]
partial struct StringOrNumber { }

would generate:

partial struct StringOrNumber 
{ 
    [FieldOffset( offset: 0 )]
    readonly int _index;

    [FieldOffset( offset: 4 )]
    readonly string _text; // value0

    [FieldOffset( offset: 4 )]
    readonly int _number; // value1

    ...
}

An additional benefit of this approach is that StringOrNumber becomes an actual type with named options, instead of being a OneOf<T1, T2> instance, which should improve the debugging experience.

BrunoJuchli commented 2 months ago

> I'm wondering if this could be done without generics, using only source generators.

There's a restriction with explicit struct layout that the source generator would have to adhere to: The memory of value and reference types mustn't overlap.

As an example:

[StructLayout(LayoutKind.Explicit)] 
public readonly record struct OneOfOptimized
{
    [FieldOffset(0)]
    readonly int _index;

    [FieldOffset(4)]
    readonly int _value0;

    [FieldOffset(4)]
    readonly string _value1;
}

This compiles, but loading the type at runtime throws a TypeLoadException:

Could not load type 'OneOfOptimized' from assembly '...' because it contains an object field at offset 4 that is incorrectly aligned or overlapped by a non-object field.

It's possible, however, to have multiple reference types at the same offset, as long as they don't overlap any value-type field:

[StructLayout(LayoutKind.Explicit)] 
public readonly record struct OneOfOptimized
{
    [FieldOffset(0)]
    readonly int _index;

    [FieldOffset(4)]
    readonly int _value0;

    [FieldOffset(4)]
    readonly byte _value2;

    [FieldOffset(8)]
    readonly string _value1;

    [FieldOffset(8)]
    readonly object _value3;
}

This example works, and Marshal.SizeOf(typeof(OneOfOptimized)) returns 16. With the implicit layout (no offsets), the size is 32 bytes.
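To show how that restriction could shape a hand-written or generated union, here is a rough sketch. IntFloatStringOrUri and all of its members are hypothetical and not part of OneOf. It keeps one overlapping region for value types and a separate one for reference types, and uses the index to decide which field may be read:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union of int, float, string and Uri.
[StructLayout(LayoutKind.Explicit)]
public readonly struct IntFloatStringOrUri
{
    [FieldOffset(0)] private readonly int _index;

    // Value-type region: int and float share offset 4.
    [FieldOffset(4)] private readonly int _int;
    [FieldOffset(4)] private readonly float _float;

    // Reference region: string and Uri share offset 8,
    // so no reference overlaps a value-type field.
    [FieldOffset(8)] private readonly string _string;
    [FieldOffset(8)] private readonly Uri _uri;

    // "this = default" zeroes every field, so each constructor
    // only has to set the index and the active case.
    private IntFloatStringOrUri(int value) { this = default; _index = 0; _int = value; }
    private IntFloatStringOrUri(float value) { this = default; _index = 1; _float = value; }
    private IntFloatStringOrUri(string value) { this = default; _index = 2; _string = value; }
    private IntFloatStringOrUri(Uri value) { this = default; _index = 3; _uri = value; }

    public static IntFloatStringOrUri FromInt(int value) => new IntFloatStringOrUri(value);
    public static IntFloatStringOrUri FromFloat(float value) => new IntFloatStringOrUri(value);
    public static IntFloatStringOrUri FromString(string value) => new IntFloatStringOrUri(value);
    public static IntFloatStringOrUri FromUri(Uri value) => new IntFloatStringOrUri(value);

    // Only the field selected by _index is ever read, which keeps the
    // overlapping reference fields from being observed as the wrong type.
    public TResult Match<TResult>(
        Func<int, TResult> onInt,
        Func<float, TResult> onFloat,
        Func<string, TResult> onString,
        Func<Uri, TResult> onUri)
    {
        switch (_index)
        {
            case 0: return onInt(_int);
            case 1: return onFloat(_float);
            case 2: return onString(_string);
            case 3: return onUri(_uri);
            default: throw new InvalidOperationException("Unknown case index.");
        }
    }
}
```

For instance, IntFloatStringOrUri.FromInt(42).Match(i => "int:" + i, f => "float", s => "string:" + s, u => "uri") yields "int:42". A source generator would have to emit this kind of grouping itself, sorting each option into the value region or the reference region.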