Mixing C# and IL locals

buybackoff commented 3 years ago

Hello Lucas!

I tried to use InlineIL for a painful case of working with ref locals, and found some limitations. A simplified sample below. I want to mix locals defined normally and via InlineIL. For ref locals, I really miss reference to reference similar to pointer to pointer and C# does not allow to initialize ref local in efficient way in my case. But after initialization, I would rather continue to use C#, not IL. The type under question if a struct with objects, so using Unsafe.AsPointer is dangerous and questionable.

Here we define baz in C# and then add two additional locals via IL. The output locals will be merged as expected. I want to have access to baz from IL.

        public int UseLocalVariables(int value)
        {
#pragma warning disable 219
            var baz = 10;
#pragma warning restore 219
            IL.DeclareLocals(
                new LocalVar("foo", typeof(int)),
                new LocalVar("bar", typeof(int))
            );

            IL.Push(value);
            Stloc("foo");

            Ldc_I4(42);
            Stloc("bar");

            IncrBaz();

            Ldloc("foo");

            // We want to be able to load baz, which is defined normally in C#
            // Ldloc("baz"); -- will not work, and local names are not available in Mono.Cecil: https://groups.google.com/g/mono-cecil/c/GwDG0bQ8TXk/m/mIr6QtXTQNsJ
            // But we could use an index. There is one auto local for local function IncrBaz,
            // so the baz has index 1. Or we could look the index in actual IL after weaving.
            Ldloc_1();
            Add();
            return IL.Return<int>();

            void IncrBaz()
            {
                baz++;
            }
        }

Looking into MethodLocals implementation I think the fix could be simple. Even though Mono.Cecil does not support names of locals (or technically there are no names in IL), we could use indices.

A potential fix is to add this:

This will create some complexity with indexing, because the required index will be the one from combined locals, not from IL-defined locals. And in the presence of local functions, like in the sample, there could be autogenerated locals. To get the right indices one will have to look into generated IL. Thankfully, in Rider, IL View is one click away and it shows IL after weaving. So this indexing behavior could be left as "by design".

But since it's the first time I look into this source code, I don't know if something else could break after the proposed change.

The real use case looks like this:

This is one of the hottest methods, and GetLoadRef2 is not trivial. But it's still faster to have it as AggressivelyInlined. However, this generates twice as much of the assembly code. So the idea is to initialize ref locals in a loop to have the GetLoadRef2 inlined once.

ltrzesniewski commented 3 years ago

Hi Victor!

InlineIL doesn't let you access variables declared in C# that way primarily because that would produce very brittle code, as the compiler can decide to manage locals in unpredictable ways. For instance, a local can be present in Debug builds, but it could be entirely eliminated in Release builds if the compiler can use the stack instead. You couldn't also rely on the index of your local to be stable.

Because of this, InlineIL lets you treat locals defined by DeclareLocals as always starting at index 0, and then remaps the indices of locals in emitted IL instructions to the real index (it actually appends the locals you define to the ones that already exist, and Ldloc_1 in your example would load the value of bar). Changing this would be a major breaking change.

Note that in your example, baz is not an int32 local in IL, but is wrapped in a "DisplayClass" struct because of the IncrBaz method.

Still, you can access C# variables in InlineIL by using methods like IL.Push. In your example, if you replace Ldloc_1(); with IL.Push(baz);, you'll get baz on the stack. 😉

Similarly, IL.Push(ref baz); would push a ref to baz on the stack, and you could use IL.Pop(out baz); to assign the value on top of the stack to baz.

I'm not sure I understood you correctly though (I don't really understand the link between the example and your real use case), so let me know if something's still missing in that toolbox.

buybackoff commented 3 years ago

With simple baz as int I understood how to do things, but what is baz was ref local, e.g. ref KeyValuePair<long,object> baz? I think there are not enough overloads to store a value to such baz, but maybe I missed something.

buybackoff commented 3 years ago

you could use IL.Pop(out baz); to assign the value on top of the stack to baz.

Probably this is what I've missed.

ltrzesniewski commented 3 years ago

I think with a ref local you can do anything you like on the target, but there's no method that would let you reassign the ref...

I initially tried to write IL.Pop as T Pop<T>() but I had to change it to void Pop<T>(out T value) since the other one didn't play very well with the compiler (it would emit code that did not lend itself to weaving very well). I guess I'd need to write a method like void Pop<T>(out ref T value) but that's not a valid signature 😢

I can see if I can add a ref T PopRef<T>() method, but I suppose if will prove difficult for the same reason as T Pop<T>() didn't work.

buybackoff commented 3 years ago

Do you think this creates a GC hole? DynValue is the same as KeyValuePair<long,object> from GC perspective.

This doesn't work as on the picture. But if I was sure it's safe from GC point of view, I will work out pointer arithmetics, maybe even without IL.

buybackoff commented 3 years ago

So this works:

But it doesn't with the loop:

The stack should be balanced

buybackoff commented 3 years ago

I was able to measure the impact of double inlining and it appears that the cost of for loop - additional local, int arithmetic, branch - is higher. Measured not exactly the same thing, but close, and will just leave two inlined methods. The main interest was to understand the impact. Sometimes very tiny obscure change improves (or kills) performance dramatically.

Thanks for your help! I really like how easy it is to use this lib 😄

ltrzesniewski commented 3 years ago

Yeah, the for loop won't work, as you need to have a well-defined evaluation stack state for each basic block.

But I think you probably have a GC hole here since you're going through a void* local... I guess it depends on the way this is going to get JITted. 😕

I think I'll have to provide some kind of PopRef method or add something similar which would let you reassign a ref.

buybackoff commented 3 years ago

I do not understand why the second picture from here doesn't work https://github.com/ltrzesniewski/InlineIL.Fody/issues/23#issuecomment-735236737

The stack should be balanced, we put two references on it in a loop, then pop outside the loop. The code is identical other than the loop...

If it worked, I would really like a method:

public ref T Ldlocal<T>(index or name) 

// usage

if(Ldlocal<DynValue>(0).IsNumber && Ldlocal<DynValue>(1).IsNumber)
    ...

// same as:

ref DynValue B = ref ...
ref DynValue C = ref ...
if(B.IsNumber && C.IsNumber)
   ...

So that there is no struct copying when accessing it's content for ref local.

ltrzesniewski commented 3 years ago

The evaluation stack doesn't work that way, its state needs to be statically defined at each instruction:

See ECMA-335, §I.12.3.2.1:

The evaluation stack is made up of slots that can hold any data type, including an unboxed instance of a value type. The type state of the stack (the stack depth and types of each element on the stack) at any given point in a program shall be identical for all possible control flow paths. For example, a program that loops an unknown number of times and pushes a new element on the stack at each iteration would be prohibited.

buybackoff commented 3 years ago

It's somewhat 🤯 to deal with dotnet value stack and locals while working on implementing value stack and locals. Those code sample are from Lua interpreter, trying to move values between the stack and variables in the most efficient branchless way 😀

ltrzesniewski commented 3 years ago

Haha yeah I can imagine 😅 Good luck with that! 😄

buybackoff commented 3 years ago

Good luck with that! 😄

Thanks! It's already faster than C on SciMark benchmark. But absolutely meaningless timing of the benchmark is 1095, and you could guess what my obsession is and what a round number is the goal 😂

I have implemented kind of JIT, or an optimization phase, that rewrites instructions from stack-based to virtual-registers based ones. Could eliminate more than 25% of all instructions. Without that the timing is at 1460. I should probably continue with this direction, and not IL rewrite.

Very fun experience overall.

ltrzesniewski / InlineIL.Fody

Mixing C# and IL locals #23