mycroes / Sally7

C# implementation of Siemens S7 connections with a focus on performance
MIT License
56 stars 22 forks source link

Removes the in modifier, as it's not needed and harms perf by passing as (readonly) reference #20

Closed gfoidl closed 2 years ago

gfoidl commented 2 years ago

In C# structs are passed to methods by value, so they are "readonly" by construction. Hence there's no need to add the in modifier.

in may harm perf, as the arguments can't be passed in CPU-registers, rather they have to be passed via stack-allocated memory, which is way more overhead than simple passing by registers.

in is only useful when readonly structs need to be passed around, so that C# compiler doesn't need to create a defensive copy.

This PR removes all unnecessary uses of in. Note: technically this is a breaking change, but as the version of the project (according to SemVer) is below 1.0.0 this shouldn't be a problem.

mycroes commented 2 years ago

Hi GΓΌnther,

First of all, thanks for your contribution. I always appreciate new people chiming in and your GitHub profile alone (as short as it is) put a smile on my face πŸ˜„

Now on to the PR: The intention of using in was twofold, one for declaring the arguments as invariants and the other to actually improve performance. The former is absolutely not of utmost importance for this library. I consider this a small codebase, so I'm not worried about dropping in with regards to keeping the variables invariants. The latter is based on the fact that arguments can/will be passed by reference, I'm not sure if the performance benefit of that is something I made up or something I've read about. However, your statement about the fact that this causes unnecessary stack allocations at least sounds plausible, could you point me to some more info on that subject? Also, did you try benchmarking this? Would be nice to see what difference this makes as well (even if it's just a contrived sample to see the benefits).

Regards, Michael

gfoidl commented 2 years ago

dropping in with regards to keeping the variables invariants

As structs are value types, and passed by value they behave like invariants. For Span and ReadOnlySpan the JIT has special handling, which will be prohibited by in.

performance benefit of that is something I made up or something I've read about

If it's a readonly struct passed into a method, then one should use in (rather than ref) to signal to the C# compiler that it's invariant, otherwise the compiler may create a "defensive copy". But in for performance has only an effect when the struct can't be passed by CPU-registers. JIT is getting better and better in that area. As a rule of thumb, use in when the size of the struct is > CPU's word-size (but to be safe, measure it).

could you point me to some more info on that subject?

Let's look at

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Tsap
{
    public Tsap(in byte channel, in byte position)
    {
        Channel = channel;
        Position = position;
    }

    public byte Channel;
    public byte Position;
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Tsap1
{
    public Tsap1(byte channel, byte position)
    {
        Channel = channel;
        Position = position;
    }

    public byte Channel;
    public byte Position;
}

and the code generated by the JIT for x64:

Tsap..ctor(Byte ByRef, Byte ByRef)
    L0000: movzx eax, byte ptr [rdx]
    L0003: mov [rcx], al
    L0005: movzx eax, byte ptr [r8]
    L0009: mov [rcx+1], al
    L000c: ret

Tsap1..ctor(Byte, Byte)
    L0000: mov [rcx], dl
    L0002: mov [rcx+1], r8b
    L0006: ret

You see for Tsap1's ctor the arguments are passed in by registers, whilst for Tsap they're passed by reference via stack memory.

Note: to be fair, the ctor is so simple, that the JIT will always inline it, so there should be no difference at runtime for this example. But as byte is copyed by value there's no reason to use in here. Even a change of the argument in the ctor / method won't reflect on the caller-side -- so it's invariant by construction.

Let's try a simple benchmark. The time-difference isn't that huge for this simple one (only 1%), but the difference in (machine) code-size is more relevant, as smaller / less machine-code is almost always better, as it fits better into the CPU's instruction cache, etc. (similar to data-locality).

|    Method |     Mean |     Error |    StdDev | Ratio | RatioSD | Code Size |
|---------- |---------:|----------:|----------:|------:|--------:|----------:|
|    WithIn | 9.190 ns | 0.1887 ns | 0.1765 ns |  1.00 |    0.00 |      30 B |
| WithoutIn | 9.035 ns | 0.0743 ns | 0.0580 ns |  0.99 |    0.02 |      24 B |
; Bench.WithIn()
       push      rax
       lea       rax,[rcx+8]
       add       rcx,9
       movzx     eax,byte ptr [rax]
       mov       [rsp],al
       movzx     eax,byte ptr [rcx]
       mov       [rsp+1],al
       mov       eax,[rsp]
       add       rsp,8
       ret
; Total bytes of code 30

; Bench.WithoutIn()
       push      rax
       movzx     eax,byte ptr [rcx+8]
       movzx     edx,byte ptr [rcx+9]
       mov       [rsp],al
       mov       [rsp+1],dl
       mov       eax,[rsp]
       add       rsp,8
       ret
; Total bytes of code 24
benchmark code ```c# using System.Runtime.InteropServices; using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running; BenchmarkRunner.Run(); public class Bench { private byte _channel = 0x12; private byte _position = 0x23; [Benchmark(Baseline = true)] public Tsap WithIn() => new Tsap(_channel, _position); [Benchmark] public Tsap1 WithoutIn() => new Tsap1(_channel, _position); } [StructLayout(LayoutKind.Sequential, Pack = 1)] public struct Tsap { public Tsap(in byte channel, in byte position) { Channel = channel; Position = position; } public byte Channel; public byte Position; } [StructLayout(LayoutKind.Sequential, Pack = 1)] public struct Tsap1 { public Tsap1(byte channel, byte position) { Channel = channel; Position = position; } public byte Channel; public byte Position; } ```

I like this project, it fits my needs very well. As I'm always interested how things are done, and thanks to open source, I can look into the code. If I spot some things I like to change or improve, then it's time to try it and create a PR πŸ˜‰

put a smile on my face

Now on mine too πŸ˜ƒ

mycroes commented 2 years ago

Hi,

dropping in with regards to keeping the variables invariants

As structs are value types, and passed by value they behave like invariants. For Span and ReadOnlySpan the JIT has special handling, which will be prohibited by in.

I meant actual invariants in the sense that the compiler will prohibit you from re-assigning the variable if it's declared as in, but as I said that's not a major concern.

could you point me to some more info on that subject?

Let's look at ...

I was expecting a link to a post, but you're posting a sample that might as well deserve a post in itself πŸ˜„. Nice explanation, I guess this says enough about the impact of this change.

I like this project, it fits my needs very well.

Care to detail that any further? In all honesty I'm aware of a few users only. I'm using this for Viscon machines as part of my day job, I know someone is using it for a trash sorting machine and recently I was approached by designers of a low code platform to use this as alternative to libnodave, so I'd love to hear more success stories.

Besides success stories, I heard of one unconfirmed issue with the latest Sally7 release that it might cause multiple write operations for a single write action. In all honesty at Viscon we're still using the latest version that doesn't include the concurrent operation support, I haven't yet figured how I'm going to test this behavior.

As I'm always interested how things are done, and thanks to open source, I can look into the code. If I spot some things I like to change or improve, then it's time to try it and create a PR πŸ˜‰

+1 from me. I must admit I like to keep tight control of projects I maintain, yet I absolutely don't feel I'm the best programmer in/of whatever. I really appreciate contributions like these though and I especially love it when I can actually learn something from it, so thank you very much for that.

gfoidl commented 2 years ago

I try to follow this post as guideline

Oh, I try to embrace that next time. Thanks for the hint.

Care to detail that any further?

Without going into too much details (I'm not allowed to talk about this), but it's quite aged machine in the chemical industry with a very old S5-948 and a S7-315 PLC. Both are accessible via ISO over TCP, and the task is to monitor and collect the machine values (data blocks). So at the moment it's read-only, with plans to enter write-mode later too.

When that project was started, I evaluated some .NET PLC libraries (S7netplus, Sally7, Sharp7) and had a look at proprietary ones too. But I like open source, as mentioned above. The choice was Sally7, mainly to reasons you outlined in https://github.com/mycroes/Sally7/issues/7#issuecomment-742827578 too (so strong typed results, and async + easy to use).

So far everything the project needs is covered by Sally7.

For local and CI testing https://github.com/fbarresi/SoftPlc is used to "emulate" the PLC.

heard of one unconfirmed issue with the latest Sally7 release that it might cause multiple write operations for a single write action

Sounds like a race condition. I need to have a closer look into the code, and I can try to reproduce that behavior with SoftPlc. We are talking about the change introduced by 279ddc4d3983371e19741007d8bce451b5617195?

so thank you very much for that.

That's why I ❀️ open source. With the knowledge of many one can learn from each other, and improve a product. But there's no need to thank me, as it's quite opposite. Sally7 is built on your knownledge and I'm happy to be able to use it.

scamille commented 2 years ago

Sounds like a race condition. I need to have a closer look into the code, and I can try to reproduce that behavior with SoftPlc. We are talking about the change introduced by 279ddc4?

Yeah I ran into problems with that when I tried to deploy it during one of our projects, but it might have been something completely different. The problems definitely went away after switching the driver form Sally7 back to the other S7 library we use. Unfortunately I never had the time to fully investigate what was causing the problems :/ I would have loved to add a runtime behavior switch to optionally guard all Sally7 calls behind a lock (async based) to see if that changed the behavior.

It might also have been the case that our PLC programmers just were not prepared to handle the concurrent requests, and there is no problem with the Sally7 implementation at all. Again, unfortunately we couldn't properly analyze anything of this.

mycroes commented 2 years ago

Care to detail that any further?

Without going into too much details (I'm not allowed to talk about this), but it's quite aged machine in the chemical industry with a very old S5-948 and a S7-315 PLC. Both are accessible via ISO over TCP, and the task is to monitor and collect the machine values (data blocks). So at the moment it's read-only, with plans to enter write-mode later too.

Thanks for this glimpse, it's nice to hear the area of use and the technology it's applied to.

When that project was started, I evaluated some .NET PLC libraries (S7netplus, Sally7, Sharp7) and had a look at proprietary ones too. But I like open source, as mentioned above. The choice was Sally7, mainly to reasons you outlined in #7 (comment) too (so strong typed results, and async + easy to use).

So far everything the project needs is covered by Sally7.

Basically I see Sally7 as the single best choice when it covers the required feature set. I also don't mind expanding on features as needed, but for my own use I still haven't required more than is in here.

For local and CI testing https://github.com/fbarresi/SoftPlc is used to "emulate" the PLC.

I still intend to tackle this with Sally7 as well at some point, it doesn't have any priority though.

heard of one unconfirmed issue with the latest Sally7 release that it might cause multiple write operations for a single write action

Sounds like a race condition. I need to have a closer look into the code, and I can try to reproduce that behavior with SoftPlc. We are talking about the change introduced by 279ddc4?

Exactly. I'm used to multithreading and synchronization primitives and all that, doesn't mean I don't make mistakes either. Given that I also write most of my open source code late in the evening I'd say there's a fair chance of bugs popping up, there's room for improvement there as well.

so thank you very much for that.

That's why I ❀️ open source. With the knowledge of many one can learn from each other, and improve a product. But there's no need to thank me, as it's quite opposite. Sally7 is built on your knownledge and I'm happy to be able to use it.

Well again, thanks. My knowledge has been made possible by others sharing their knowledge and now you're a part of that as well.

gfoidl commented 2 years ago

multithreading and synchronization primitives and all that, doesn't mean I don't make mistakes either

Anyone who blames to don't make mistakes in that area is ... well a kind of liar. Do you know about https://github.com/microsoft/coyote? This may help here / or be overkill, I don't know...need to look into the code (hadn't time for that so far).

gfoidl commented 2 years ago

Sounds like a race condition. I need to have a closer look into the code, and I can try to reproduce that behavior with SoftPlc. We are talking about the change introduced by 279ddc4?

When looking at the code, and stepping with the debugger through it I can't spot a place where a race could occur (assuming the types used from .NET are correct). Also tried to repro this behavior, but without any success. I moved some things around (967f9504f6c6fa8e1ad2f5a6f486285664d135b6 built on top of https://github.com/mycroes/Sally7/pull/21), also can't repro it. BTW: should this commit be brought into main? (I like that the Channels are hidden as implementation detail -- my motivation was to try a SemaphoreSlim, but this allocates more because of Task vs. ValueTask).

mycroes commented 2 years ago

I like the coyote suggestion, hadn't heard of it before but I think I'll give that a try. I also took a second look after @scamille reported the possibility of an issue, but I couldn't figure any issue back then either.

I like your changes to the request executor, when I was scrolling through I thought how could I've missed that signal class, then I noticed I didn't actually miss it πŸ˜„