pjkundert / cpppo

Communications Protocol Python Parser and Originator -- EtherNet/IP CIP
https://hardconsulting.com/products/6-cpppo-enip-api

Performance #10

Closed johanfforsberg closed 7 years ago

johanfforsberg commented 8 years ago

While investigating the possibility of using cpppo in "production" to read thousands of tags as quickly as possible, I've noticed that performance is limited by CPU usage rather than by the network or the PLC. Some testing with PyPy showed a significant increase in throughput (similar to the library we're using now, which is written in C), but throughput still appears to be CPU-bound.

What are your thoughts about performance? Have you considered options like Cython for optimizing "bottlenecks"?
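One way to pin down where the CPU time is actually going, using nothing but the standard-library profiler (the read_tags() function below is only a placeholder for whatever client read loop is being measured):

    import cProfile
    import pstats

    def read_tags():
        # Placeholder for the actual cpppo client read loop being benchmarked;
        # substitute the real code here.
        pass

    # Profile the read loop and print the 20 most expensive calls by
    # cumulative time.
    cProfile.run("read_tags()", "cpppo.prof")
    pstats.Stats("cpppo.prof").sort_stats("cumulative").print_stats(20)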

pjkundert commented 8 years ago

Yes, I've noticed that there are significant performance constraints in the processing of the "client" side of the EtherNet/IP CIP requests. I haven't looked into it in much detail yet. I've noticed that PyPy seems to help, but it really shouldn't be that slow. I'll take a look, too, and see what I can find out.

Let's take a look at the mix of requests you're trying to parse, and get this code tightened up for you; you should be able to use this efficiently in production. I'm on my way back from Munich over the next 36 hours, so I may not be immediately responsive...

johanfforsberg commented 8 years ago

Sounds great! I won't have access to a PLC until Monday anyway, so it'll have to wait if you want full info. But I've essentially been using the "getattr.py" script in server/enip to do the testing, and the tags were a mixed bunch of single (non-array) types.
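For the record, doing the same thing via the client API directly would look roughly like this sketch (assuming the connector/parse_operations interface from the client module; the hostname, tag names, and the exact tuple yielded per operation are illustrative assumptions on my part):

    from cpppo.server.enip import client

    host = "w-kitslab-compactlogix-0"            # example PLC hostname
    tags = ["B_ProgDisable_C", "B_Reset_C"]      # single (non-array) tags, illustrative

    # Open one EtherNet/IP CIP connection and read each tag, one request
    # at a time (no pipelining).
    with client.connector(host=host) as conn:
        for idx, descr, op, reply, status, value in conn.synchronous(
                operations=client.parse_operations(tags)):
            print("%-30s: %r" % (descr, value))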

johanfforsberg commented 8 years ago

OK, I finally have some time to sit down with a PLC. It's a CompactLogix.

I'm running this command (these are all boolean tags):

python -m cpppo.server.enip.thruput -d 4 -m 420 -r 1000 -a w-kitslab-compactlogix-0 B_ProgDisable_C B_DigitalAlarmTag_C B_AutoChangeAlarmValue_C B_ProgAck_AD_C B_ProgAckAll_HB B_ProgEnable_HB B_Reset_C FB_ALMA01_AA.HHProgAck FB_ALMA01_AA.HHOperAck FB_ALMA01_AA.HProgAck FB_ALMA01_AA.HOperAck FB_ALMA01_AA.LProgAck FB_ALMA01_AA.LOperAck FB_ALMA01_AA.LLProgAck

... and the output, minus the very long slab of individual tag values, is:

14000 operations using 876 requests in   32.48s at pipeline depth  4; 431.0 TPS

I came up with the numbers for depth (-d) and multiple (-m) by experimentation; larger values either gave errors or did not increase performance noticeably. I am wondering a bit about the -m number; normally we're able to use a request size of almost 500 bytes, but I don't know whether it corresponds exactly to that byte count.

Running the same command with pypy (only increasing the -r to 10000 to account for warmup time) gives a much better result:

140000 operations using 8751 requests in  106.44s at pipeline depth  4; 1315.2 TPS

However, in both tests the CPU is pegged at 100%, suggesting that the bottleneck is not in the network or the PLC. A similar test using a library written in C (https://github.com/EPICSTools/ether_ip.git) gives performance roughly at the pypy level, but with no measurable CPU load.

Tell me if you need more details about any of this, or any other interesting tests to perform.
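In case it helps to map the CLI flags onto the API, my understanding is that the thruput run above corresponds roughly to a pipelined read like this sketch (the depth/multiple keyword names, and whether -m is a byte limit, are assumptions on my part):

    from cpppo.server.enip import client

    host = "w-kitslab-compactlogix-0"
    tags = ["B_ProgDisable_C", "B_Reset_C", "FB_ALMA01_AA.HHProgAck"]  # illustrative subset
    repeat = 1000                                # analogous to -r 1000

    with client.connector(host=host) as conn:
        # Keep up to 4 requests in flight (like -d 4) and pack operations
        # into Multiple Service Packet requests limited by 420 (like -m 420).
        ops = client.parse_operations(tags * repeat)
        for idx, descr, op, reply, status, value in conn.pipeline(
                operations=ops, depth=4, multiple=420):
            pass  # collect or validate values here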

datasim commented 8 years ago

Interesting. Well, the pypy test tells us that we can probably reach the bandwidth and/or PLC capacity limits, but at ~100% CPU usage.

I've been working on a branch 'feature-performance' in the cpppo Git repo; give that a try. So far it only gives me a ~5 to 10 percent improvement. I'm still working on this; I can't put my finger on exactly why parsing responses is so expensive, but I am making progress.
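It should be installable straight from GitHub, with something like:

pip install -U git+https://github.com/pjkundert/cpppo.git@feature-performance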

-pjk

pjkundert commented 7 years ago

I can now get up to 300 TPS using CPython 2/3 and up to 700 TPS using pypy on my i7 Mac. There's still work to do, but performance is probably no longer at the top of the priority list...