stlehmann / pyads

Python wrapper for TwinCAT ADS
MIT License

optimise dict_from_bytes #289

Open chrisbeardy opened 2 years ago

chrisbeardy commented 2 years ago

The current implementation of dict_from_bytes loops over every item individually: over each array element within the structure (for i in range(size)) and, when reading an array of structures, over each structure as well (for structure in range(0, array_size)).
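Roughly, the shape of those loops (a paraphrased sketch based on this issue's description, not the verbatim pyads source; the DATATYPE_MAP here is an illustrative subset):

import struct
from ctypes import sizeof, c_uint32, c_double

# illustrative subset of the ctype -> struct format string mapping
DATATYPE_MAP = {c_uint32: "<I", c_double: "<d"}

def dict_from_bytes_sketch(byte_list, structure_def, array_size=1):
    index = 0
    result = []
    for structure in range(0, array_size):  # loop over an array of structures
        entry = {}
        for name, plc_datatype, size in structure_def:
            n_bytes = sizeof(plc_datatype)
            values = []
            for i in range(size):  # one struct.unpack call per array element
                values.append(struct.unpack(
                    DATATYPE_MAP[plc_datatype],
                    bytearray(byte_list[index:index + n_bytes]),
                )[0])
                index += n_bytes
            entry[name] = values[0] if size == 1 else values
        result.append(entry)
    return result[0] if array_size == 1 else result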

When reading large arrays this is slow. Could it be optimised by making better use of struct.unpack?

Instead of unpacking one item of the array at a time, could this function be refactored to struct.unpack the whole array at once?

from:

n_bytes = sizeof(plc_datatype)
struct.unpack(
    DATATYPE_MAP[plc_datatype],
    bytearray(byte_list[index: (index + n_bytes)]),
)[0]

to:

n_bytes = sizeof(plc_datatype) * size
datatype = DATATYPE_MAP[plc_datatype]
# keep the byte-order prefix and insert a repeat count, e.g. "<d" -> "<100d"
datatype = datatype[0] + str(size) + datatype[1:]
struct.unpack(
    datatype,
    bytearray(byte_list[index: (index + n_bytes)]),
)  # no trailing [0]: the call now returns all `size` elements as a tuple
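For illustration, a repeat count in the format string makes a single struct.unpack call return every element at once (a minimal, self-contained example):

import struct

# pack four little-endian doubles, then unpack them all in one call
raw = struct.pack("<4d", 1.0, 2.0, 3.0, 4.0)
values = struct.unpack("<4d", raw)
print(values)  # (1.0, 2.0, 3.0, 4.0)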

I'm also unsure whether the unpack itself is the slow part. We could instead build up a single format string and then unpack everything in one go; that would also benefit the for structure in range(0, array_size) case.
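As a rough sketch of that idea (using a simplified (name, format char, count) field list rather than pyads' real StructureDef and DATATYPE_MAP), the whole array of structures can be unpacked with one combined format string and the flat result sliced back into fields:

import struct

# simplified field list: (field name, struct format char, element count)
FIELDS = (("counter", "I", 1), ("values", "d", 3), ("flag", "?", 1))

def unpack_structures(buf, array_size=1):
    # one combined format string for the whole array of structures
    fmt = "<" + "".join(f"{count}{code}" for _, code, count in FIELDS) * array_size
    flat = struct.unpack(fmt, buf)
    # slice the flat tuple back into per-field values
    result, pos = [], 0
    for _ in range(array_size):
        entry = {}
        for name, _, count in FIELDS:
            entry[name] = flat[pos] if count == 1 else list(flat[pos:pos + count])
            pos += count
        result.append(entry)
    return result

buf = struct.pack("<I3d?", 7, 1.0, 2.0, 3.0, True)
print(unpack_structures(buf))  # [{'counter': 7, 'values': [1.0, 2.0, 3.0], 'flag': True}]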

I've been meaning to tackle this for a while. When reading large amounts of data I often use multiple read_by_names instead, as this is actually faster, but you then have to work around the fact that the data is being read over multiple ADS cycles. I raise this now as it could be considered when refactoring for #288.

Happy to look at other alternatives too to decrease the time taken to go from bytes to the dictionary. From my testing of large amounts of data, the actual ADS read of all the bytes is super quick, but it can take seconds to unpack the result into the right format.

stlehmann commented 2 years ago

from my testing of large amounts of data, the actual ADS read of all the bytes is super quick, but it can take seconds to unpack the result into the right format

That is interesting. I agree, looking at the code there could be some room for improvement with regard to speed. @chrisbeardy you're welcome to tackle some improvements here 👍

Also, we now have WSTRING support in pyads and I just noticed that it is not yet supported for structures. I'll open an issue for that.

chrisbeardy commented 2 years ago

TLDR: On a read_structure_by_name with larger amounts of data being read, the time goes into two steps: converting the returned ctypes buffer to a list ([i for i in read_data] on line 274 of pyads_ex.py) and converting the bytes to a dictionary (dict_from_bytes).

For speedup, I suggest we target these two areas. Improving [i for i in read_data] on line 274 of pyads_ex.py will also have benefits across the whole library.

Tests

I've just conducted some quick tests, based on a real-world example from a machine I worked on where we have a 60 second rolling log for data capture.

I conducted these tests on both Python 3.6 with pyads 3.3.4 and Python 3.8 with pyads 3.3.9; both show the same results. The test machine is relatively quick: Intel(R) Xeon(R) CPU E3-1535M v6 @ 3.10 GHz, 4 cores, 8 logical processors and 32 GB RAM.

It is easy to take a quick look at the [i for i in read_data] on line 274 of pyads_ex.py. It seems using list(read_data) would give up to approximately a 20% improvement; for very small amounts of data this actually makes it slightly less performant, but there the overall time difference is negligible.

Interestingly, using numpy's frombuffer gives a lovely 80% increase in performance. Again, for very small amounts of data this is actually slightly less performant, likely due to the fixed overhead of numpy crossing into C.

I'm not suggesting we use numpy here, as it would seem overkill to add that dependency for just one line. But I will do a PR for using list(read_data) instead.
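For clarity, the change itself is one line; an illustrative stand-in (read_data is the variable name from the issue text, not necessarily the exact pyads source):

from ctypes import c_ubyte

# stand-in for the ctypes buffer an ADS read returns
read_data = (c_ubyte * 4)(1, 2, 3, 4)

data = list(read_data)           # proposed: iteration happens at C level
# data = [i for i in read_data]  # current: a Python-level loop per element
print(data)  # [1, 2, 3, 4]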

Given that it is already possible to return the ctypes, I think trying to optimise this to-list case further using C would likely just make pyads more complex than it needs to be. But for use cases with large amounts of data we could add a suggestion to the documentation to use numpy:

    data_ctype_raw = plc.read_by_name("", c_ubyte * size_of_data, handle=datalog_handle, return_ctypes=True)
    data = pyads.dict_from_bytes(np.frombuffer(data_ctype_raw, dtype=np.ubyte).tolist(), DATALOG_STRUCTURE)
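(Worth noting: np.frombuffer is a zero-copy view over the ctypes buffer, so the numpy step itself is nearly free; the .tolist() call is what materialises the plain Python list that dict_from_bytes is given here.)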

Now to tackle the bytes_to_dict conversion ...

Code

Python Code (if you set a breakpoint on the return statement, you can use a debugger to inspect the variables):

from ctypes import c_ubyte

import pyads
import time

import numpy as np

def main():
    size_of_data = pyads.size_of_structure(DATALOG_STRUCTURE)
    size_of_big_data = pyads.size_of_structure(DATALOG_STRUCTURE_BIG)
    plc = pyads.Connection(ADS_ADDRESS, ADS_PORT)
    plc.open()
    datalog_handle = plc.get_handle("MAIN.data")
    datalog_big_handle = plc.get_handle("MAIN.dataBig")

    data_read_start = time.perf_counter()
    data = plc.read_structure_by_name("", DATALOG_STRUCTURE, structure_size=size_of_data, handle=datalog_handle)
    data_read_time = time.perf_counter() - data_read_start
    plc.write_by_name("MAIN.lastRead", data["currentPointer"], pyads.PLCTYPE_UDINT)

    data_read_start = time.perf_counter()
    data2 = plc.read_structure_by_name("", DATALOG_STRUCTURE, structure_size=size_of_data, handle=datalog_handle)
    data_read_2_time = time.perf_counter() - data_read_start

    data_read_start = time.perf_counter()
    data_big = plc.read_structure_by_name("", DATALOG_STRUCTURE_BIG, structure_size=size_of_big_data, handle=datalog_big_handle)
    data_read_big_time = time.perf_counter() - data_read_start

    data_read_start = time.perf_counter()
    data_big_2 = plc.read_structure_by_name("", DATALOG_STRUCTURE_BIG, structure_size=size_of_big_data, handle=datalog_big_handle)
    data_read_big_time_2 = time.perf_counter() - data_read_start

    # test with just a read by name and return ctype
    data_read_start = time.perf_counter()
    data_big_no_to_dict_raw = plc.read_by_name("", c_ubyte * size_of_big_data, handle=datalog_big_handle)
    data_read_big_no_to_dict_time = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_big_no_to_dict = pyads.dict_from_bytes(data_big_no_to_dict_raw, DATALOG_STRUCTURE_BIG)
    data_read_big_to_dict_time = time.perf_counter() - data_read_start

    data_read_start = time.perf_counter()
    data_big_ctype_raw = plc.read_by_name("", c_ubyte * size_of_big_data, handle=datalog_big_handle, return_ctypes=True)
    data_read_big_ctype_time = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_big_ctype_to_array = [i for i in data_big_ctype_raw]
    data_read_big_to_list_time = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_big_ctype_to_array = list(data_big_ctype_raw)
    data_read_big_to_list_time_2 = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_big_ctype_to_array = np.frombuffer(data_big_ctype_raw, dtype=np.ubyte).tolist()
    data_read_big_to_list_time_3 = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_big_ctype_dict = pyads.dict_from_bytes(data_big_ctype_to_array, DATALOG_STRUCTURE_BIG)
    data_read_big_from_list_to_dict_time = time.perf_counter() - data_read_start

    data_read_start = time.perf_counter()
    data_ctype_raw = plc.read_by_name("", c_ubyte * size_of_data, handle=datalog_handle, return_ctypes=True)
    data_read_ctype_time = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_ctype_to_array = [i for i in data_ctype_raw]
    data_read_to_list_time = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_ctype_to_array = list(data_ctype_raw)
    data_read_to_list_time_2 = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_ctype_to_array = np.frombuffer(data_ctype_raw, dtype=np.ubyte).tolist()
    data_read_to_list_time_3 = time.perf_counter() - data_read_start
    data_read_start = time.perf_counter()
    data_ctype_dict = pyads.dict_from_bytes(data_ctype_to_array, DATALOG_STRUCTURE)
    data_read_from_list_to_dict_time = time.perf_counter() - data_read_start

    plc.release_handle(datalog_handle)
    plc.release_handle(datalog_big_handle)
    plc.close()

    return

ADS_ADDRESS = "169.254.198.195.1.1"
ADS_PORT = 851
DATALOG_LENGTH = 15001
DATALOG_STRUCTURE: pyads.StructureDef = (
    ("currentPointer", pyads.PLCTYPE_UDINT, 1),
    ("data1", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data2", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data3", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data4", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data5", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data6", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data7", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data8", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data9", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data10", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data11", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("data12", pyads.PLCTYPE_LREAL, DATALOG_LENGTH),
    ("cycleCount", pyads.PLCTYPE_UDINT, DATALOG_LENGTH),
    ("dcTaskTime", pyads.PLCTYPE_STRING, DATALOG_LENGTH),
    ("lastExecTime", pyads.PLCTYPE_UDINT, DATALOG_LENGTH),
    ("cycleTimeExceeded", pyads.PLCTYPE_BOOL, DATALOG_LENGTH),
    ("RTViolation", pyads.PLCTYPE_BOOL, DATALOG_LENGTH),
)
DATALOG_LENGTH_BIG = 60001
DATALOG_STRUCTURE_BIG: pyads.StructureDef = (
    ("currentPointer", pyads.PLCTYPE_UDINT, 1),
    ("data1", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data2", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data3", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data4", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data5", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data6", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data7", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data8", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data9", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data10", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data11", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("data12", pyads.PLCTYPE_LREAL, DATALOG_LENGTH_BIG),
    ("cycleCount", pyads.PLCTYPE_UDINT, DATALOG_LENGTH_BIG),
    ("dcTaskTime", pyads.PLCTYPE_STRING, DATALOG_LENGTH_BIG),
    ("lastExecTime", pyads.PLCTYPE_UDINT, DATALOG_LENGTH_BIG),
    ("cycleTimeExceeded", pyads.PLCTYPE_BOOL, DATALOG_LENGTH_BIG),
    ("RTViolation", pyads.PLCTYPE_BOOL, DATALOG_LENGTH_BIG),
)

if __name__ == "__main__":
    main()

PLC Code:

{attribute 'pack_mode' := '1'}
TYPE stData :
STRUCT
    currentPointer : UDINT;
    data1 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data2 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data3 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data4 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data5 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data6 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data7 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data8 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data9 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data10 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data11 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    data12 : ARRAY [0..GVL.dataLogLength] OF LREAL;
    cycleCount : ARRAY [0..GVL.dataLogLength] OF UDINT;
    dcTaskTime : ARRAY [0..GVL.dataLogLength] OF STRING;
    lastExecTime : ARRAY [0..GVL.dataLogLength] OF UDINT;
    cycleTimeExceeded : ARRAY [0..GVL.dataLogLength] OF BOOL; 
    RTViolation : ARRAY [0..GVL.dataLogLength] OF BOOL;
END_STRUCT
END_TYPE
{attribute 'pack_mode' := '1'}
TYPE stDataBig :
STRUCT
    currentPointer : UDINT;
    data1 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data2 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data3 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data4 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data5 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data6 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data7 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data8 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data9 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data10 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data11 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    data12 : ARRAY [0..GVL.dataLogLengthBig] OF LREAL;
    cycleCount : ARRAY [0..GVL.dataLogLengthBig] OF UDINT;
    dcTaskTime : ARRAY [0..GVL.dataLogLengthBig] OF STRING;
    lastExecTime : ARRAY [0..GVL.dataLogLengthBig] OF UDINT;
    cycleTimeExceeded : ARRAY [0..GVL.dataLogLengthBig] OF BOOL; 
    RTViolation : ARRAY [0..GVL.dataLogLengthBig] OF BOOL;
END_STRUCT
END_TYPE
{attribute 'qualified_only'}
VAR_GLOBAL CONSTANT
    dataLogLength : UDINT := 15000;
    dataLogLengthBig : UDINT := 60000;
END_VAR
PROGRAM MAIN
VAR
    data : stData;
    dataBig : stDataBig;
    i : UDINT;
    iBig : UDINT;
    j : UDINT;
    taskIndex : GETCURTASKINDEX;
    taskInfo : PlcTaskSystemInfo;
    lastRead : UDINT;
END_VAR
taskIndex(); // GETCURTASKINDEX must be called before its index output is valid
taskInfo := _TaskInfo[taskIndex.index];

IF i > GVL.dataLogLength THEN
    i := 0;
END_IF

IF iBig > GVL.dataLogLengthBig THEN
    iBig := 0;
END_IF

IF j > 1_000_000 THEN
    j := 0;
END_IF

data.currentPointer := i;
data.data1[i] := UDINT_TO_LREAL(j);
data.data2[i] := UDINT_TO_LREAL(j);
data.data3[i] := UDINT_TO_LREAL(j);
data.data4[i] := UDINT_TO_LREAL(j);
data.data5[i] := UDINT_TO_LREAL(j);
data.data6[i] := UDINT_TO_LREAL(j);
data.data7[i] := UDINT_TO_LREAL(j);
data.data8[i] := UDINT_TO_LREAL(j);
data.data9[i] := UDINT_TO_LREAL(j);
data.data10[i] := UDINT_TO_LREAL(j);
data.data11[i] := UDINT_TO_LREAL(j);
data.data12[i] := UDINT_TO_LREAL(j);
data.cycleCount[i] := taskInfo.CycleCount;
data.dcTaskTime[i] := LINT_TO_STRING(taskInfo.DcTaskTime);
data.lastExecTime[i] := taskInfo.LastExecTime;
data.cycleTimeExceeded[i] := taskInfo.CycleTimeExceeded;
data.RTViolation[i] := taskInfo.RTViolation;

dataBig.currentPointer := iBig;
dataBig.data1[iBig] := UDINT_TO_LREAL(j);
dataBig.data2[iBig] := UDINT_TO_LREAL(j);
dataBig.data3[iBig] := UDINT_TO_LREAL(j);
dataBig.data4[iBig] := UDINT_TO_LREAL(j);
dataBig.data5[iBig] := UDINT_TO_LREAL(j);
dataBig.data6[iBig] := UDINT_TO_LREAL(j);
dataBig.data7[iBig] := UDINT_TO_LREAL(j);
dataBig.data8[iBig] := UDINT_TO_LREAL(j);
dataBig.data9[iBig] := UDINT_TO_LREAL(j);
dataBig.data10[iBig] := UDINT_TO_LREAL(j);
dataBig.data11[iBig] := UDINT_TO_LREAL(j);
dataBig.data12[iBig] := UDINT_TO_LREAL(j);
dataBig.cycleCount[iBig] := taskInfo.CycleCount;
dataBig.dcTaskTime[iBig] := LINT_TO_STRING(taskInfo.DcTaskTime);
dataBig.lastExecTime[iBig] := taskInfo.LastExecTime;
dataBig.cycleTimeExceeded[iBig] := taskInfo.CycleTimeExceeded;
dataBig.RTViolation[iBig] := taskInfo.RTViolation;

i := i + 1;
iBig := iBig + 1;
j := j + 1;

Code for looking only at the [i for i in read_data] on line 274 of pyads_ex.py:

import timeit
import ctypes
import numpy as np

number = 1_000_000  # append another _000 for an even bigger test
pylist = list(range(number))

def list_comp():
    t = timeit.Timer("data = [i for i in c_array]", globals=globals())
    print("[i for i in c_array] : " + str(t.repeat(repeat=5, number=10)))

def to_list():
    t = timeit.Timer("data = list(c_array)", globals=globals())
    print("list(c_array) : " + str(t.repeat(repeat=5, number=10)))

def using_numpy(dt):
    code = f"""\
from_buffer = np.frombuffer(c_array, dtype={dt})
buffer_list = from_buffer.tolist()
"""
    t = timeit.Timer(code, globals=globals())
    print("np.frombuffer : " + str(t.repeat(repeat=5, number=10)))

print("Int Array")
c_array_type = ctypes.c_int * len(pylist)
c_array = c_array_type(*pylist)
# c_array = (ctypes.c_int * len(pylist))(*pylist)

if type(c_array_type).__name__ == "PyCArrayType":
    list_comp()
    to_list()
    using_numpy("np.int32")
    from_buffer = np.frombuffer(c_array, dtype=np.int32)
    buffer_list = from_buffer.tolist()
    print([i for i in c_array] == list(c_array) == buffer_list)

print("\nBool Array")
c_array = (ctypes.c_bool * len(pylist))(*pylist)
list_comp()
to_list()
print([i for i in c_array] == list(c_array))

print("\nFloat Array")
c_array = (ctypes.c_float * len(pylist))(*pylist)
list_comp()
to_list()
using_numpy("np.float32")
from_buffer = np.frombuffer(c_array, dtype=np.float32)
buffer_list = from_buffer.tolist()
print([i for i in c_array] == list(c_array) == buffer_list)

print("\nChar Array")
pylist = [1] * number
c_array = (ctypes.c_char * len(pylist))(*pylist)
list_comp()
to_list()
print([i for i in c_array] == list(c_array))

print("\nu_byte array - What actually comes into pyads")
c_array = (ctypes.c_ubyte * len(pylist))(*pylist)
list_comp()
to_list()
using_numpy("np.ubyte")
from_buffer = np.frombuffer(c_array, dtype=np.ubyte)
buffer_list = from_buffer.tolist()
print([i for i in c_array] == list(c_array) == buffer_list)

stlehmann commented 2 years ago

Very interesting that the list() function performs so much better than the list comprehension.

stlehmann commented 2 years ago

A 20% improvement is definitely worth a PR.

stlehmann commented 2 years ago

@chrisbeardy You already did some optimization with #292. Do you see room for more speed improvement or shall we close the issue?

chrisbeardy commented 2 years ago

There is definitely room for improvement here. I have a few ideas but limited time right now.

Using struct.unpack better is one: instead of struct.unpack-ing only one item at a time, we could do multiple if it's an array, etc. That's just one idea.

Also, I think there is potential to reorganise the code to allow unpacking more at a time too.

Leave it open for now.