ni / nimi-python

Python bindings for NI Modular Instrument drivers.

niscope.Session.fetch() is very slow with large fetches #1997

Closed: ni-jfitzger closed this 1 year ago

ni-jfitzger commented 1 year ago

Description of issue

Fetching 250M samples acquired at full rate from a PXIe-5164 takes roughly 13-14 seconds with niscope.Session.fetch().

Environment

niscope Python package version: 1.4.5 (though the performance will likely be the same for any version from the last few years)
Platform tested on: Windows

ni-jfitzger commented 1 year ago

The time that it takes to fetch (in this instance) is almost entirely allocation time. niscope.Session.fetch() allocates an array.array for the user and then fetches data into it. The array's initializer looks like this:

[0] * wfm_size

Every element gets initialized to 0. That initialization is technically unnecessary, but we're not sure how to allocate the array without initializing its elements. As the size of the array grows, performance drops, and once the size gets very large, performance falls off a cliff.

Users have another option: fetch_into(). They can preallocate a numpy array and fetch into it. numpy allocation is much, much faster.

#!/usr/bin/python
# python_allocation_time_test.py
import array
import numpy
import time

if __name__ == "__main__":
    record_length = 250000000
    total_num_wfms = 1
    t1 = time.time()
    array.array("d", [0] * record_length * total_num_wfms)
    t2 = time .time()
    wfm = numpy.ndarray(total_num_wfms * record_length, dtype=numpy.float64)
    t3 = time.time()

    print(f"array alloc time: {t2-t1}")
    print(f"numpy alloc time: {t3-t2}")
>python python_allocation_time_test.py
array alloc time: 13.71068263053894
numpy alloc time: 0.0
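
For comparison, here is a minimal sketch of the fetch_into() option described above. The resource name, sample rate, and vertical settings are placeholders rather than values from this issue, and it assumes a single-record acquisition on a PXIe-5164:

# fetch_into_sketch.py (illustrative only)
import numpy
import niscope

record_length = 250000000

# "PXI1Slot2" is a placeholder resource name.
with niscope.Session(resource_name="PXI1Slot2") as session:
    session.configure_vertical(range=1.0, coupling=niscope.VerticalCoupling.DC)
    session.configure_horizontal_timing(
        min_sample_rate=1e9,
        min_num_pts=record_length,
        ref_position=0.0,
        num_records=1,
        enforce_realtime=True,
    )

    # Preallocate the buffer once. numpy.empty() does not initialize elements,
    # so this is nearly instantaneous even for 250M samples.
    waveform = numpy.empty(record_length, dtype=numpy.float64)

    with session.initiate():
        # fetch_into() writes samples directly into the preallocated buffer,
        # avoiding the slow array.array allocation inside fetch().
        session.channels[0].fetch_into(waveform, timeout=20.0)

The same buffer can also be reused across repeated acquisitions, so the allocation cost is paid only once.
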
ni-jfitzger commented 1 year ago

We don't want to add a dependency on numpy and we don't know of a way to improve the performance with array.array, so we currently don't have any plans to fix this. We might periodically check for a way to allocate the array faster. If you know of a way, please let us know.

bkeryan commented 1 year ago

It's not just that it initializes the array to zero, it's that it allocates a temporary list and copies it into the array.

My first inclination is to use itertools.repeat, but this is actually slower. The solution is surprising: use the multiplication operator on a single-element array.

% python3 -m timeit -s 'import array' 'array.array("d", [0] * 10000000)'
1 loop, best of 5: 251 msec per loop
% python3 -m timeit -s 'import array, itertools' 'array.array("d", itertools.repeat(0, 10000000))'
1 loop, best of 5: 450 msec per loop
% python3 -m timeit -s 'import array' 'array.array("d", [0]) * 10000000'                          
50 loops, best of 5: 6.31 msec per loop
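
Applied inside the bindings, the trick could look roughly like the following. This is a hedged sketch; the _allocate_waveform helper name is hypothetical and not the library's actual code:

import array

def _allocate_waveform(wfm_size):
    # Hypothetical helper: multiplying a one-element array produces a
    # zero-filled array.array without building a temporary wfm_size-element
    # Python list and copying it element by element.
    return array.array("d", [0]) * wfm_size

wfm = _allocate_waveform(250000000)  # zero-filled array.array('d') of 250M doubles
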
ni-jfitzger commented 1 year ago

> It's not just that it initializes the array to zero, it's that it allocates a temporary list and copies it into the array.
>
> My first inclination is to use itertools.repeat, but this is actually slower. The solution is surprising: use the multiplication operator on a single-element array.
>
> % python3 -m timeit -s 'import array' 'array.array("d", [0] * 10000000)'
> 1 loop, best of 5: 251 msec per loop
> % python3 -m timeit -s 'import array, itertools' 'array.array("d", itertools.repeat(0, 10000000))'
> 1 loop, best of 5: 450 msec per loop
> % python3 -m timeit -s 'import array' 'array.array("d", [0]) * 10000000'
> 50 loops, best of 5: 6.31 msec per loop

@bkeryan that's amazing. With that method, we can allocate a 250M sample array in 10% of the time. numpy array allocation is still much faster, but this could offer a tremendous performance improvement.

I'll reopen this.

ni-jfitzger commented 1 year ago

I believe we've improved the efficiency of array allocation as much as we can. Any further improvements will likely need to come from Python itself. Closing this.