ni / measurement-plugin-python

Python framework to develop measurement plug-ins for NI application software. Contains sample measurement plug-ins for InstrumentStudio and TestStand.
MIT License
19 stars 15 forks source link

32-bit Python tests fail with out-of-memory errors #818

Closed bkeryan closed 2 months ago

bkeryan commented 3 months ago

Bug Report

test___measurement_service_v2___measure_with_large_array___returns_output[10000000] with 32-bit Python version 3.8 and 3.9 have been failing a lot lately with out-of-memory errors.

Repro or Code Sample

Existing test

Expected Behavior

It passes.

Current Behavior

https://github.com/ni/measurement-plugin-python/actions/runs/10167541881/job/28125984987?pr=815

_ test___measurement_service_v2___measure_with_large_array___returns_output[10000000] _

double_array_len = 10000000
stub_v2 = <ni_measurement_plugin_sdk_service._internal.stubs.ni.measurementlink.measurement.v2.measurement_service_pb2_grpc.MeasurementServiceStub object at 0x077AF628>

    @pytest.mark.parametrize("double_array_len", [10000, 100000, 1000000, 10000000])  # up to 80 MB
    def test___measurement_service_v2___measure_with_large_array___returns_output(
        double_array_len: int, stub_v2: v2_measurement_service_pb2_grpc.MeasurementServiceStub
    ):
        float_in = 1.23
        double_array_in = [random.random() for i in range(double_array_len)]
        bool_in = False
        enum_in = Color.BLUE
        protobuf_enum_in = ProtobufColor.WHITE
        string_in = "InputString"
        string_array_in = ["", "TestString1", "#$%!@<*(&^~`"]

        request = v2_measurement_service_pb2.MeasureRequest(
            configuration_parameters=_get_configuration_parameters(
                float_in,
                double_array_in,
                bool_in,
                string_in,
                enum_in,
                protobuf_enum_in,
                string_array_in,
            )
        )
        response_iterator = stub_v2.Measure(request)
>       responses = [response for response in response_iterator]

tests\acceptance\test_measurement_service.py:197: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests\acceptance\test_measurement_service.py:197: in <listcomp>
    responses = [response for response in response_iterator]
.tox\py38-all-extras\lib\site-packages\grpc\_channel.py:543: in __next__
    return self._next()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Exception iterating re...76 {created_time:"2024-07-30T21:14:12.[725](https://github.com/ni/measurement-plugin-python/actions/runs/10167541881/job/28125984987?pr=815#step:6:726)0196+00:00", grpc_status:2, grpc_message:"Exception iterating responses: "}"
>

    def _next(self) -> Any:
        with self._state.condition:
            if self._state.code is None:
                event_handler = _event_handler(
                    self._state, self._response_deserializer
                )
                self._state.due.add(cygrpc.OperationType.receive_message)
                operating = self._call.operate(
                    (cygrpc.ReceiveMessageOperation(_EMPTY_FLAGS),),
                    event_handler,
                )
                if not operating:
                    self._state.due.remove(cygrpc.OperationType.receive_message)
            elif self._state.code is grpc.StatusCode.OK:
                raise StopIteration()
            else:
                raise self

            def _response_ready():
                return self._state.response is not None or (
                    cygrpc.OperationType.receive_message not in self._state.due
                    and self._state.code is not None
                )

            _common.wait(self._state.condition.wait, _response_ready)
            if self._state.response is not None:
                response = self._state.response
                self._state.response = None
                return response
            elif cygrpc.OperationType.receive_message not in self._state.due:
                if self._state.code is grpc.StatusCode.OK:
                    raise StopIteration()
                elif self._state.code is not None:
>                   raise self
E                   grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
E                       status = StatusCode.UNKNOWN
E                       details = "Exception iterating responses: "
E                       debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:56476 {created_time:"2024-07-30T21:14:12.7250196+00:00", grpc_status:2, grpc_message:"Exception iterating responses: "}"
E                   >

.tox\py38-all-extras\lib\site-packages\grpc\_channel.py:969: _MultiThreadedRendezvous
------------------------------ Captured log call ------------------------------
ERROR    grpc._server:_server.py:668 Exception iterating responses: 
Traceback (most recent call last):
  File "C:\actions-runner\1\_work\measurement-plugin-python\measurement-plugin-python\packages\service\.tox\py38-all-extras\lib\site-packages\grpc\_server.py", line 654, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "C:\actions-runner\1\_work\measurement-plugin-python\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 381, in Measure
    yield self._serialize_response(return_value)
  File "C:\actions-runner\1\_work\measurement-plugin-python\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 388, in _serialize_response
    outputs=_serialize_outputs(
  File "C:\actions-runner\1\_work\measurement-plugin-python\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 144, in _serialize_outputs
    value=encoder.serialize_parameters(output_metadata, outputs, service_name)
  File "C:\actions-runner\1\_work\measurement-plugin-python\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\parameter\encoder.py", line 52, in serialize_parameters
    return message_instance.SerializeToString()
MemoryError

Possible Solution

Context

Your Environment

AB#2817166

bkeryan commented 3 months ago

I suspect that this is a side effect of #767.

Cc: @dixonjoel @LazeringDeath

If I use pytest-profiling to see where the test is spending the most time, it seems to pass (slowly!) with the "python" implementation of protobuf and fail often when using the "upb" implementation of protobuf.

Our old serialization code was based on the "python" implementation of protobuf.

PS D:\dev\measurement-plugin-python\packages\service> $env:PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='python'
PS D:\dev\measurement-plugin-python\packages\service> poetry run pytest -v -k test___measurement_service_v2___measure_with_large_array___returns_output[10000000] --profile
================================================= test session starts =================================================
platform win32 -- Python 3.8.10, pytest-8.3.1, pluggy-1.5.0 -- D:\dev\measurement-plugin-python\packages\service\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: D:\dev\measurement-plugin-python\packages\service
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0, profiling-1.7.0
collected 618 items / 617 deselected / 1 selected

tests/acceptance/test_measurement_service.py::test___measurement_service_v2___measure_with_large_array___returns_output[10000000] PASSED [100%]
Profiling (from D:\dev\measurement-plugin-python\packages\service\prof\combined.prof):
Tue Jul 30 18:28:40 2024    D:\dev\measurement-plugin-python\packages\service\prof\combined.prof

         130024399 function calls (130023710 primitive calls) in 82.999 seconds

   List reduced from 1007 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   82.999   82.999 runner.py:110(pytest_runtest_protocol)
        1    0.000    0.000   82.998   82.998 runner.py:118(runtestprotocol)
        3    0.000    0.000   82.998   27.666 runner.py:226(call_and_report)
    28/11    0.000    0.000   82.998    7.545 _hooks.py:498(__call__)
    28/11    0.000    0.000   82.998    7.545 _manager.py:111(_hookexec)
    28/11    0.000    0.000   82.998    7.545 _callers.py:53(_multicall)
        3    0.000    0.000   82.997   27.666 runner.py:242(<lambda>)
        1    0.000    0.000   81.051   81.051 runner.py:163(pytest_runtest_call)
        1    0.000    0.000   81.051   81.051 python.py:1625(runtest)
        1    0.099    0.099   81.051   81.051 python.py:152(pytest_pyfunc_call)
        1    0.032    0.032   80.952   80.952 test_measurement_service.py:173(test___measurement_service_v2___measure_with_large_array___returns_output)
        2    0.037    0.019   46.016   23.008 test_measurement_service.py:220(_get_serialized_measurement_signature)
        1    0.000    0.000   31.482   31.482 test_measurement_service.py:197(<listcomp>)
        2    0.000    0.000   31.482   15.741 _channel.py:542(__next__)
        2    0.000    0.000   31.482   15.741 _channel.py:936(_next)
        2    0.001    0.000   31.482   15.741 _common.py:121(wait)
      248    0.002    0.000   31.480    0.127 _common.py:111(_wait_once)
      251    0.002    0.000   31.479    0.125 threading.py:270(wait)
      509   31.475    0.062   31.475    0.062 {method 'acquire' of '_thread.lock' objects}

==================================== 1 passed, 617 deselected in 84.34s (0:01:24) =====================================
PS D:\dev\measurement-plugin-python\packages\service> git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        prof/

nothing added to commit but untracked files present (use "git add" to track)
PS D:\dev\measurement-plugin-python\packages\service> poetry run pytest -v -k test___measurement_service_v2___measure_with_large_array___returns_output[10000000] --profile
================================================= test session starts =================================================
platform win32 -- Python 3.8.10, pytest-8.3.1, pluggy-1.5.0 -- D:\dev\measurement-plugin-python\packages\service\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: D:\dev\measurement-plugin-python\packages\service
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0, profiling-1.7.0
collected 618 items / 617 deselected / 1 selected

tests/acceptance/test_measurement_service.py::test___measurement_service_v2___measure_with_large_array___returns_output[10000000] PASSED [100%]
Profiling (from D:\dev\measurement-plugin-python\packages\service\prof\combined.prof):
Tue Jul 30 18:32:35 2024    D:\dev\measurement-plugin-python\packages\service\prof\combined.prof

         130024388 function calls (130023699 primitive calls) in 83.250 seconds

   Ordered by: cumulative time
   List reduced from 1007 to 20 due to restriction <20>
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   83.250   83.250 runner.py:110(pytest_runtest_protocol)
        1    0.000    0.000   83.249   83.249 runner.py:118(runtestprotocol)
        3    0.000    0.000   83.249   27.750 runner.py:226(call_and_report)
    28/11    0.000    0.000   83.249    7.568 _hooks.py:498(__call__)
    28/11    0.000    0.000   83.249    7.568 _manager.py:111(_hookexec)
    28/11    0.000    0.000   83.249    7.568 _callers.py:53(_multicall)
        3    0.000    0.000   83.248   27.749 runner.py:319(from_call)
        3    0.000    0.000   83.248   27.749 runner.py:242(<lambda>)
        1    0.000    0.000   81.195   81.195 runner.py:163(pytest_runtest_call)
        1    0.000    0.000   81.195   81.195 python.py:1625(runtest)
        1    0.105    0.105   81.195   81.195 python.py:152(pytest_pyfunc_call)
        1    0.032    0.032   81.089   81.089 test_measurement_service.py:173(test___measurement_service_v2___measure_with_large_array___returns_output)
        2    0.038    0.019   46.168   23.084 test_measurement_service.py:220(_get_serialized_measurement_signature)
        1    0.000    0.000   31.516   31.516 test_measurement_service.py:197(<listcomp>)
        2    0.000    0.000   31.516   15.758 _channel.py:542(__next__)
        2    0.000    0.000   31.516   15.758 _channel.py:936(_next)
        2    0.001    0.000   31.516   15.758 _common.py:121(wait)
      247    0.002    0.000   31.515    0.128 _common.py:111(_wait_once)
      250    0.002    0.000   31.514    0.126 threading.py:270(wait)
      507   31.509    0.062   31.509    0.062 {method 'acquire' of '_thread.lock' objects}

==================================== 1 passed, 617 deselected in 84.58s (0:01:24) =====================================
PS D:\dev\measurement-plugin-python\packages\service> $env:PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='upb'
PS D:\dev\measurement-plugin-python\packages\service> poetry run pytest -v -k test___measurement_service_v2___measure_with_large_array___returns_output[10000000] --profile
================================================= test session starts =================================================
platform win32 -- Python 3.8.10, pytest-8.3.1, pluggy-1.5.0 -- D:\dev\measurement-plugin-python\packages\service\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: D:\dev\measurement-plugin-python\packages\service
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0, profiling-1.7.0
collected 618 items / 617 deselected / 1 selected

tests/acceptance/test_measurement_service.py::test___measurement_service_v2___measure_with_large_array___returns_output[10000000] FAILED [100%]

====================================================== FAILURES =======================================================
_________________ test___measurement_service_v2___measure_with_large_array___returns_output[10000000] _________________

double_array_len = 10000000
stub_v2 = <ni_measurement_plugin_sdk_service._internal.stubs.ni.measurementlink.measurement.v2.measurement_service_pb2_grpc.MeasurementServiceStub object at 0x040292B0>

    @pytest.mark.parametrize("double_array_len", [10000, 100000, 1000000, 10000000])  # up to 80 MB
    def test___measurement_service_v2___measure_with_large_array___returns_output(
        double_array_len: int, stub_v2: v2_measurement_service_pb2_grpc.MeasurementServiceStub
    ):
        float_in = 1.23
        double_array_in = [random.random() for i in range(double_array_len)]
        bool_in = False
        enum_in = Color.BLUE
        protobuf_enum_in = ProtobufColor.WHITE
        string_in = "InputString"
        string_array_in = ["", "TestString1", "#$%!@<*(&^~`"]

        request = v2_measurement_service_pb2.MeasureRequest(
            configuration_parameters=_get_configuration_parameters(
                float_in,
                double_array_in,
                bool_in,
                string_in,
                enum_in,
                protobuf_enum_in,
                string_array_in,
            )
        )
        response_iterator = stub_v2.Measure(request)
>       responses = [response for response in response_iterator]

tests\acceptance\test_measurement_service.py:197:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests\acceptance\test_measurement_service.py:197: in <listcomp>
    responses = [response for response in response_iterator]
.venv\lib\site-packages\grpc\_channel.py:543: in __next__
    return self._next()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Exception iterating re...on iterating responses: Failed to serialize proto", grpc_status:2, created_time:"2024-07-30T23:34:36.9033441+00:00"}"
>

    def _next(self) -> Any:
        with self._state.condition:
            if self._state.code is None:
                event_handler = _event_handler(
                    self._state, self._response_deserializer
                )
                self._state.due.add(cygrpc.OperationType.receive_message)
                operating = self._call.operate(
                    (cygrpc.ReceiveMessageOperation(_EMPTY_FLAGS),),
                    event_handler,
                )
                if not operating:
                    self._state.due.remove(cygrpc.OperationType.receive_message)
            elif self._state.code is grpc.StatusCode.OK:
                raise StopIteration()
            else:
                raise self

            def _response_ready():
                return self._state.response is not None or (
                    cygrpc.OperationType.receive_message not in self._state.due
                    and self._state.code is not None
                )

            _common.wait(self._state.condition.wait, _response_ready)
            if self._state.response is not None:
                response = self._state.response
                self._state.response = None
                return response
            elif cygrpc.OperationType.receive_message not in self._state.due:
                if self._state.code is grpc.StatusCode.OK:
                    raise StopIteration()
                elif self._state.code is not None:
>                   raise self
E                   grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
E                       status = StatusCode.UNKNOWN
E                       details = "Exception iterating responses: Failed to serialize proto"
E                       debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:60413 {grpc_message:"Exception iterating responses: Failed to serialize proto", grpc_status:2, created_time:"2024-07-30T23:34:36.9033441+00:00"}"
E                   >

.venv\lib\site-packages\grpc\_channel.py:969: _MultiThreadedRendezvous
------------------------------------------------ Captured stderr setup ------------------------------------------------
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1722382442.492918   10376 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_client, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
-------------------------------------------------- Captured log call --------------------------------------------------
ERROR    grpc._server:_server.py:668 Exception iterating responses: Failed to serialize proto
Traceback (most recent call last):
  File "D:\dev\measurement-plugin-python\packages\service\.venv\lib\site-packages\grpc\_server.py", line 654, in _take_response_from_response_iterator
    return next(response_iterator), True
  File "D:\dev\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 381, in Measure
    yield self._serialize_response(return_value)
  File "D:\dev\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 388, in _serialize_response
    outputs=_serialize_outputs(
  File "D:\dev\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 144, in _serialize_outputs
    value=encoder.serialize_parameters(output_metadata, outputs, service_name)
  File "D:\dev\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\parameter\encoder.py", line 52, in serialize_parameters
    return message_instance.SerializeToString()
google.protobuf.message.EncodeError: Failed to serialize proto
Profiling (from D:\dev\measurement-plugin-python\packages\service\prof\combined.prof):
Tue Jul 30 18:34:37 2024    D:\dev\measurement-plugin-python\packages\service\prof\combined.prof

         10454063 function calls (10452264 primitive calls) in 35.460 seconds

   Ordered by: cumulative time
   List reduced from 1103 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   35.459   35.459 runner.py:118(runtestprotocol)
        3    0.000    0.000   35.459   11.820 runner.py:226(call_and_report)
    29/12    0.000    0.000   35.459    2.955 _hooks.py:498(__call__)
    29/12    0.000    0.000   35.459    2.955 _manager.py:111(_hookexec)
    29/12    0.000    0.000   35.459    2.955 _callers.py:53(_multicall)
        3    0.000    0.000   34.969   11.656 runner.py:319(from_call)
        3    0.000    0.000   34.969   11.656 runner.py:242(<lambda>)
        1    0.000    0.000   32.967   32.967 runner.py:163(pytest_runtest_call)
        1    0.000    0.000   32.967   32.967 python.py:1625(runtest)
        1    0.000    0.000   32.967   32.967 python.py:152(pytest_pyfunc_call)
        1    0.069    0.069   32.967   32.967 test_measurement_service.py:173(test___measurement_service_v2___measure_with_large_array___returns_output)
        9    0.000    0.000   26.314    2.924 threading.py:270(wait)
       25   26.314    1.053   26.314    1.053 {method 'acquire' of '_thread.lock' objects}
        1    0.000    0.000   26.314   26.314 test_measurement_service.py:197(<listcomp>)
        1    0.000    0.000   26.314   26.314 _channel.py:542(__next__)
        1    0.000    0.000   26.314   26.314 _channel.py:936(_next)
        1    0.000    0.000   26.313   26.313 _common.py:121(wait)
        6    0.000    0.000   26.313    4.386 _common.py:111(_wait_once)
        1    2.203    2.203    3.379    3.379 test_measurement_service.py:178(<listcomp>)

=============================================== short test summary info ===============================================
FAILED tests/acceptance/test_measurement_service.py::test___measurement_service_v2___measure_with_large_array___returns_output[10000000] - grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
========================================= 1 failed, 617 deselected in 36.79s ==========================================
bkeryan commented 3 months ago

32-bit builds of Python (e.g. 3.8.10-win32 in pyenv) do not have the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set, so they are limited to 2 GB of virtual address space:

PS C:\Users\Administrator\.pyenv\pyenv-win\versions\3.8.10-win32> dumpbin /headers python.exe
...
             102 characteristics
                   Executable
                   32 bit word machine
...

(If the flag was set, it would say "Application can handle large (>2GB) addresses".)

When I run this test with the "python" protobuf implementation, its commit charge slowly increases until it hits 1.1 GB, then drops back to about 530 MB. With "upb", its commit charge quickly increases to 1.1 GB, backs off to 1.0x GB, then jumps to 1.2 GB and stays there until the test fails.

I'll post a PR to skip this test for 32-bit Windows since it's blocking other PRs.