bkeryan closed this issue 2 months ago
I suspect that this is a side effect of #767.
Cc: @dixonjoel @LazeringDeath
Using pytest-profiling to see where the test spends the most time, the test passes (slowly!) with the "python" implementation of protobuf and often fails with the "upb" implementation.
Our old serialization code was based on the "python" implementation of protobuf.
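For reference, the implementation is selected with an environment variable before protobuf is first imported; this is a minimal sketch (the `api_implementation` module mentioned in the comment is an internal protobuf detail):

```python
import os

# Must be set before google.protobuf is imported for the first time;
# recognized values are "python", "upb", and "cpp".
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Once protobuf is importable, the active backend can be confirmed with:
#   from google.protobuf.internal import api_implementation
#   print(api_implementation.Type())  # e.g. "python" or "upb"
```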
PS D:\dev\measurement-plugin-python\packages\service> $env:PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='python'
PS D:\dev\measurement-plugin-python\packages\service> poetry run pytest -v -k test___measurement_service_v2___measure_with_large_array___returns_output[10000000] --profile
================================================= test session starts =================================================
platform win32 -- Python 3.8.10, pytest-8.3.1, pluggy-1.5.0 -- D:\dev\measurement-plugin-python\packages\service\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: D:\dev\measurement-plugin-python\packages\service
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0, profiling-1.7.0
collected 618 items / 617 deselected / 1 selected
tests/acceptance/test_measurement_service.py::test___measurement_service_v2___measure_with_large_array___returns_output[10000000] PASSED [100%]
Profiling (from D:\dev\measurement-plugin-python\packages\service\prof\combined.prof):
Tue Jul 30 18:28:40 2024 D:\dev\measurement-plugin-python\packages\service\prof\combined.prof
130024399 function calls (130023710 primitive calls) in 82.999 seconds
Ordered by: cumulative time
List reduced from 1007 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 82.999 82.999 runner.py:110(pytest_runtest_protocol)
1 0.000 0.000 82.998 82.998 runner.py:118(runtestprotocol)
3 0.000 0.000 82.998 27.666 runner.py:226(call_and_report)
28/11 0.000 0.000 82.998 7.545 _hooks.py:498(__call__)
28/11 0.000 0.000 82.998 7.545 _manager.py:111(_hookexec)
28/11 0.000 0.000 82.998 7.545 _callers.py:53(_multicall)
3 0.000 0.000 82.997 27.666 runner.py:242(<lambda>)
1 0.000 0.000 81.051 81.051 runner.py:163(pytest_runtest_call)
1 0.000 0.000 81.051 81.051 python.py:1625(runtest)
1 0.099 0.099 81.051 81.051 python.py:152(pytest_pyfunc_call)
1 0.032 0.032 80.952 80.952 test_measurement_service.py:173(test___measurement_service_v2___measure_with_large_array___returns_output)
2 0.037 0.019 46.016 23.008 test_measurement_service.py:220(_get_serialized_measurement_signature)
1 0.000 0.000 31.482 31.482 test_measurement_service.py:197(<listcomp>)
2 0.000 0.000 31.482 15.741 _channel.py:542(__next__)
2 0.000 0.000 31.482 15.741 _channel.py:936(_next)
2 0.001 0.000 31.482 15.741 _common.py:121(wait)
248 0.002 0.000 31.480 0.127 _common.py:111(_wait_once)
251 0.002 0.000 31.479 0.125 threading.py:270(wait)
509 31.475 0.062 31.475 0.062 {method 'acquire' of '_thread.lock' objects}
==================================== 1 passed, 617 deselected in 84.34s (0:01:24) =====================================
PS D:\dev\measurement-plugin-python\packages\service> git status
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
prof/
nothing added to commit but untracked files present (use "git add" to track)
PS D:\dev\measurement-plugin-python\packages\service> poetry run pytest -v -k test___measurement_service_v2___measure_with_large_array___returns_output[10000000] --profile
================================================= test session starts =================================================
platform win32 -- Python 3.8.10, pytest-8.3.1, pluggy-1.5.0 -- D:\dev\measurement-plugin-python\packages\service\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: D:\dev\measurement-plugin-python\packages\service
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0, profiling-1.7.0
collected 618 items / 617 deselected / 1 selected
tests/acceptance/test_measurement_service.py::test___measurement_service_v2___measure_with_large_array___returns_output[10000000] PASSED [100%]
Profiling (from D:\dev\measurement-plugin-python\packages\service\prof\combined.prof):
Tue Jul 30 18:32:35 2024 D:\dev\measurement-plugin-python\packages\service\prof\combined.prof
130024388 function calls (130023699 primitive calls) in 83.250 seconds
Ordered by: cumulative time
List reduced from 1007 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 83.250 83.250 runner.py:110(pytest_runtest_protocol)
1 0.000 0.000 83.249 83.249 runner.py:118(runtestprotocol)
3 0.000 0.000 83.249 27.750 runner.py:226(call_and_report)
28/11 0.000 0.000 83.249 7.568 _hooks.py:498(__call__)
28/11 0.000 0.000 83.249 7.568 _manager.py:111(_hookexec)
28/11 0.000 0.000 83.249 7.568 _callers.py:53(_multicall)
3 0.000 0.000 83.248 27.749 runner.py:319(from_call)
3 0.000 0.000 83.248 27.749 runner.py:242(<lambda>)
1 0.000 0.000 81.195 81.195 runner.py:163(pytest_runtest_call)
1 0.000 0.000 81.195 81.195 python.py:1625(runtest)
1 0.105 0.105 81.195 81.195 python.py:152(pytest_pyfunc_call)
1 0.032 0.032 81.089 81.089 test_measurement_service.py:173(test___measurement_service_v2___measure_with_large_array___returns_output)
2 0.038 0.019 46.168 23.084 test_measurement_service.py:220(_get_serialized_measurement_signature)
1 0.000 0.000 31.516 31.516 test_measurement_service.py:197(<listcomp>)
2 0.000 0.000 31.516 15.758 _channel.py:542(__next__)
2 0.000 0.000 31.516 15.758 _channel.py:936(_next)
2 0.001 0.000 31.516 15.758 _common.py:121(wait)
247 0.002 0.000 31.515 0.128 _common.py:111(_wait_once)
250 0.002 0.000 31.514 0.126 threading.py:270(wait)
507 31.509 0.062 31.509 0.062 {method 'acquire' of '_thread.lock' objects}
==================================== 1 passed, 617 deselected in 84.58s (0:01:24) =====================================
PS D:\dev\measurement-plugin-python\packages\service> $env:PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='upb'
PS D:\dev\measurement-plugin-python\packages\service> poetry run pytest -v -k test___measurement_service_v2___measure_with_large_array___returns_output[10000000] --profile
================================================= test session starts =================================================
platform win32 -- Python 3.8.10, pytest-8.3.1, pluggy-1.5.0 -- D:\dev\measurement-plugin-python\packages\service\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: D:\dev\measurement-plugin-python\packages\service
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0, profiling-1.7.0
collected 618 items / 617 deselected / 1 selected
tests/acceptance/test_measurement_service.py::test___measurement_service_v2___measure_with_large_array___returns_output[10000000] FAILED [100%]
====================================================== FAILURES =======================================================
_________________ test___measurement_service_v2___measure_with_large_array___returns_output[10000000] _________________
double_array_len = 10000000
stub_v2 = <ni_measurement_plugin_sdk_service._internal.stubs.ni.measurementlink.measurement.v2.measurement_service_pb2_grpc.MeasurementServiceStub object at 0x040292B0>
    @pytest.mark.parametrize("double_array_len", [10000, 100000, 1000000, 10000000])  # up to 80 MB
    def test___measurement_service_v2___measure_with_large_array___returns_output(
        double_array_len: int, stub_v2: v2_measurement_service_pb2_grpc.MeasurementServiceStub
    ):
        float_in = 1.23
        double_array_in = [random.random() for i in range(double_array_len)]
        bool_in = False
        enum_in = Color.BLUE
        protobuf_enum_in = ProtobufColor.WHITE
        string_in = "InputString"
        string_array_in = ["", "TestString1", "#$%!@<*(&^~`"]
        request = v2_measurement_service_pb2.MeasureRequest(
            configuration_parameters=_get_configuration_parameters(
                float_in,
                double_array_in,
                bool_in,
                string_in,
                enum_in,
                protobuf_enum_in,
                string_array_in,
            )
        )
        response_iterator = stub_v2.Measure(request)
>       responses = [response for response in response_iterator]
tests\acceptance\test_measurement_service.py:197:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests\acceptance\test_measurement_service.py:197: in <listcomp>
responses = [response for response in response_iterator]
.venv\lib\site-packages\grpc\_channel.py:543: in __next__
return self._next()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Exception iterating re...on iterating responses: Failed to serialize proto", grpc_status:2, created_time:"2024-07-30T23:34:36.9033441+00:00"}"
>
    def _next(self) -> Any:
        with self._state.condition:
            if self._state.code is None:
                event_handler = _event_handler(
                    self._state, self._response_deserializer
                )
                self._state.due.add(cygrpc.OperationType.receive_message)
                operating = self._call.operate(
                    (cygrpc.ReceiveMessageOperation(_EMPTY_FLAGS),),
                    event_handler,
                )
                if not operating:
                    self._state.due.remove(cygrpc.OperationType.receive_message)
            elif self._state.code is grpc.StatusCode.OK:
                raise StopIteration()
            else:
                raise self

            def _response_ready():
                return self._state.response is not None or (
                    cygrpc.OperationType.receive_message not in self._state.due
                    and self._state.code is not None
                )

            _common.wait(self._state.condition.wait, _response_ready)
            if self._state.response is not None:
                response = self._state.response
                self._state.response = None
                return response
            elif cygrpc.OperationType.receive_message not in self._state.due:
                if self._state.code is grpc.StatusCode.OK:
                    raise StopIteration()
                elif self._state.code is not None:
>                   raise self
E grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
E status = StatusCode.UNKNOWN
E details = "Exception iterating responses: Failed to serialize proto"
E debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:60413 {grpc_message:"Exception iterating responses: Failed to serialize proto", grpc_status:2, created_time:"2024-07-30T23:34:36.9033441+00:00"}"
E >
.venv\lib\site-packages\grpc\_channel.py:969: _MultiThreadedRendezvous
------------------------------------------------ Captured stderr setup ------------------------------------------------
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1722382442.492918 10376 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_client, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
-------------------------------------------------- Captured log call --------------------------------------------------
ERROR grpc._server:_server.py:668 Exception iterating responses: Failed to serialize proto
Traceback (most recent call last):
File "D:\dev\measurement-plugin-python\packages\service\.venv\lib\site-packages\grpc\_server.py", line 654, in _take_response_from_response_iterator
return next(response_iterator), True
File "D:\dev\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 381, in Measure
yield self._serialize_response(return_value)
File "D:\dev\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 388, in _serialize_response
outputs=_serialize_outputs(
File "D:\dev\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\grpc_servicer.py", line 144, in _serialize_outputs
value=encoder.serialize_parameters(output_metadata, outputs, service_name)
File "D:\dev\measurement-plugin-python\packages\service\ni_measurement_plugin_sdk_service\_internal\parameter\encoder.py", line 52, in serialize_parameters
return message_instance.SerializeToString()
google.protobuf.message.EncodeError: Failed to serialize proto
Profiling (from D:\dev\measurement-plugin-python\packages\service\prof\combined.prof):
Tue Jul 30 18:34:37 2024 D:\dev\measurement-plugin-python\packages\service\prof\combined.prof
10454063 function calls (10452264 primitive calls) in 35.460 seconds
Ordered by: cumulative time
List reduced from 1103 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 35.459 35.459 runner.py:118(runtestprotocol)
3 0.000 0.000 35.459 11.820 runner.py:226(call_and_report)
29/12 0.000 0.000 35.459 2.955 _hooks.py:498(__call__)
29/12 0.000 0.000 35.459 2.955 _manager.py:111(_hookexec)
29/12 0.000 0.000 35.459 2.955 _callers.py:53(_multicall)
3 0.000 0.000 34.969 11.656 runner.py:319(from_call)
3 0.000 0.000 34.969 11.656 runner.py:242(<lambda>)
1 0.000 0.000 32.967 32.967 runner.py:163(pytest_runtest_call)
1 0.000 0.000 32.967 32.967 python.py:1625(runtest)
1 0.000 0.000 32.967 32.967 python.py:152(pytest_pyfunc_call)
1 0.069 0.069 32.967 32.967 test_measurement_service.py:173(test___measurement_service_v2___measure_with_large_array___returns_output)
9 0.000 0.000 26.314 2.924 threading.py:270(wait)
25 26.314 1.053 26.314 1.053 {method 'acquire' of '_thread.lock' objects}
1 0.000 0.000 26.314 26.314 test_measurement_service.py:197(<listcomp>)
1 0.000 0.000 26.314 26.314 _channel.py:542(__next__)
1 0.000 0.000 26.314 26.314 _channel.py:936(_next)
1 0.000 0.000 26.313 26.313 _common.py:121(wait)
6 0.000 0.000 26.313 4.386 _common.py:111(_wait_once)
1 2.203 2.203 3.379 3.379 test_measurement_service.py:178(<listcomp>)
=============================================== short test summary info ===============================================
FAILED tests/acceptance/test_measurement_service.py::test___measurement_service_v2___measure_with_large_array___returns_output[10000000] - grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
========================================= 1 failed, 617 deselected in 36.79s ==========================================
32-bit builds of Python (e.g. 3.8.10-win32 in pyenv) do not have the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set, so they are limited to 2 GB of virtual address space:
PS C:\Users\Administrator\.pyenv\pyenv-win\versions\3.8.10-win32> dumpbin /headers python.exe
...
102 characteristics
Executable
32 bit word machine
...
(If the flag was set, it would say "Application can handle large (>2GB) addresses".)
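The same check can be scripted without dumpbin by reading the Characteristics field of the PE/COFF file header. A sketch (offsets per the PE format; the function name and the file you point it at are up to you):

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def is_large_address_aware(path: str) -> bool:
    """Return True if the PE file's Characteristics field has the flag set."""
    with open(path, "rb") as f:
        header = f.read(4096)
    # The offset of the "PE\0\0" signature is stored at 0x3C in the DOS header.
    pe_offset = struct.unpack_from("<I", header, 0x3C)[0]
    # The COFF file header follows the 4-byte signature; Characteristics is
    # the 2-byte field at offset 18 within it.
    characteristics = struct.unpack_from("<H", header, pe_offset + 4 + 18)[0]
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

Calling it on `python.exe` (e.g. `is_large_address_aware(sys.executable)`) reports on the running interpreter.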
When I run this test with the "python" protobuf implementation, its commit charge slowly increases until it hits 1.1 GB, then drops back to about 530 MB. With "upb", its commit charge quickly increases to 1.1 GB, backs off to 1.0x GB, then jumps to 1.2 GB and stays there until the test fails.
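For scale, the [10000000] parametrization alone is ~80 MB of raw float64 payload, and `SerializeToString` holds both the message and the serialized bytes at once, so a few concurrent copies get close to the 2 GB ceiling. Rough arithmetic (the multiple-copies estimate is an assumption, not a measurement):

```python
double_array_len = 10_000_000
payload_mb = double_array_len * 8 / 1e6  # 8 bytes per packed double on the wire
print(payload_mb)  # 80.0
# The request and the response each carry the array, and serialization holds
# the message tree plus the output buffer, so peak usage can be several times this.
```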
I'll post a PR to skip this test for 32-bit Windows since it's blocking other PRs.
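A minimal sketch of such a guard (the helper name is hypothetical; the actual PR may phrase it differently):

```python
import struct
import sys


def running_32bit_windows() -> bool:
    # struct.calcsize("P") is the size of a C pointer: 4 bytes on 32-bit builds.
    return sys.platform == "win32" and struct.calcsize("P") == 4


# Hypothetical usage in the test module:
# @pytest.mark.skipif(
#     running_32bit_windows(),
#     reason="32-bit Python is limited to 2 GB of virtual address space",
# )
# def test___measurement_service_v2___measure_with_large_array___returns_output(...):
#     ...
```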
Bug Report
test___measurement_service_v2___measure_with_large_array___returns_output[10000000]
with 32-bit Python versions 3.8 and 3.9 has been failing a lot lately with out-of-memory errors.
Repro or Code Sample
Existing test
Expected Behavior
It passes.
Current Behavior
https://github.com/ni/measurement-plugin-python/actions/runs/10167541881/job/28125984987?pr=815
Possible Solution
Context
Your Environment
ni-measurement-plugin-sdk-service
version [e.g. 2.0.0]
AB#2817166