ni / nimi-python

Python bindings for NI Modular Instrument drivers.
Other
111 stars 84 forks source link

Some system tests take a very long time to run #1264

Open marcoskirsch opened 4 years ago

marcoskirsch commented 4 years ago

nimi-bot system tests now run in parallel, with a job for each bitness (win32 vs win64) and for each module (nidmm, niscope, nidcpower, nise, etc.)

I noticed some jobs consistently take a very long time compared to the rest. Take the system tests run as part of #1258:

system-tests/win32/nidcpower  Took 2 min 22 sec on nimibot-win-32-2
system-tests/win32/nidigital  Took 2 min 5 sec on nimibot-win-32-1
system-tests/win32/nidmm      Took 2 min 59 sec on nimibot-win-32-2
system-tests/win32/nifgen     Took 3 min 48 sec on nimibot-win-32-2
system-tests/win32/nimodinst  Took 1 min 59 sec on nimibot-win-32-1
system-tests/win32/niscope    Took 3 min 40 sec on nimibot-win-32-1
system-tests/win32/nise       Took 2 min 39 sec on nimibot-win-32-1
system-tests/win32/niswitch   Took 8 min 58 sec on nimibot-win-32-2
system-tests/win32/nitclk     Took 12 sec on nimibot-win-32-1
system-tests/win64/nidcpower  Took 2 min 31 sec on nimibot-win-64-1
system-tests/win64/nidigital  Took 3 min 1 sec on nimibot-win-64-1
system-tests/win64/nidmm      Took 2 min 47 sec on nimibot-win-64-1
system-tests/win64/nifgen     Took 4 min 24 sec on nimibot-win-64-1
system-tests/win64/nimodinst  Took 2 min 20 sec on nimibot-win-64-2
system-tests/win64/niscope    Took 3 min 37 sec on nimibot-win-64-2
system-tests/win64/nise       Took 2 min 49 sec on nimibot-win-64-2
system-tests/win64/niswitch   Took 9 min 35 sec on nimibot-win-64-2
system-tests/win64/nitclk     Took 14 sec on nimibot-win-64-1

By far the worse offender is niswitch which takes over twice as long as the rest of them. We really need to look into what's causing it and fix it.

marcoskirsch commented 4 years ago

The most likely reason I see of why niswitch and nifgen are pretty slow compared to the rest is that they open simulated sessions to devices whose driver runtime uses the DAQmx framework under the hood.

This is:

System tests for each job runs in a separate agent, but we are running tests for 5 versions of Python in parallel. That would be a lot of multi-second tests gettting serialized!

The mechanism in which tests that use session-based simulation with DAQmx framework devices are serialized is a global lock we call daqmx_sim_db_lock. Just grep for that keyword.

This is only used in src/nifgen/system_tests/test_system_nifgen.py and src/niswitch/system_tests/test_system_niswitch.py.

So a potential way to make these tests faster are:

marcoskirsch commented 4 years ago

I looked at the nimi-bot logs from PR #1268. For example, in job win32/job/niswitch/274/console there are some obvious offenders:

========================== slowest 5 test durations ===========================
223.97s call     src/niswitch/system_tests/test_system_niswitch.py::test_continuous_software_scanning
153.20s call     src/niswitch/system_tests/test_system_niswitch.py::test_enum_attribute
0.12s setup    src/niswitch/system_tests/test_system_niswitch.py::test_relayclose
0.08s call     src/niswitch/system_tests/test_system_niswitch.py::test_write_only_attribute
0.07s call     src/niswitch/system_tests/test_system_niswitch.py::test_error_message

As expected, those are tests that use session-based simulation of a device that uses the DAQmx driver framework. But they should really not so long. One mistake is that the global lock we introduced is only needed during Session creation and destruction / close. But we hold on to it for the duration of the test.

We should have a goal that no test takes over 1 second.

tobiasgordon commented 3 years ago

The intermittent failure due to simulated device creation race conditions is being tracked in the internal NI Bug #AB1235678. Fixing that should also improve the test times of those that use session-based simulation of devices using the DAQmx driver framework.

marcoskirsch commented 2 years ago

Following data based on latest runs using new nimibot system (AWS hosted VMs + GitHub Actions). Worst offenders are:

This are the ones we should focus on optimizing because they are the long poles at the moment.