Closed: ZLLentz closed this pull request 2 years ago.
Current WIP text output:
I've cleaned up the help text, I think to finish this off I need to add some adequate test suite cases. Once those are in place I'll ask for reviews.
I added some tests that cycle through the various valid input args. It's hard to test the output more specifically because it is hardware dependent and not deterministic.
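For context, the shape of those tests is roughly the following. This is only a sketch: it assumes a callable entry point like happi.cli.happi_cli(args), and the argument combinations shown are illustrative rather than the exact ones added in this PR.

```python
import pytest

from happi.cli import happi_cli  # assumed entry point; the real tests may drive the CLI differently


# Cycle the new subcommands through a handful of valid argument sets.
@pytest.mark.parametrize(
    "args",
    [
        ["benchmark", "at2l0"],
        ["benchmark", "at2l0", "--duration", "1"],
        ["profile", "at2l0"],
        ["profile", "-a", "at2l0"],
    ],
)
def test_benchmark_and_profile_args(args):
    # We only check that the command completes; the timing output is
    # hardware dependent and not deterministic, so we don't assert on it.
    happi_cli(args)
```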
I'm somewhat worried that there are some stylistic/implementation inconsistencies between these two CLI functions, which were written on separate days. Hopefully it isn't too bad.
Looks like we need to add pcdsutils to the pip requirements?
Ah yeah, we absolutely need line_profiler/pcdsutils to run the tests. I'm surprisingly happy that it failed in an understandable way.
Strangely, the existing tests just skip stuff like this on the CI:
happi/tests/test_backends.py::test_qsbackend_with_client SKIPPED (Missing pcdsdevices) [ 8%]
happi/tests/test_backends.py::test_qsbackend_with_acromag SKIPPED (Missing pcdsdevices) [ 9%]
happi/tests/test_backends.py::test_beckoff_axis_device_class SKIPPED (Missing pcdsdevices) [ 10%]
It's slightly different of course, because pcdsdevices isn't even an optional dependency of happi; it's just a place where our device classes live. I want to make the tests with optional dependencies skippable, but also make sure we get down to zero skips in the final suite run.
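One way to get that (a sketch only, not happi's actual test code) is the usual skip-on-missing-import pattern, so the optional-dependency tests skip cleanly on a bare CI environment but still run wherever the extras are installed:

```python
import importlib.util

import pytest

# Skip pattern for optional dependencies; the names here are illustrative.
HAS_PCDSUTILS = importlib.util.find_spec("pcdsutils") is not None

requires_pcdsutils = pytest.mark.skipif(
    not HAS_PCDSUTILS, reason="Missing pcdsutils"
)


@requires_pcdsutils
def test_profile_with_line_profiler():
    # Only meaningful when pcdsutils (and its line_profiler backend) is present.
    ...
```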
I like this a lot. In my brief test drives, I get segmentation faults at the end of both the benchmark and profile commands. In some cases this doesn't actually interrupt the profiling.
happi profile -a at2l0 will segfault after printing the profiler output, along with a large exception traceback.
happi benchmark at2l0 --duration 5 will segfault at the end with no exception traceback.
Observed exceptions include:
RuntimeError: Expected CA context is unset
...more?
@tangkong Those are pyepics/ophyd-related Channel Access context teardown bugs. We ran into the same thing in atef, leading to this: https://github.com/pcdshub/atef/blob/1e9365d734feb3b9ea1195b0b688bd4c2d895b2a/atef/util.py#L17-L21
Ideally, ophyd would handle this for us correctly and we wouldn't have to intervene. The fallback would be handling that here, in happi. Given that happi doesn't have an ophyd dependency, I'm not sure of the best path forward.
Bizarre that I didn't see those at all. Should I switch to testing on Linux so I can catch those myself?
I see them on macOS - I forget if they're an issue on Linux or not
I saw them on psbuild-rhel7, fwiw
> Ideally, ophyd would handle this for us correctly and we wouldn't have to intervene. The fallback would be handling that here, in happi. Given that happi doesn't have an ophyd dependency, I'm not sure of the best path forward.
If the ophyd import works, I'll clean it up as done in your link - or better, only do the cleanup if ophyd has already been imported, whether or not it's available.
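For concreteness, a minimal sketch of that conditional cleanup. Assumptions: the helper name is hypothetical, and it relies on ophyd.get_cl().get_dispatcher() and dispatcher.stop() behaving the way the linked atef snippet uses them.

```python
import sys


def ophyd_cleanup_if_imported():
    """Hypothetical helper: stop ophyd's dispatcher, but only if ophyd is loaded."""
    ophyd = sys.modules.get("ophyd")
    if ophyd is None:
        # happi has no hard ophyd dependency, so there may be nothing to clean up.
        return
    dispatcher = ophyd.get_cl().get_dispatcher()
    if dispatcher is not None:
        # Stop the callback dispatcher threads before interpreter teardown
        # to avoid the CA context errors seen at exit.
        dispatcher.stop()
```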
The snippet from atef resolved the big output from benchmark, and I'm no longer seeing segfaults, but there are still oddities in the profile post-output, namely this traceback repeated. I'll start tracking it down.
Exception ignored on calling ctypes callback function: <function _onConnectionEvent at 0x7fa39d2b5c10>
Traceback (most recent call last):
File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.1/lib/python3.9/site-packages/epics/ca.py", line 721, in _onConnectionEvent
entry.run_connection_callbacks(conn=(args.op == dbr.OP_CONN_UP),
File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.1/lib/python3.9/site-packages/epics/ca.py", line 267, in run_connection_callbacks
callback(pvname=self.pvname, chid=chid_int, conn=self.conn)
File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.1/lib/python3.9/site-packages/epics/pv.py", line 47, in wrapped
return func(self, *args, **kwargs)
File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.1/lib/python3.9/site-packages/epics/pv.py", line 324, in __on_connect
conn_cb(pvname=self.pvname, conn=conn, pv=self)
File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.1/lib/python3.9/site-packages/ophyd/signal.py", line 976, in _pv_connected
self._add_callback(pvname, pv, self._read_changed)
File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.1/lib/python3.9/site-packages/ophyd/signal.py", line 1032, in _add_callback
mon = pv.add_callback(cb,
File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.1/lib/python3.9/site-packages/ophyd/_pyepics_shim.py", line 69, in add_callback
callback = wrap_callback(_dispatcher, 'monitor', callback)
File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.1/lib/python3.9/site-packages/ophyd/_dispatch.py", line 204, in wrap_callback
assert event_type in dispatcher._threads
AssertionError:
Exception ignored on calling ctypes callback function: <function _onConnectionEvent at 0x7fa39d2b5c10>
The above error spam occurs because PVs with auto_monitor=True are connecting in the gap between the ophyd dispatcher shutdown code running and the termination of the program. The auto monitor flag is implemented as a connection callback that starts the monitor request. That connection callback runs through the shim, where it hits the wrap_callback utility from the ophyd._dispatch module; wrap_callback checks the event type against the dispatcher threads, which fails because those threads are now gone.
In principle, any connection callback that raises an exception at teardown could show up as terminal spam here if timed poorly; the spam is a race condition.
Note that the pyepics shim has its own attempted cleanup at exit, which seems to be insufficient; it might be worth making an ophyd PR once all the edges are understood here: https://github.com/bluesky/ophyd/blob/d5fc722eef4d3d83845b1d523004302ec3aadb78/ophyd/_pyepics_shim.py#L150-L165
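To make the timing concrete, here is a toy, ophyd-free illustration of the same race (not ophyd's actual code): a "dispatcher" tracks worker threads in a dict, and a callback that arrives after teardown has emptied that dict fails the membership check, just like the AssertionError above.

```python
import threading
import time

_threads = {"monitor": object()}  # stands in for dispatcher._threads


def wrap_callback(event_type):
    # The failing check from the traceback above.
    assert event_type in _threads


def late_connection_callback():
    time.sleep(0.1)           # the PV connects "in the gap"...
    wrap_callback("monitor")  # ...after teardown -> AssertionError


t = threading.Thread(target=late_connection_callback)
t.start()
_threads.clear()  # program exit has already torn down the dispatcher threads
t.join()          # the AssertionError is reported by the thread, not raised here
```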
I think I've addressed all items here
I'd like to merge this tomorrow morning if it still looks good. I'll use it tomorrow to try to implement speedups to pcdsdevices/happi.
Looks good to me. Output is mostly clean; if I try to profile all the devices I'll get ophyd timeout warnings, but individually they're fine. (I caught at2k2_calc and xcs_lodcm.)
I imagine there's only so much we can do to catch errors/timeouts, and I think this covers a lot of the bases. I'm happy to merge this and start using it more.
Description

- happi benchmark for identifying which beamline devices are slow to load
- happi profile for identifying why particular beamline devices are slow to load

Motivation and Context

I'm on a quest to improve the loading performance of our specific ophyd devices at LCLS. I want to build these tools into happi so that it becomes easy to identify when an arbitrary device is taking a long time to load.

How Has This Been Tested?
Interactively + some basic argument tests to make sure the benchmark and profile can at least complete.
Where Has This Been Documented?
N/A