tango-controls-hdbpp / hdbpp-es

Tango device server for the HDB++ Event Subscriber. Moved to https://gitlab.com/tango-controls/hdbpp/hdbpp-es
http://www.tango-controls.org/community/projects/hdbplus

AttributeRemove freezes #25

Closed jjdmol closed 2 years ago

jjdmol commented 2 years ago

When we remove an attribute from the eventsubscriber, we notice hdbppes-srv freezing up immediately (becoming unresponsive towards Tango), see the log below. The SharedData::put_signal_property function is entered, but is never exited.
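
A minimal sketch of how the freeze could be detected from a client, assuming PyTango and the device name that appears in the log below (`archiving/hdbppts/eventsubscriber01`). The helper names `remove_and_check` and `connect`, the 3-second timeout, and the short settle delay are illustrative choices, not part of hdbpp-es. The idea is to issue the AttributeRemove command and then ping the device with a bounded timeout, so that a frozen server shows up as a failed ping instead of a hanging client.

```python
import time


def remove_and_check(proxy, attr_name, timeout_ms=3000, settle_s=0.5):
    """Remove attr_name from the subscriber, then report whether it still answers.

    `proxy` is expected to behave like a tango.DeviceProxy for the
    event subscriber device. This helper is hypothetical.
    """
    # Bound every call so that a frozen server raises instead of blocking us.
    proxy.set_timeout_millis(timeout_ms)
    proxy.command_inout("AttributeRemove", attr_name)
    # Give the server's worker thread a moment to process the removal.
    time.sleep(settle_s)
    try:
        proxy.ping()
    except Exception:  # with PyTango this would be tango.DevFailed
        return False
    return True


def connect(device_name="archiving/hdbppts/eventsubscriber01"):
    """Open a real proxy; requires PyTango and a running Tango system."""
    import tango  # imported lazily so the helper above has no hard dependency
    return tango.DeviceProxy(device_name)
```

With a running system, `remove_and_check(connect(), "tango://databaseds.tangonet:10000/stat/sdp/1/fpga_error_r")` would be expected to return `False` while the server freezes and `True` once it stays responsive.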

I don't have access to debug symbols, making debugging hard, but by code inspection, it seems that perhaps in SharedData:

hdbppes-srv debug log after calling es_proxy.attributeremove() from Python:

1649265497 [140699462772480] DEBUG archiving/hdbppts/eventsubscriber01 SharedData::remove: unsubscribing ARCHIVE_EVENT... tango://databaseds.tangonet:10000/stat/sdp/1/fpga_error_r
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 In get_monitor() unknown, thread = 11, ctr = 0
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 In rel_monitor() unknown, ctr = 1, thread = 11
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 Signalling !
1649265497 [140699462772480] DEBUG archiving/hdbppts/eventsubscriber01 SharedData::remove: unsubscribed ARCHIVE_EVENT... tango://databaseds.tangonet:10000/stat/sdp/1/fpga_error_r
1649265497 [140699462772480] DEBUG archiving/hdbppts/eventsubscriber01 SharedData::remove: unsubscribing ATTR_CONF_EVENT... tango://databaseds.tangonet:10000/stat/sdp/1/fpga_error_r
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 In get_monitor() unknown, thread = 11, ctr = 0
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 In rel_monitor() unknown, ctr = 1, thread = 11
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 Signalling !
1649265497 [140699462772480] DEBUG archiving/hdbppts/eventsubscriber01 SharedData::remove: unsubscribed ATTR_CONF_EVENT... tango://databaseds.tangonet:10000/stat/sdp/1/fpga_error_r
1649265497 [140699462772480] DEBUG archiving/hdbppts/eventsubscriber01 SharedData::remove: removing tango://databaseds.tangonet:10000/stat/sdp/1/fpga_error_r
1649265497 [140699462772480] DEBUG archiving/hdbppts/eventsubscriber01 SharedData::remove: stopped tango://databaseds.tangonet:10000/stat/sdp/1/fpga_error_r
1649265497 [140699462772480] DEBUG archiving/hdbppts/eventsubscriber01 SharedData::remove: removed tango://databaseds.tangonet:10000/stat/sdp/1/fpga_error_r
1649265497 [140699462772480] DEBUG archiving/hdbppts/eventsubscriber01 SubscribeThread::remove: going to increase action... action=0++
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 Leaving DeviceClass::command_handler() method
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 SubDevDiag::set_associated_device() entering ...
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 DeviceImpl::command_inout(): leaving method for command attributeremove
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 In rel_monitor() archiving/hdbppts/eventsubscriber01, ctr = 2, thread = 11
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 In rel_monitor() archiving/hdbppts/eventsubscriber01, ctr = 1, thread = 11
1649265497 [140699462772480] DEBUG dserver/hdbppes-srv/01 Signalling !
1649265497 [140699479557888] DEBUG archiving/hdbppts/eventsubscriber01 run_undetached: AWAKE
1649265497 [140699479557888] DEBUG archiving/hdbppts/eventsubscriber01 SharedData::put_signal_property: put_signal_property entering action=1
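
To see which thread stopped where, a log like the one above can be grouped by thread ID. The following is a rough sketch, assuming each line has the shape `timestamp [thread] LEVEL device message` as in this excerpt; the function name `last_message_per_thread` is made up for illustration.

```python
import re

# timestamp [thread id] LEVEL device-name message
LINE_RE = re.compile(r"^(\d+) \[(\d+)\] (\w+) (\S+) (.*)$")


def last_message_per_thread(lines):
    """Return {thread_id: last message logged by that thread}."""
    last = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a log line
        _timestamp, thread_id, _level, _device, message = match.groups()
        last[thread_id] = message
    return last
```

Applied to the excerpt above, the entry for thread 140699479557888 is the `put_signal_property entering action=1` message, which matches the observation that the function is entered but never exited.
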

dlacoste-esrf commented 2 years ago

Thanks for the PR on this one. My own approach was to make deeper changes to fix this; that fix lives in the remove_polling_threads branch. It works, but the changes are much harder to track. I guess you've tested your patch and it fixes the deadlock? If so, I'll merge it and we can decide what to do with my branch later. Thanks again!

jjdmol commented 2 years ago

Apologies. Unfortunately I don't have a setup to test this yet, as I use the SKA Docker images for hdbpp, which are high level. While there is good reason to assume it works (it seems this code was accidentally removed at some point long ago), it's definitely a good idea to check.

I could look at building such a setup, but was hoping it would be trivial to check for those who already have one...

dlacoste-esrf commented 2 years ago

No apologies necessary, we'll check!

jjdmol commented 2 years ago

I have a test setup now, and the fix does work. That is, es.attributeremove() now removes the attribute, logs the corresponding events, and the server seems to continue just fine.

dlacoste-esrf commented 2 years ago

#26 merged, resolving this issue. Thanks for bringing it up and fixing it!