snoplus / orca

Git repository tracking the main Orca svn for SNO+ development

Force notifications which touch GUI elements onto main thread #504

Closed EdLeming closed 6 years ago

EdLeming commented 6 years ago

This PR is an attempt to fix the CRUS crash observed recently at Sussex. Debugging by Martti suggests the issue was line 978 of the TUBii model. This should be resolved by posting the notification which calls this method on the main thread.

It may be that #499 was also due to the *ellieFireAction methods in the ELLIEController not posting to the main thread. These notifications are picked up in the SNOPController, and the resulting method calls make minor GUI changes (activate / deactivate buttons). This PR also addresses those cases.
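For readers following the thread, here is a minimal sketch of the general pattern being described. It is not the code in this PR. `NSNotificationCenter` delivers notifications synchronously on whichever thread posts them, so a post made from a worker thread has to be moved onto the main queue before any observer touches the GUI. The notification name and both helper functions are hypothetical, and the sketch assumes ARC and a macOS Foundation toolchain.

```objc
#import <Foundation/Foundation.h>

// Hypothetical notification name, for illustration only (not an Orca identifier).
static NSString * const ExampleGUINotification = @"ExampleGUINotification";

// Post `name` so that observers always run on the main thread.
// NSNotificationCenter delivers synchronously on the posting thread, so a
// post made from a worker thread is hopped onto the main queue first.
static void PostOnMainThread(NSString *name, id sender)
{
    if ([NSThread isMainThread]) {
        [[NSNotificationCenter defaultCenter] postNotificationName:name object:sender];
    } else {
        dispatch_async(dispatch_get_main_queue(), ^{
            [[NSNotificationCenter defaultCenter] postNotificationName:name object:sender];
        });
    }
}

// Demo: post from a background queue and report which thread the observer saw.
// Must be called from the main thread; it spins the main run loop until the
// observer fires.
static BOOL ObserverRunsOnMainThread(void)
{
    __block BOOL ranOnMain = NO;
    id token = [[NSNotificationCenter defaultCenter]
        addObserverForName:ExampleGUINotification object:nil queue:nil
                usingBlock:^(NSNotification *note) {
                    ranOnMain = [NSThread isMainThread];
                    CFRunLoopStop(CFRunLoopGetMain());
                }];

    // Simulate a worker thread that wants to trigger a GUI update.
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
        PostOnMainThread(ExampleGUINotification, nil);
    });

    CFRunLoopRun();  // services the main dispatch queue until the observer stops it
    [[NSNotificationCenter defaultCenter] removeObserver:token];
    return ranOnMain;
}
```

Calling `ObserverRunsOnMainThread()` from `main` exercises the background-to-main hop. An older codebase might use `performSelectorOnMainThread:withObject:waitUntilDone:` to the same effect.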

tlatorre-uchicago commented 6 years ago

One thing that worries me about the recent issues with the keep alive at remote institutions is that it appears to be happening primarily at run starts. The issue is that a modal dialog will block the main thread. If this happens during the run start procedure, it's possible that the detector will not have triggers enabled when this modal pops up. If that's the case, we won't be live until the shifter notices and clicks the button, which could take ~30 seconds to 1 minute.

Is the main goal of the keep alive signal to prevent the ellie systems from firing when Orca is not running? If that's the case, can it just automatically be restarted or is there some case in which you don't want it to be restarted?

EdLeming commented 6 years ago

Yes, that is the main goal.

I have removed the pop-up and replaced it with auto-restarts at the beginning of ELLIE flash sequences. I also added a red warning message so users are aware they can attempt a restart from the ELLIE GUI if they'd like to debug an issue.

EdLeming commented 6 years ago

I'll test on the teststand in the morning

tlatorre-uchicago commented 6 years ago

How did the testing go with this? Is it ready to go?

EdLeming commented 6 years ago

Hey. I tested yesterday. It didn't solve the TUBii orphan issue, but ran smoothly aside from that. I am confident it solved the issue it was originally posted for (i.e. the run modal crash).

EdLeming commented 6 years ago

I should add that, while testing on the detector, I also stepped through commenting out sections of the code associated with the culprit commit (be17c8e019c328db8c82c96351291aa8bc414695). I commented out all additions in the SNOPModel, SNOPController and TUBiiModel, and the TUBii related changes to ELLIEController.m, before handing over to the afternoon crew for ramping. Even with all of those changes removed I was still seeing TUBii orphans at run restarts.

It's not clear at all what's happening here. All I can imagine is that some process has been added which affects the ordering of calls at run restart. Changing the TUBii client timeout to 2s and extending the time between ORCA->TUBii keep alive pulses from 0.5s to 5.0s also had no effect.

EdLeming commented 6 years ago

Thanks Tony. Those notifications which I've updated to be on the main thread call a void function, not an IBAction. It was separated from the IBAction in the SNOPController as the two need to load their TELLIE specific settings from different places, and the logic required in a single function was quite messy. I kept the void function (which picks up the notification) in the SNOPController for a couple of reasons:

1) So starting a TELLIE sequence from either GUI will always activate stop buttons in the control panel - making the actions a user can perform to stop a flash sequence consistent no matter which GUI it was started from. (Currently it's only the ELLIE GUI which will ever get used, but if the system stabilises this will no longer be the case.)

2) So I could start a little disco ball spinning in the control panel at the start of a flash sequence. This feature actually got removed due to a clash with one of Javi's PRs and I never put it back. I will when I get time.

tlatorre-uchicago commented 6 years ago

I'm not sure I understand. What do you mean that they call a void function? Are the three methods tellieGeneralFireAction, tellieExpertFireAction, and amellieFireAction called from methods in separate threads?

EdLeming commented 6 years ago

The "ORTellieRunStartNotification" gets picked up, here:

https://github.com/snoplus/orca/blob/master/Source/Experiments/SNOP/SNOPController.m#L353

Which calls:

https://github.com/snoplus/orca/blob/master/Source/Experiments/SNOP/SNOPController.m#L1789

tlatorre-uchicago commented 6 years ago

Ok, but these three methods (tellieGeneralFireAction, tellieExpertFireAction, and amellieFireAction) are all IBActions and are only called when the user hits a button, so they are guaranteed to be on the main thread. That means when you post the notification it's also guaranteed to be on the main thread, so you don't need to explicitly change the notifications to only post on the main thread.
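The reasoning here can be checked with a small sketch, assuming ARC and a macOS Foundation toolchain. A plain `postNotificationName:object:` call delivers to observers synchronously on the posting thread, so a post made from code that is already on the main thread (as an action fired by a button click is) reaches its observers on the main thread with no extra dispatch. The notification name and function below are hypothetical stand-ins, not Orca code.

```objc
#import <Foundation/Foundation.h>

// Returns YES if an observer of a notification posted from the calling thread
// ran before the post returned, and ran on the main thread.
// Intended to be called from the main thread, standing in for an IBAction body.
static BOOL PostFromMainIsDeliveredOnMain(void)
{
    // Hypothetical notification name, for illustration only.
    NSString *name = @"ExampleButtonNotification";
    __block BOOL observerRan = NO;
    __block BOOL observerOnMain = NO;

    id token = [[NSNotificationCenter defaultCenter]
        addObserverForName:name object:nil queue:nil
                usingBlock:^(NSNotification *note) {
                    observerRan = YES;
                    observerOnMain = [NSThread isMainThread];
                }];

    // Plain post with no main-thread hop: delivery happens on this thread.
    [[NSNotificationCenter defaultCenter] postNotificationName:name object:nil];

    // Delivery is synchronous, so the observer has already run by this point.
    [[NSNotificationCenter defaultCenter] removeObserver:token];
    return observerRan && observerOnMain;
}
```

The explicit main-thread hop only matters for posts that can originate on a worker thread.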

EdLeming commented 6 years ago

Ahh, OK. I see what you're saying. I misunderstood your initial comment. I thought you were referring to the functions they were calling, not where they were called from.

I've updated.

tlatorre-uchicago commented 6 years ago

merged.