microscope-cockpit / cockpit

Cockpit is a microscope graphical user interface. It is a flexible and easy to extend platform aimed at life scientists using bespoke microscopes.
https://microscope-cockpit.org
GNU General Public License v3.0

Red Pitaya executor timeout in experiments. #883

Closed iandobbie closed 1 year ago

iandobbie commented 1 year ago

Any experiment that takes longer than the Pyro timeout set in initialize() in devices/executorDevices.py (6 seconds on the current main branch) is truncated with a Pyro timeout error.

The error I get is below. I think that Cockpit is expecting a return from the executor once the experiment is set up and started, but nothing is being returned.

Exception in thread Experiment-execute:
Traceback (most recent call last):
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\Pyro4\socketutil.py", line 171, in receiveData
    chunk = sock.recv(min(60000, size - msglen))
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\idobbie1\src\cockpit\cockpit\experiment\experiment.py", line 397, in execute
    events.executeAndWaitFor(events.EXPERIMENT_EXECUTION,
  File "C:\Users\idobbie1\src\cockpit\cockpit\events.py", line 309, in executeAndWaitFor
    return executeAndWaitForOrTimeout(eventType, func, None, *args, **kwargs)
  File "C:\Users\idobbie1\src\cockpit\cockpit\events.py", line 343, in executeAndWaitForOrTimeout
    func(*args, **kwargs)
  File "C:\Users\idobbie1\src\cockpit\cockpit\handlers\executor.py", line 209, in executeTable
    return self.callbacks['executeTable'](actions, 0, len(actions), numReps,
  File "C:\Users\idobbie1\src\cockpit\cockpit\devices\executorDevices.py", line 205, in executeTable
    events.executeAndWaitFor(events.EXECUTOR_DONE % self.name, self.connection.RunActions)
  File "C:\Users\idobbie1\src\cockpit\cockpit\events.py", line 309, in executeAndWaitFor
    return executeAndWaitForOrTimeout(eventType, func, None, *args, **kwargs)
  File "C:\Users\idobbie1\src\cockpit\cockpit\events.py", line 343, in executeAndWaitForOrTimeout
    func(*args, **kwargs)
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\Pyro4\core.py", line 185, in __call__
    return self.__send(self.__name, args, kwargs)
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\Pyro4\core.py", line 453, in _pyroInvoke
    msg = message.Message.recv(self._pyroConnection, [message.MSG_RESULT], hmac_key=self._pyroHmacKey)
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\Pyro4\message.py", line 168, in recv
    msg = cls.from_header(connection.recv(cls.header_size))
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\Pyro4\socketutil.py", line 463, in recv
    return receiveData(self.sock, size)
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\Pyro4\socketutil.py", line 184, in receiveData
    raise TimeoutError("receiving: timeout")
Pyro4.errors.TimeoutError: receiving: timeout
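
The traceback above ends in a blocking `sock.recv` that gives up before any reply arrives. A minimal sketch of that failure mode, using only the standard library rather than Pyro4, is shown here. The names `slow_server` and `call_with_timeout` and the short delays are illustrative assumptions; they are not part of Cockpit or Pyro4.

```python
import socket
import threading
import time


def slow_server(listener, delay):
    """Accept one connection and reply only after `delay` seconds."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)        # read the "request"
        time.sleep(delay)      # stand-in for a long-running experiment
        try:
            conn.sendall(b"done")
        except OSError:
            pass               # the client may already have given up


def call_with_timeout(delay, timeout):
    """Return the reply, or "timeout" if it does not arrive in time."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=slow_server, args=(listener, delay))
    server.start()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as sock:
            sock.sendall(b"run")
            try:
                return sock.recv(1024).decode()
            except socket.timeout:
                # The reply took longer than the client was willing to wait
                return "timeout"
    finally:
        server.join()
        listener.close()
```

If the reply is sent before the client timeout expires, the call returns normally; if the server only answers after the work has finished and the work outlasts the timeout, the client fails in the same place as the traceback.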
iandobbie commented 1 year ago

I have been trying to understand why this works with long experiments on the original DSP, which appears to go through a similar code path, and yet the Red Pitaya times out and kills the data collection.

Additionally, no updates to the status bar occur while this process is waiting. It's almost as if it ought to be running in another thread but isn't.
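
For comparison, here is a rough sketch of the behaviour one would expect if the remote call returned as soon as the experiment had started and completion were signalled separately. It is not Cockpit's implementation: `FakeExecutor` and `execute_and_wait` are hypothetical names, and the counter only stands in for status bar updates.

```python
import threading
import time


class FakeExecutor:
    """Hypothetical stand-in for the remote executor."""

    def __init__(self, on_done):
        self._on_done = on_done

    def run_actions(self, duration):
        # Start the work in the background and return immediately,
        # so the caller is never blocked for the whole experiment.
        worker = threading.Thread(target=self._run, args=(duration,))
        worker.start()
        return worker

    def _run(self, duration):
        time.sleep(duration)   # stand-in for executing the action table
        self._on_done()        # signal completion back to the caller


def execute_and_wait(duration, poll=0.01):
    """Start the work, then poll for completion; return the update count."""
    done = threading.Event()
    executor = FakeExecutor(on_done=done.set)
    worker = executor.run_actions(duration)
    updates = 0
    while not done.wait(poll):
        updates += 1           # stand-in for refreshing the status bar
    worker.join()
    return updates
```

Because the wait is a short, repeated poll on an event rather than one long blocking call, the waiting side stays free to do other work between polls.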

iandobbie commented 1 year ago

I now believe this is, at its heart, a networking issue, although some error-checking code may be needed to handle it properly.

I rebooted the Red Pitaya and, magically, the experiments started to work normally. There is no longer a timeout issue and the status bar updates as expected. I suspect that I managed to break the existing Pyro backwards connection from the Red Pitaya to Cockpit on the PC. The Red Pitaya is on a private network, but I briefly shared the PC's main internet link to allow the Raspberry Pi to sync its git repository with GitHub. This changed the Red Pitaya's network address and, I think, broke things.
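
One simple form the error checking could take is a connectivity check before an experiment starts. This is only a sketch under assumptions: `is_reachable` is a hypothetical helper, and since the suspect link is the backwards connection, the check would have to be made from the Red Pitaya side against whatever address and port Cockpit is listening on.

```python
import socket


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        return False
```

A failed check could then be reported as a clear "connection lost" error instead of surfacing later as a receive timeout in the middle of an experiment.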

Will investigate further, but closing this for now as I think it is a total red herring.