m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

Async RPC ConnectionResetError #715

Closed cjbe closed 7 years ago

cjbe commented 7 years ago

The following experiment errors out with a ConnectionResetError:

from artiq.experiment import *


class TestAsyncRpc(EnvExperiment):
    def build(self):
        self.setattr_device("core")

    @rpc(flags={"async"})
    def rpc(self, vec):
        pass

    @kernel
    def run(self):
        self.rpc([0]*1000)

Error message:

root:Terminating with exception (ConnectionResetError: Connection closed)
Traceback (most recent call last):
  File "/home/artiq/anaconda3/envs/artiq3/lib/python3.5/site-packages/artiq/master/worker_impl.py", line 270, in main
    exp_inst.run()
  File "/home/artiq/anaconda3/envs/artiq3/lib/python3.5/site-packages/artiq/language/core.py", line 53, in run_on_core
    return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs)
  File "/home/artiq/anaconda3/envs/artiq3/lib/python3.5/site-packages/artiq/coredevice/core.py", line 124, in run
    self.comm.serve(embedding_map, symbolizer, demangler)
  File "/home/artiq/anaconda3/envs/artiq3/lib/python3.5/site-packages/artiq/coredevice/comm_kernel.py", line 560, in serve
    self._read_header()
  File "/home/artiq/anaconda3/envs/artiq3/lib/python3.5/site-packages/artiq/coredevice/comm_kernel.py", line 153, in _read_header
    (sync_byte, ) = struct.unpack("B", self.read(1))
  File "/home/artiq/anaconda3/envs/artiq3/lib/python3.5/site-packages/artiq/coredevice/comm_kernel.py", line 136, in read
    raise ConnectionResetError("Connection closed")
ConnectionResetError: Connection closed
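
For readers unfamiliar with this failure mode: the traceback ends in a `read` call that raises once the peer has closed the connection. The sketch below is not the ARTIQ implementation. It is a minimal, standalone illustration of the general pattern, assuming a plain stream socket. The names `read_exact`, `read_sync_byte`, and `demo` are hypothetical, and `socket.socketpair()` stands in for the link to the core device.

```python
import socket
import struct


def read_exact(sock, length):
    """Read exactly `length` bytes, or raise if the peer closes first."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            raise ConnectionResetError("Connection closed")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_sync_byte(sock):
    """Unpack a single unsigned byte, as the header read in the traceback does."""
    (sync_byte,) = struct.unpack("B", read_exact(sock, 1))
    return sync_byte


def demo():
    """Simulate a device that sends one byte and then drops the session."""
    host, device = socket.socketpair()
    try:
        device.sendall(b"\x5a")
        first = read_sync_byte(host)
        device.close()  # hypothetical stand-in for the device aborting
        try:
            read_sync_byte(host)
        except ConnectionResetError as exc:
            return first, str(exc)
        return first, None
    finally:
        host.close()
        device.close()
```

The point of the sketch is that the host-side exception only reports that the other end went away. The reason it went away has to be found on the device side.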

whitequark commented 7 years ago

What is in the core log?

cjbe commented 7 years ago

[   202951541us]  WARN(runtime): network error: truncated packet
[   208000937us]  INFO(runtime::session): new connection from 10.255.6.32:45184
[   208057425us]  INFO(kernel): panic at /var/lib/buildbot/slaves/debian-stretch-amd64-2/miniconda/conda-bld/artiq-kc705-nist_clock_1491663291253/work/artiq/firmware/liballoc_none/lib.rs:7: not yet implemented
[   208075462us] ERROR(runtime::session): unexpected request RunAborted from kernel CPU
[   208082157us] ERROR(runtime::session): session aborted: protocol error
[   208088838us]  INFO(runtime::session): no connection, starting idle kernel
[   208095973us]  INFO(runtime::session): no idle kernel found
[   211013891us]  WARN(runtime): network error: truncated packet
[   211601790us]  INFO(runtime::mgmt): new connection from 10.255.6.32:48174

whitequark commented 7 years ago

Ah, I've been looking for a repro for that intermittent failure for quite some time.