redmod-team / profit

Probabilistic Response mOdel Fitting with Interactive Tools
https://profit.readthedocs.io
MIT License

ZeroMQ Interface: `ConnectionError` #149

Closed Rykath closed 2 years ago

Rykath commented 2 years ago

After @mkendler previously reported this problem, I reproduced the bug on a cluster (using an allocated node with the local Runner, proFit 0.5.dev22+g119ccff for Python 3.9.2). Using `interface: zeromq` results in a `ConnectionError`:

Traceback (most recent call last):
  File "/home/oswell/venv/bin/profit", line 33, in <module>
    sys.exit(load_entry_point('profit', 'console_scripts', 'profit')())
  File "/home/oswell/profit/profit/main.py", line 88, in main
    runner.spawn_array(tqdm(params_array), blocking=True)
  File "/home/oswell/profit/profit/run/default.py", line 63, in spawn_array
    self.spawn_run(params)
  File "/home/oswell/profit/profit/run/default.py", line 48, in spawn_run
    worker = Worker.from_config(self.run_config, self.next_run_id)
  File "/home/oswell/profit/profit/run/worker.py", line 185, in from_config
    return cls[config['worker']](config, interface, pre, post, run_id)
  File "/home/oswell/profit/profit/run/worker.py", line 178, in __init__
    self.interface: Interface = interface_class(config['interface'], run_id, logger_parent=self.logger)
  File "/home/oswell/profit/profit/run/zeromq.py", line 92, in __init__
    self.request('READY')  # self.input, self.output
  File "/home/oswell/profit/profit/run/zeromq.py", line 159, in request
    raise ConnectionError('could not connect to RunnerInterface')
ConnectionError: could not connect to RunnerInterface

log/run_000.log:

2021-11-07 15:48:44,147 INFO     Interface: connected to tcp://localhost:9100
2021-11-07 15:48:46,651 WARNING  Interface: READY: no response
2021-11-07 15:48:47,652 INFO     Interface: connected to tcp://localhost:9100
2021-11-07 15:48:50,155 WARNING  Interface: READY: no response
2021-11-07 15:48:51,156 INFO     Interface: connected to tcp://localhost:9100
2021-11-07 15:48:53,659 WARNING  Interface: READY: no response
2021-11-07 15:48:54,660 INFO     Interface: connected to tcp://localhost:9100
2021-11-07 15:48:57,163 WARNING  Interface: READY: no response
2021-11-07 15:48:58,164 ERROR    Interface: READY: 4 requests unsuccessful, abandoning
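
The retry pattern visible in this log (four READY requests, each waiting roughly 2.5 s for a reply, with a 1 s pause before the next attempt, then giving up) can be sketched as follows. This is a minimal, transport-agnostic illustration and not the actual code in `profit/run/zeromq.py`: the names `request_with_retries` and `send_and_wait` are hypothetical, and the timing defaults are only inferred from the timestamps above.

```python
import time
from typing import Callable, Optional


def request_with_retries(
    send_and_wait: Callable[[str, float], Optional[str]],
    message: str = "READY",
    retries: int = 4,          # the log shows 4 unsuccessful requests
    timeout: float = 2.5,      # inferred from the log timestamps (assumption)
    retry_sleep: float = 1.0,  # inferred from the log timestamps (assumption)
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Send `message` and return the reply, retrying when no reply arrives.

    `send_and_wait(message, timeout)` is a hypothetical callable that
    (re)connects, sends the message, and returns the reply, or None if
    nothing arrived within `timeout` seconds.
    """
    for _ in range(retries):
        response = send_and_wait(message, timeout)
        if response is not None:
            return response
        print(f"WARNING  Interface: {message}: no response")
        sleep(retry_sleep)  # pause before the next attempt
    print(f"ERROR    Interface: {message}: {retries} requests unsuccessful, abandoning")
    raise ConnectionError("could not connect to RunnerInterface")


if __name__ == "__main__":
    # A peer that never answers, mimicking the situation in the log.
    def silent_peer(message: str, timeout: float) -> Optional[str]:
        return None

    try:
        # Skip the real pauses so the demo finishes immediately.
        request_with_retries(silent_peer, sleep=lambda seconds: None)
    except ConnectionError as error:
        print(f"ConnectionError: {error}")
```

Passing the transport in as a callable keeps the sketch independent of any particular socket library. With a real ZeroMQ socket, `send_and_wait` would send the request and poll for a reply until the timeout expires.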

Rykath commented 2 years ago

Using the ZeroMQ Interface with the Slurm Runner works just fine. I therefore don't think the problem lies in the ZeroMQ Interface itself, but rather in the forked Workers used by the local Runner.