jlewi / flaap

Federated Learning and Analytics Protocols
Apache License 2.0

RPC from the worker to update the task is failing; task already exists. #17

Closed jlewi closed 2 years ago

jlewi commented 2 years ago
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/storage/jupyter/git_flaap/py/flaap/tff/task_handler.py", line 173, in <module>
    fire.Fire(Runner)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/storage/jupyter/git_flaap/py/flaap/tff/task_handler.py", line 163, in run
    asyncio.run(handler.run())
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/storage/jupyter/git_flaap/py/flaap/tff/task_handler.py", line 115, in run
    await self._poll_and_handle_task()
  File "/storage/jupyter/git_flaap/py/flaap/tff/task_handler.py", line 108, in _poll_and_handle_task
    response = _run_rpc(self._tasks_stub.Update, update_request)
  File "/opt/conda/lib/python3.10/site-packages/tenacity/__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "/opt/conda/lib/python3.10/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/conda/lib/python3.10/site-packages/tenacity/__init__.py", line 349, in iter
    return fut.result()
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/opt/conda/lib/python3.10/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 228, in sync_trace
    result = fn(*fn_args, **fn_kwargs)
  File "/storage/jupyter/git_flaap/py/flaap/tff/task_handler.py", line 147, in _run_rpc
    return rpc_func(request)
  File "/opt/conda/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/conda/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.ALREADY_EXISTS
        details = "Task with name 2dc0bc91ca5f4fff8516e193c43037ea; already exists; use Update to make changes"
        debug_error_string = "{"created":"@1664333804.064139068","description":"Error received from peer ipv4:127.0.0.1:8081","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Task with name 2dc0bc91ca5f4fff8516e193c43037ea; already exists; use Update to make changes","grpc_status":6}"
>
jlewi commented 2 years ago

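This appears to be a bug in the server: the worker called `Update`, but the server rejected the request with `ALREADY_EXISTS` and a message telling the caller to use Update. Relevant server code: https://github.com/jlewi/flaap/blob/331bafb2b48a0ae3f636abc8d85ab22ffa05ff09/go/pkg/tasks/server.go#L41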
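
For reference, here is a minimal sketch of the create/update semantics the worker seems to expect, assuming a simple in-memory store. `TaskStore`, `AlreadyExistsError`, and `NotFoundError` are hypothetical names used only for illustration; this is not the code in `server.go`.

```python
class AlreadyExistsError(Exception):
    """Raised when create() is called for a task name that is already stored."""


class NotFoundError(Exception):
    """Raised when update() is called for a task name that is not stored."""


class TaskStore:
    """Hypothetical in-memory stand-in for the task server's storage."""

    def __init__(self):
        self._tasks = {}

    def create(self, name, task):
        # Only create should reject a name that is already present.
        if name in self._tasks:
            raise AlreadyExistsError(
                f"Task with name {name} already exists; use Update to make changes"
            )
        self._tasks[name] = dict(task)
        return dict(task)

    def update(self, name, task):
        # Update requires the task to exist and then overwrites it.
        if name not in self._tasks:
            raise NotFoundError(f"Task with name {name} not found")
        self._tasks[name] = dict(task)
        return dict(task)
```

Under these assumptions, the failure in the traceback would correspond to the update path applying the existence check that belongs only to the create path.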

jlewi commented 2 years ago

Fixed by commit above.