secretflow / spu

SPU (Secure Processing Unit) aims to be a provable, measurable secure computation device, which provides computation ability while keeping your private data protected.
https://www.secretflow.org.cn/docs/spu/en/
Apache License 2.0

[Bug]: ppd.init fails, reporting KeyError: 'processNameCorrected' #570

yangjd7 closed this issue 9 months ago

yangjd7 commented 9 months ago

Issue Type

Others

Modules Involved

SPU compiler

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

spu0.7.0

OS Platform and Distribution

ubuntu22.04

Python Version

3.8

Compiler Version

No response

Current Behavior?

It should be possible to initialize ppd.

Standalone code to reproduce the issue

import spu.utils.distributed as ppd

# Initialize the distributed environment.
ppd.init(ppd.SAMPLE_NODES_DEF, ppd.SAMPLE_DEVICES_DEF)

Relevant log output

(sf) crying@USER-20220419CQ:/mnt/d/ppml$ python -m spu.utils.distributed up
[2024-02-22 16:28:00,094] [Process-1] Starting grpc server at 127.0.0.1:61327
[2024-02-22 16:28:00,096] [Process-2] Starting grpc server at 127.0.0.1:61328
[2024-02-22 16:28:00,096] [Process-3] Starting grpc server at 127.0.0.1:61329
^CProcess Process-1:
Process Process-3:
Process Process-2:
Traceback (most recent call last):
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/spu/utils/distributed.py", line 1329, in <module>
    worker.join()
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/spu/utils/distributed.py", line 209, in serve
    server.wait_for_termination()
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/spu/utils/distributed.py", line 209, in serve
    server.wait_for_termination()
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_server.py", line 1118, in wait_for_termination
    return _common.wait(self._state.termination_event.wait,
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_server.py", line 1118, in wait_for_termination
    return _common.wait(self._state.termination_event.wait,
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_common.py", line 150, in wait
    _wait_once(wait_fn, MAXIMUM_WAIT_TIMEOUT, spin_cb)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_common.py", line 150, in wait
    _wait_once(wait_fn, MAXIMUM_WAIT_TIMEOUT, spin_cb)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_common.py", line 112, in _wait_once
    wait_fn(timeout=timeout)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_common.py", line 112, in _wait_once
    wait_fn(timeout=timeout)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
KeyboardInterrupt
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/spu/utils/distributed.py", line 209, in serve
    server.wait_for_termination()
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_server.py", line 1118, in wait_for_termination
    return _common.wait(self._state.termination_event.wait,
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_common.py", line 150, in wait
    _wait_once(wait_fn, MAXIMUM_WAIT_TIMEOUT, spin_cb)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/site-packages/grpc/_common.py", line 112, in _wait_once
    wait_fn(timeout=timeout)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/home/crying/anaconda3/envs/sf/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt

Chrisdehe commented 9 months ago

@tpppppub I have reproduced this issue and am now confirming the details.

tpppppub commented 9 months ago

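The problem here is in the log printing and does not affect functionality. You can keep running for now; it will be fixed in a later release.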
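
The thread does not show where the KeyError is raised, so the following is only an illustration and not a confirmed diagnosis of SPU. One general way a logging-only error of this kind can arise in Python's standard `logging` module is a format string that references a custom field which some log records do not carry. In this sketch the format string, the `AddProcessName` filter, and the `format_record` helper are all hypothetical names chosen for the example; only the field name `processNameCorrected` comes from the issue title.

```python
import logging

# Hypothetical format string: it references a custom field that a plain
# LogRecord does not have unless something attaches it first.
FORMAT = "[%(processNameCorrected)s] %(message)s"


class AddProcessName(logging.Filter):
    """Hypothetical filter that attaches the custom field to each record."""

    def filter(self, record):
        record.processNameCorrected = "Process-1"
        return True


def format_record(with_filter):
    """Format one record, with or without the custom field attached."""
    formatter = logging.Formatter(FORMAT)
    record = logging.LogRecord(
        "demo", logging.INFO, "demo.py", 0, "Starting grpc server", None, None
    )
    if with_filter:
        AddProcessName().filter(record)
    try:
        return formatter.format(record)
    except (KeyError, ValueError) as exc:
        # Older Python versions raise KeyError directly; Python 3.8 and
        # later wrap it in a ValueError that names the missing field.
        return f"formatting failed: {exc!r}"


if __name__ == "__main__":
    print(format_record(with_filter=True))
    print(format_record(with_filter=False))
```

When a handler hits such a formatting error while emitting a record, the standard library by default reports it on stderr and lets the program continue, which would be consistent with the statement above that functionality is not affected.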