Open skrlin opened 2 years ago
PTAL @jaypume @XinYao1994
@skrlin @JoeyHwong-gk It is hard to understand why your image was produced 2 months ago. Did you make sure that you successfully updated the image?
I think the reason is that @skrlin used v0.4.0, which has a bug. I suggest trying the latest version (i.e. v0.4.3).
@XinYao1994 can you help update the version in the federated learning example YAML?
@JoeyHwong-gk I didn't update the image. I just pulled the v0.4.0 image from the repository, following the tutorial.
@llhuii @skrlin @jaypume We plan to add a tutorial demo soon. Hope that can help. :) The federated learning example YAML will be updated before we release that demo.
@llhuii OK, thank you very much for your answer
@XinYao1994 OK, thank you very much for your answer
I ran into this problem too, using v0.4.3. The log is as follows:
[INFO][02:29:18]: New cache created: data/COCO/coco128/labels/train2017.cache
[INFO][02:29:18]: No clients are launched (server:disable_clients = true)
[INFO][02:29:18]: Starting a server at address 0.0.0.0 and port 7363.
[INFO][02:29:32]: 192.168.0.71 [23/Nov/2021:02:29:32 +0000] "GET /socket.io/?transport=polling&EIO=4&t=1637634572.613998 HTTP/1.1" 200 292 "-" "Python/3.6 aiohttp/3.8.0"
[INFO][02:29:32]: 192.168.0.71 [23/Nov/2021:02:29:32 +0000] "GET /socket.io/?transport=polling&EIO=4&t=1637634572.612923 HTTP/1.1" 200 292 "-" "Python/3.6 aiohttp/3.8.0"
[INFO][02:29:32]: [Server #6] A new client just connected.
[INFO][02:29:32]: [Server #6] A new client just connected.
[INFO][02:29:32]: [Server #6] New client with id #2 arrived.
[INFO][02:29:32]: [Server #6] Starting training.
[INFO][02:29:32]: [Server #6] Starting round 1/1.
[INFO][02:29:32]: [Server #6] Selecting client #2 for training.
[INFO][02:29:32]: [Server #6] Sending the current model to client #2.
[INFO][02:29:32]: [Server #6] New client with id #1 arrived.
[INFO][02:29:37]: [Server #6] Sent 27.96 MB of payload data to client #2.
[INFO][02:31:31]: [Server #6] Received 400.11 MB of payload data from client #2.
[INFO][02:31:31]: [Server #6] All 1 client reports received. Processing.
[ERROR][02:31:31]: Task exception was never retrieved
future: <Task finished coro=<AsyncServer._handle_event_internal() done, defined at /usr/local/lib/python3.6/site-packages/socketio/asyncio_server.py:502> exception=AttributeError("'list' object has no attribute 'num_train_examples'",)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/socketio/asyncio_server.py", line 504, in _handle_event_internal
    r = await server._trigger_event(data[0], namespace, sid, *data[1:])
  File "/usr/local/lib/python3.6/site-packages/socketio/asyncio_server.py", line 547, in _trigger_event
    event, *args)
  File "/usr/local/lib/python3.6/site-packages/socketio/asyncio_namespace.py", line 37, in trigger_event
    ret = await handler(*args)
  File "/home/plato/plato/servers/base.py", line 59, in on_client_payload_done
    data['obkey'])
  File "/home/plato/plato/servers/base.py", line 446, in client_payload_done
    await self.process_reports()
  File "/home/plato/plato/servers/mistnet.py", line 40, in process_reports
    sampler = all_inclusive.Sampler(feature_dataset)
  File "/home/plato/plato/samplers/all_inclusive.py", line 18, in __init__
    self.all_inclusive = range(dataset.num_train_examples())
AttributeError: 'list' object has no attribute 'num_train_examples'
@XinYao1994 please take a look.
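The traceback in this comment shows `all_inclusive.Sampler.__init__` calling `dataset.num_train_examples()` on an object that turned out to be a plain `list`. The minimal sketch below reproduces that failure mode outside Plato. The `Sampler` class here is a stand-in that only mirrors the single line visible in the traceback, and `FeatureDataset` is a hypothetical wrapper invented for illustration. Neither is Plato's actual code, and the upstream fix may look quite different.

```python
class Sampler:
    """Stand-in mirroring the one line shown in the traceback (not Plato's class)."""

    def __init__(self, dataset):
        # Same call as the line reported for plato/samplers/all_inclusive.py.
        self.all_inclusive = range(dataset.num_train_examples())


class FeatureDataset:
    """Hypothetical wrapper giving a list the method the sampler expects."""

    def __init__(self, features):
        self.features = list(features)

    def num_train_examples(self):
        return len(self.features)


def try_sampler(dataset):
    """Return the number of sampled indices, or the error message on failure."""
    try:
        return len(Sampler(dataset).all_inclusive)
    except AttributeError as err:
        return str(err)


# Made-up placeholder features, only to have something to count.
features = [("feature_a", 0), ("feature_b", 1), ("feature_c", 2)]

# Passing the raw list fails in the same way as in the log.
print(try_sampler(features))
# Wrapping the list in an object that has num_train_examples() succeeds.
print(try_sampler(FeatureDataset(features)))
```

The point of the sketch is only that the sampler expects a dataset-like object rather than a list, which is consistent with the suggestion in this thread to move to a version where the mismatch has been fixed.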
@Poorunga @llhuii Please make sure you are using the latest version, because this has already been fixed here.
What happened: When I followed https://github.com/kubeedge/sedna/tree/main/examples/federated_learning/yolov5_coco128_mistnet, an error occurred while deploying the MistNet federated learning example.
I created the `/model` and `/pretrained` directories at the locations specified for each node, as the tutorial describes. I traced the error to `/home/plato/plato/config.py:123`.
I don't know why this happens. Do I need to change it to the correct path of the pre-trained model and rebuild the image?
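Before rebuilding anything, it may be worth confirming that the directories really exist where the job looks for them. The sketch below is a generic check, not part of Sedna or Plato. It assumes the two paths named in this report (`/model` and `/pretrained`) and would have to be run on each node, or inside the container if the paths are mounted there. The helper name `missing_directories` is made up for illustration.

```python
from pathlib import Path


def missing_directories(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [str(path) for path in map(Path, paths) if not path.is_dir()]


# Paths taken from this report; adjust them to the values in your own YAML.
expected = ["/model", "/pretrained"]

missing = missing_directories(expected)
if missing:
    print("Missing directories:", ", ".join(missing))
else:
    print("All expected directories exist.")
```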
Docker image information:
Environment:
Sedna Version
```console
$ kubectl get -n sedna deploy gm -o jsonpath='{.spec.template.spec.containers[0].image}'
# kubeedge/sedna-gm:v0.4.3
$ kubectl get -n sedna ds lc -o jsonpath='{.spec.template.spec.containers[0].image}'
# kubeedge/sedna-lc:v0.4.3
```
Kubernetes Version
```console
$ kubectl version
# Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:59:43Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.12", GitCommit:"4bf2e32bb2b9fdeea19ff7cdc1fb51fb295ec407", GitTreeState:"clean", BuildDate:"2021-10-27T17:07:18Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
```
KubeEdge Version
```console
$ cloudcore --version
# KubeEdge v1.8.2
$ edgecore --version
# KubeEdge v1.8.2
```
CloudSide Environment:
Hardware configuration
```console
$ lscpu
# Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
Stepping: 7
CPU MHz: 2299.795
CPU max MHz: 2500.0000
CPU min MHz: 1200.0000
BogoMIPS: 3999.64
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-5,12-17
NUMA node1 CPU(s): 6-11,18-23
```
OS
```console
$ cat /etc/os-release
# NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
```
Kernel
```console
$ uname -a
# Linux node01 5.4.0-84-generic #94~18.04.1-Ubuntu SMP Thu Aug 26 23:17:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```
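Since the discussion in this thread turns on which image version is in use, a small helper can compare an image tag against the suggested v0.4.3. This is a sketch only: it assumes tags of the form `:vMAJOR.MINOR.PATCH`, and the function names are invented for illustration rather than taken from any Sedna tooling.

```python
import re

MIN_VERSION = (0, 4, 3)  # the version suggested in this thread


def parse_image_tag(image):
    """Extract (major, minor, patch) from e.g. 'kubeedge/sedna-gm:v0.4.3'."""
    match = re.search(r":v(\d+)\.(\d+)\.(\d+)$", image)
    if match is None:
        raise ValueError(f"no vMAJOR.MINOR.PATCH tag in {image!r}")
    return tuple(int(part) for part in match.groups())


def is_recent_enough(image, minimum=MIN_VERSION):
    """True if the image tag is at least `minimum` (compared as integer tuples)."""
    return parse_image_tag(image) >= minimum


# Image references as reported by the kubectl commands in this issue.
for image in ("kubeedge/sedna-gm:v0.4.3", "kubeedge/sedna-lc:v0.4.3"):
    print(image, is_recent_enough(image))
```

Note that the GM and LC images reported here are already at v0.4.3. The thread attributes the failure to the v0.4.0 example image, so the same check would need to be applied to the image referenced in the federated learning example YAML.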