Open JasonNing96 opened 3 years ago
By the way, I changed the v0.3.0 version because the Docker images I built are v0.4.0:
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: FederatedLearningJob
metadata:
  name: surface-defect-detection
spec:
  aggregationWorker:
    model:
      name: "surface-defect-detection-model"
    template:
      spec:
        nodeName: $CLOUD_NODE
        containers:
@JoeyHwong-gk
@JasonNing96 try newer version: v0.4.2
I followed the online installation page, so it should be the latest version, right? Otherwise I will try a local install.
I mean: try the example image version v0.4.2.
I just tried kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.4.0 and it works, but the image kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.4.2@sha256:47fd842ce9947 reports the following error:
[INFO][08:27:05]: Client: simple
[INFO][08:27:05]: Trainer: basic
[INFO][08:27:05]: Algorithm: fedavg
Traceback (most recent call last):
File "train.py", line 60, in <module>
main()
File "train.py", line 57, in main
fl_model.run()
AttributeError: 'FederatedLearningV2' object has no attribute 'run'
I think you don't need to build the example image yourself.
@jaypume @XinYao1994 please take a look
Maybe fl_model.train() should be used here instead of fl_model.run(), and we will fix it ASAP.
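Until a fixed image is published, one possible workaround in a local copy of train.py is to call whichever entry point the object actually exposes. This is only a sketch: the FederatedLearningV2 class below is a hypothetical stand-in that mimics the behaviour in the traceback (it has train() but no run()), not the real Sedna class, and start_training is a helper name made up for illustration.

```python
class FederatedLearningV2:
    """Hypothetical stand-in for the class named in the traceback.

    It only mimics the reported behaviour: train() exists, run() does not.
    """

    def train(self):
        return "train() called"


class LegacyJob:
    """Hypothetical older-style job object that still exposes run()."""

    def run(self):
        return "run() called"


def start_training(fl_model):
    """Call train() if the object has it, otherwise fall back to run().

    Returns a (method_name, result) tuple so the caller can see which
    entry point was used.
    """
    for name in ("train", "run"):
        method = getattr(fl_model, name, None)
        if callable(method):
            return name, method()
    raise AttributeError(
        f"{type(fl_model).__name__!r} object has neither train() nor run()"
    )


if __name__ == "__main__":
    print(start_training(FederatedLearningV2()))
    print(start_training(LegacyJob()))
```

In train.py this would replace the direct fl_model.run() call on line 57, assuming train() takes no required arguments in the installed version, which should be checked against the library source.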
1) I have a question about deploying the dataset: should the command be run on the cloud node?
2) My surface-defect-detection-train- pod keeps restarting with errors on edge1 and edge2. The pod logs show: And the docker logs show: The other pods are working, but the train worker is down. And the server seems to be running: