Open gganduu opened 2 years ago
Please check whether the kubeconfig is mounted or set, and compare the Spark version of the driver with that of the executor image. The error may be caused by a kubeconfig that failed to load or by a mismatched version of okhttp/kubernetes-client.
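As a rough way to run the kubeconfig part of that check inside the driver container, something like the sketch below could help. It only looks at the standard locations (the `KUBECONFIG` environment variable, `~/.kube/config`, and the in-cluster service account token) and reports which ones exist. The function name `find_kube_credentials` is made up for this example and is not part of any library; it does not validate the file contents or check the Spark/okhttp versions.

```python
import os
from pathlib import Path

# Standard path where Kubernetes mounts the service account token in a pod.
IN_CLUSTER_TOKEN = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def find_kube_credentials(env=None, home=None):
    """Return (source, path) pairs for credential files that exist on disk.

    Hypothetical helper for illustration; `env` and `home` can be overridden
    so the lookup is easy to try outside a real cluster.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    candidates = []
    # KUBECONFIG may list several files separated by os.pathsep.
    for entry in env.get("KUBECONFIG", "").split(os.pathsep):
        if entry:
            candidates.append(("KUBECONFIG", Path(entry)))
    candidates.append(("default", home / ".kube" / "config"))
    candidates.append(("in-cluster", IN_CLUSTER_TOKEN))

    # Keep only the candidates that are actual files.
    return [(source, str(path)) for source, path in candidates if path.is_file()]


if __name__ == "__main__":
    found = find_kube_credentials()
    if not found:
        print("no kubeconfig or service account token found")
    for source, path in found:
        print(f"{source}: {path}")
```

If nothing is printed besides the "not found" message, the kubeconfig is probably not mounted into the container at all.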
@gganduu have you fixed this issue?
My az k8s mode configuration is:
When I tried to run this code, I got the following error:
But with the same PyTorch YOLOv5 code, I can train successfully using az local mode:
The k8s cluster has two nodes: one controller node and one worker node. In the same k8s environment, I can train TF-based YOLOv3 without errors.