Closed · yaqlee closed this issue 1 year ago
In this line, the number of nodes is set to 1, but if I want to train on multiple machines, the number of nodes should equal the number of machines. How does this work?
Hi @yaqlee,
We do not use Ray within the nuPlan framework to manage multi-node training. The code is there because we attempted to use Ray in the past. However, we have since pivoted internally to a custom solution that we cannot share publicly. Therefore, we unfortunately cannot help you with this issue.
I tried to train a model on multiple instances with Ray Distributed. I am confused about the usage of nuplan/planning/script/config/common/worker/ray_distributed.yaml: should I set the master node IP? More specifically, what is the difference between the first and second case in initialize_ray in nuplan/planning/utils/multithreading/worker_ray.py (i.e. the master node IP case)?
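For what it's worth, the general pattern in Ray (independent of nuPlan's wrapper) is that `ray.init(address=...)` attaches the process to an existing cluster whose head node is already running, while `ray.init()` without an address spins up a fresh single-machine Ray instance. Below is a minimal, purely illustrative sketch of that branching logic; `build_ray_init_kwargs` is a hypothetical helper (not part of nuPlan), and the port `6379` is only Ray's conventional default head-node port, not something read from the repo:

```python
def build_ray_init_kwargs(master_node_ip=None, threads_per_node=None):
    """Illustrate the two initialization cases as kwargs for ray.init().

    Case 1 (master_node_ip given): join an existing multi-machine cluster,
    whose head was started elsewhere, e.g. with `ray start --head`.
    Case 2 (no master_node_ip): start a local Ray instance on this machine
    only, optionally capping the number of CPUs it may use.
    """
    if master_node_ip is not None:
        # Connect to the running head node. 6379 is Ray's conventional
        # default port; the actual port depends on how the head was started.
        return {"address": f"{master_node_ip}:6379"}
    # No master IP: single-machine mode.
    return {"num_cpus": threads_per_node}


# Usage sketch (do not run ray.init twice in one process):
#   import ray
#   ray.init(**build_ray_init_kwargs(master_node_ip="10.0.0.1"))   # case 1
#   ray.init(**build_ray_init_kwargs(threads_per_node=8))          # case 2
```

So in a genuine multi-machine setup you would run `ray start --head` on the master, `ray start --address=<master_ip>:<port>` on each worker machine, and then point the training job at the master's IP; without a master IP, everything stays on one machine.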