microsoft / pai

Resource scheduling and cluster management for AI
https://openpai.readthedocs.io
MIT License

Duplicate indicator for host in layout.yaml #4541

Open hzy46 opened 4 years ago

hzy46 commented 4 years ago

In layout.yaml, the hostname and nodename fields appear to be duplicates of each other; see:

https://github.com/microsoft/pai/blob/master/contrib/kubespray/quick-start/layout.yaml.template#L45-L46

fanyangCS commented 4 years ago

Also, in the example layout.yaml referenced from the step-by-step manual, we say nodename should be the same as hostip: https://github.com/microsoft/pai/blob/master/examples/cluster-configuration/layout.yaml#L60, and also here: https://github.com/microsoft/pai/blob/master/docs/pai-management/doc/how-to-configure-layout.md#field-3-machine-list- However, the code actually requires that "nodename" be the same as "hostname". This raises a question: why do we expose the nodename field in layout.yaml in the first place?
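To make the mismatch concrete, here is a minimal sketch of the kind of check that would catch a layout where nodename was set to the IP instead of the hostname. It is not the actual OpenPAI validation code: the function name `find_nodename_mismatches` is hypothetical, the machine entries are plain dicts standing in for the parsed machine list, and treating a missing nodename as defaulting to hostname is an assumption.

```python
from typing import Dict, List


def find_nodename_mismatches(machine_list: List[Dict[str, str]]) -> List[str]:
    """Return the hostnames of entries whose nodename differs from hostname.

    Hypothetical helper for illustration only; not part of OpenPAI.
    """
    mismatches = []
    for machine in machine_list:
        hostname = machine.get("hostname")
        # Assumption: a missing nodename is treated as equal to hostname.
        nodename = machine.get("nodename", hostname)
        if nodename != hostname:
            mismatches.append(str(hostname))
    return mismatches


# Example entries; the second follows the documented advice (nodename == hostip)
# and therefore conflicts with the nodename == hostname requirement.
machines = [
    {"hostname": "worker-1", "hostip": "10.0.0.1", "nodename": "worker-1"},
    {"hostname": "worker-2", "hostip": "10.0.0.2", "nodename": "10.0.0.2"},
]
print(find_nodename_mismatches(machines))
```

If nodename must always equal hostname, a check like this always passes for a correct layout, which supports the point that the field is redundant.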

abuccts commented 4 years ago

> And a question is why we expose the field nodename in the layout.yaml in the first place?

It's a legacy of the YARN version, where nodename had to be the same as the IP address to avoid DNS issues in Hadoop.

See #24 and #25. Now that we have removed Hadoop and k8s resolves nodename, the IP could be removed.