Open gaocegege opened 5 years ago
/area engprod /priority p2
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/reopen We should take this on to improve cluster performance.
@tenzen-y: Reopened this issue.
I realized the need for this from Aldo's comment.
cc: @kubeflow/wg-training-leads
/lifecycle frozen
@tenzen-y brought this up in brainstorming around jobset/kubeflow.
We have implemented a few ways to customize network names.
type Network struct {
	// EnableDNSHostnames allows pods to be reached via their hostnames.
	// Pods will be reachable using the fully qualified pod hostname:
	// <jobSet.name>-<spec.replicatedJob.name>-<job-index>-<pod-index>.<subdomain>
	// +optional
	EnableDNSHostnames *bool `json:"enableDNSHostnames,omitempty"`

	// Subdomain is an explicit choice for a network subdomain name.
	// When set, any replicated job in the set is added to this network.
	// Defaults to <jobSet.name> if not set.
	// +optional
	Subdomain string `json:"subdomain,omitempty"`
}
This is what we used to control service creation for the JobSet.
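For illustration, here is a minimal sketch of how the hostname pattern in the struct comment could be assembled, including the default of the subdomain to the JobSet name. `podHostname` is a hypothetical helper written for this comment, not a function from JobSet, and the names `train`, `workers`, and `mynet` are made up.

```go
package main

import "fmt"

// Network repeats the struct quoted above so the sketch is self-contained.
type Network struct {
	EnableDNSHostnames *bool  `json:"enableDNSHostnames,omitempty"`
	Subdomain          string `json:"subdomain,omitempty"`
}

// podHostname is a hypothetical helper (not part of JobSet). It follows the
// pattern from the struct comment:
// <jobSet.name>-<spec.replicatedJob.name>-<job-index>-<pod-index>.<subdomain>
// and falls back to the JobSet name when no subdomain is set.
func podHostname(jobSetName, replicatedJobName string, jobIndex, podIndex int, network *Network) string {
	subdomain := jobSetName
	if network != nil && network.Subdomain != "" {
		subdomain = network.Subdomain
	}
	return fmt.Sprintf("%s-%s-%d-%d.%s", jobSetName, replicatedJobName, jobIndex, podIndex, subdomain)
}

func main() {
	// Default subdomain: the JobSet name.
	fmt.Println(podHostname("train", "workers", 0, 1, nil))
	// Explicit subdomain.
	fmt.Println(podHostname("train", "workers", 0, 1, &Network{Subdomain: "mynet"}))
}
```

With a nil `Network` the subdomain is `train`; with `Subdomain: "mynet"` the same pod resolves under `mynet` instead.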
The DNS suffix may differ from .svc.cluster.local
depending on the cluster settings. Maybe we could add a CLI parameter to configure it.
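A rough sketch of what such a CLI parameter might look like, using Go's standard `flag` package. The flag name `--cluster-domain`, its default, and the `parseClusterDomain` helper are all assumptions for illustration; nothing like this is confirmed to exist in the operator.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseClusterDomain parses a hypothetical --cluster-domain flag from args.
// It defaults to "cluster.local" and strips leading/trailing dots so the
// value can be joined into a DNS name safely.
func parseClusterDomain(args []string) (string, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	domain := fs.String("cluster-domain", "cluster.local", "DNS domain of the cluster")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return strings.Trim(*domain, "."), nil
}

func main() {
	domain, err := parseClusterDomain([]string{"--cluster-domain=example.internal"})
	if err != nil {
		panic(err)
	}
	// The service suffix is then built from the configured domain.
	fmt.Println(".svc." + domain)
}
```

Keeping the default at `cluster.local` would leave existing deployments unchanged unless the flag is set explicitly.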
A TFJob has ps, worker, and chief replicas, and we currently create one headless service per replica. I think we could use a single headless service per TFJob to make this easier to use.
After that, we could use
{tfjob_name}-{replica_type}-{index}.{service_name}.svc.cluster.local
in the code. WDYT @johnugeorge @richardsliu
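To make the proposal concrete, here is a minimal sketch of how the addresses for all replica types could be derived from one shared headless service. `clusterSpec` is a hypothetical helper, and the job name `mnist`, the replica counts, and port 2222 are assumptions. The sketch also adds a namespace segment, since Kubernetes pod DNS names take the form `<hostname>.<subdomain>.<namespace>.svc.<cluster-domain>`.

```go
package main

import "fmt"

// clusterSpec is a hypothetical helper (not from the operator). It builds
// one address per replica, all under the same headless service:
// {tfjob_name}-{replica_type}-{index}.{service_name}.{namespace}.svc.{cluster_domain}:{port}
func clusterSpec(tfJobName, serviceName, namespace, clusterDomain string, replicas map[string]int, port int) map[string][]string {
	spec := make(map[string][]string, len(replicas))
	for replicaType, count := range replicas {
		addrs := make([]string, 0, count)
		for i := 0; i < count; i++ {
			addrs = append(addrs, fmt.Sprintf("%s-%s-%d.%s.%s.svc.%s:%d",
				tfJobName, replicaType, i, serviceName, namespace, clusterDomain, port))
		}
		spec[replicaType] = addrs
	}
	return spec
}

func main() {
	// Assumed replica counts and port, for illustration only.
	spec := clusterSpec("mnist", "mnist", "default", "cluster.local",
		map[string]int{"chief": 1, "ps": 1, "worker": 2}, 2222)
	for _, replicaType := range []string{"chief", "ps", "worker"} {
		fmt.Println(replicaType, spec[replicaType])
	}
}
```

Because every replica shares one service name, only the pod hostname prefix changes between ps, worker, and chief.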