tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

Tensorboard could not bind to unsupported address family #6416

Closed komalkotha closed 1 year ago

komalkotha commented 1 year ago

TensorBoard is installed with the Charmed Operator via Juju (tensorboard-controller 1.6/stable).

When the Pod is launched, it fails with the error message below.

[40118465865536 program.py:288] Tensorboard could not bind to unsupported address family ::
ERROR: Tensorboard could not bind to unsupported address family ::

Below is the Deployment created by tensorboard-controller (1.6/stable) in Kubeflow.


apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
  creationTimestamp: "2023-05-30T07:53:29Z"
  generation: 15
  name: testtst
  namespace: komal-kotha
  ownerReferences:
  - apiVersion: tensorboard.kubeflow.org/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Tensorboard
    name: testtst
    uid: b1cf07f7-bc8a-411a-a0ad-665a984f0e6f
  resourceVersion: "401804297"
  uid: 281c32c9-3fc9-4ce6-ab91-50619421dec1
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: testtst
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: testtst
    spec:
      affinity: {}
      containers:
      - args:
        - --logdir=/tensorboard_logs/
        - --bind_all
        command:
        - /usr/local/bin/tensorboard
        env:
        - name: TF_CPP_MIN_LOG_LEVEL
          value: "2"
        image: tensorflow/tensorflow:2.1.0
        imagePullPolicy: IfNotPresent
        name: tensorboard
        ports:
        - containerPort: 6006
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tensorboard_logs/
          name: tbpd
          readOnly: true
          subPath: test
        workingDir: /
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: tbpd
        persistentVolumeClaim:
          claimName: test-new-notebook-volume
status:
  conditions:
  - lastTransitionTime: "2023-05-30T07:53:29Z"
    lastUpdateTime: "2023-06-05T11:49:10Z"
    message: ReplicaSet "testtst-6b6f5659c5" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-06-05T12:52:45Z"
    lastUpdateTime: "2023-06-05T12:52:45Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  observedGeneration: 15
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
groszewn commented 1 year ago

Hi there. The TensorBoard team does not maintain the TensorBoard operator you mentioned, but this looks related to #1713. It looks like the operator is pulling in an older TensorFlow image (2.1.0, which came out in January 2020; we're currently at 2.13.0). Could you try using a newer version and let us know how that goes? Otherwise, you may want to try updating the controller to explicitly specify --host=0.0.0.0 instead of --bind_all to serve on all IPv4 addresses.
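
For illustration, the flag change suggested here might look roughly like this in the container spec of the Deployment posted in this issue. This is a minimal sketch that reuses only fields from that manifest; it is not an official operator configuration, and it assumes the Deployment can be edited directly rather than through the controller.

```yaml
# Sketch only: container spec taken from the Deployment in this issue,
# with the bind flag swapped and all other fields omitted for brevity.
containers:
- args:
  - --logdir=/tensorboard_logs/
  - --host=0.0.0.0   # replaces --bind_all; serves on all IPv4 addresses
  command:
  - /usr/local/bin/tensorboard
  image: tensorflow/tensorflow:2.1.0
  name: tensorboard
```

The image tag is left unchanged here so that the only difference is the bind flag; trying a newer TensorFlow image, as suggested above, would be a separate change.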

komalkotha commented 1 year ago

Hi @groszewn, thanks for the quick response. I updated the TensorBoard Deployment to use --host=0.0.0.0 and it worked, but the tensorboard-controller code has --bind_all hardcoded.

groszewn commented 1 year ago

We do not own the controller, so you'll need to work with the controller maintainers to have them update to use --host=0.0.0.0 instead of --bind_all as this is the current workaround.

This is a duplicate of #1713. I'll close this issue, but feel free to follow up on #1713.