googledatalab / datalab

Interactive tools and developer experiences for Big Data on Google Cloud Platform.
Apache License 2.0

hooking tf.estimator to tensorboard debugger on datalab platform #2090

Open OrielResearchCure opened 5 years ago

OrielResearchCure commented 5 years ago

Hello,

I would like to work with the TensorBoard debugger. I have two datalabs open:

  1. a datalab that launches the TensorBoard web application using the code below.
  2. a datalab running the TensorFlow model through the tf.estimator API.
import os
import subprocess
import time

import IPython
import datalab.utils

args = ['tensorboard', '--logdir=' + logdir, '--port=' + str(port),
        '--debugger_port=' + str(debuggerPort)]
p = subprocess.Popen(args)
retry = 10
while retry > 0:
  if datalab.utils.is_http_running_on(port):
    basepath = os.environ.get('DATALAB_ENDPOINT_URL', '')
    url = '%s/_proxy/%d/' % (basepath.rstrip('/'), port)
    html = '<p>TensorBoard was started successfully with pid %d. ' % p.pid
    html += 'Click <a href="%s" target="_blank">here</a> to access it.</p>' % url
    IPython.display.display_html(html, raw=True)
    break
  time.sleep(1)
  retry -= 1

This works great! TensorBoard launches successfully.

hooks = []
hooks.append(tf_debug.TensorBoardDebugHook('localhost:6064'))
train_spec = tf.estimator.TrainSpec(train_input_fn, max_steps=1, hooks=hooks)

The TensorBoard graph opens at "https://8081-----devshell.appspot.com/_proxy/6006/", and the debugger port is 6064. My question is: what should I use as the hook URL parameter?

Thanks, eilalan

p.s. Once I have it working, I will publish a "how to" video. I believe this could be very useful for others who would like to have their product running / managed from datalab. Thanks!

yebrahim commented 5 years ago

I'm not sure what you mean by the hook URL parameter; it seems like a TensorBoard invocation detail. Perhaps @qimingj might have an idea?

OrielResearchCure commented 5 years ago

Thank you for your response. TensorBoard is directed to port 6006, and the TensorBoard debugger (a new feature) is directed to port 6064. I have attached a screenshot of the TensorBoard debugger. The hook parameter is the way to "tell" the model where to "push" the debugging data. I was thinking of providing the datalab public IP with port 6064, but I haven't tried it yet. I am not sure whether I need to set up port forwarding for the datalab VM.
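Before touching the public IP or port forwarding, a quick way to check whether the debugger is already reachable locally is a plain TCP probe. A minimal sketch, assuming the debugger was started with `--debugger_port=6064` on this same VM (the port number is taken from the setup above; `port_open` is just a helper name, not a Datalab API):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# If both notebooks run on the same Datalab VM, the debugger's gRPC port
# only needs to be reachable on localhost -- no external forwarding.
print(port_open('localhost', 6064))
```

If this prints True from the second notebook, the hook can target `localhost:6064` directly and no forwarding rules are needed.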

(screenshot: 2018-11-14 at 7:19:05 PM)

Thanks, eilalan

yebrahim commented 5 years ago

So if I understand correctly, you'd like a process on one Datalab VM to talk to a process on another Datalab VM? I'm afraid this won't be easy. Can you run both on the same Datalab VM?

OrielResearchCure commented 5 years ago

This view was generated on the same datalab VM. I am using two notebooks on one VM. The first notebook acts as a listener: it launches TensorBoard and the TensorBoard debugger, which wait for input from the second notebook.

The second notebook runs the estimator (the TensorFlow model API), which pushes the debug information to the debugger. The estimator's debug hook needs to be initialized with the address of the TensorBoard debugger view. Usually it is localhost:6064. In this case, I don't know what address to use for the estimator initialization. Thanks, eilalan

yebrahim commented 5 years ago

That depends on how the estimator pushes the data, but I would guess it should still be localhost:6064, since both processes are running on the same VM.
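Putting the two notebooks together, the wiring would look roughly like the sketch below. The key point is that the hook takes the debugger's gRPC address (host:port), not the proxied HTTP URL that the browser uses; `6064` is the `--debugger_port` value from the launch code above, and the commented TensorFlow lines follow the TF 1.x `tf_debug` layout already used in this thread:

```python
# Sketch: point the estimator's debug hook at the locally running debugger.
# Assumes TensorBoard was launched with --debugger_port=6064 on the same VM.
debugger_port = 6064

# gRPC address for the hook -- NOT the _proxy/... URL shown in the browser.
debug_url = 'localhost:%d' % debugger_port

# In the second notebook (requires TensorFlow; shown here as a sketch):
# from tensorflow.python import debug as tf_debug
# hooks = [tf_debug.TensorBoardDebugHook(debug_url)]
# train_spec = tf.estimator.TrainSpec(train_input_fn, max_steps=1, hooks=hooks)

print(debug_url)
```

Since both notebook kernels are processes on the same VM, loopback traffic never leaves the machine, so neither the public IP nor port forwarding should be involved.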