tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

Refresh Backend when frontend refreshes #80

Open jart opened 7 years ago

jart commented 7 years ago

@s-gv said in https://github.com/tensorflow/tensorflow/issues/2050 a year ago with 👍x6:

Tensorboard seems to scan the log infrequently. Right now, I have to kill and re-start the server to make it re-scan the log. Could we have tensorboard re-scan the log when the tensorboard webpage is refreshed?

chihuahua commented 7 years ago

We now attempt to reload the backend 5s after the previous reload finishes: https://github.com/tensorflow/tensorboard/blob/2d7d62a13c30fe59967e583c696aae55f1e823e4/tensorboard/main.py#L71

I don't see how we can do significantly better than that, so I think this can be closed barring other compelling reasons.
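Concretely, the pattern is a background loop that reloads, sleeps, and reloads again, so the 5s interval starts after each (possibly slow) reload finishes. A minimal sketch, assuming a `multiplexer` object with a blocking `Reload()` method (the names are placeholders, not the actual `main.py` code):

```python
# Minimal sketch of a "reload, then sleep, then reload again" loop.  The real
# implementation lives in tensorboard/main.py; `multiplexer` is a stand-in for
# the object that rescans the event files.
import threading
import time

LOAD_INTERVAL_SECS = 5  # wait 5s after each reload finishes, not every 5s

def start_reloading(multiplexer):
    def _reload_forever():
        while True:
            multiplexer.Reload()            # may take a long time on slow storage
            time.sleep(LOAD_INTERVAL_SECS)  # interval starts *after* Reload returns

    thread = threading.Thread(target=_reload_forever, daemon=True)
    thread.start()
    return thread
```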

jart commented 7 years ago

Chatted offline. I think we can actually have real-time synchronization between the backend and frontend using long polling. Better yet, we could keep a WebSocket open to the backend and show actual progress bars as it loads event files. That would be sweet.
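A long-polling version of this could look roughly like the sketch below: the frontend issues a request that the backend holds open until the next reload pass completes. The `ReloadNotifier` class and its method names are made up for illustration, not part of TensorBoard's API.

```python
# Illustrative long-polling primitive: request handlers block on
# wait_for_reload() and return as soon as the backend finishes a reload pass,
# so the frontend can refresh immediately instead of polling on a fixed timer.
import threading

class ReloadNotifier:
    def __init__(self):
        self._event = threading.Event()

    def notify_reload_finished(self):
        # Called by the reload loop after each pass over the event files.
        event, self._event = self._event, threading.Event()
        event.set()  # wake every handler currently waiting

    def wait_for_reload(self, timeout_secs=30.0):
        # Called by the long-poll HTTP handler.  Returns True if a reload
        # completed before the timeout, False if the request should return
        # "no change" and let the frontend poll again.
        return self._event.wait(timeout_secs)
```

A WebSocket would work the same way, except the backend could additionally push an incremental progress message for each event file it finishes loading.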

bw4sz commented 7 years ago

I'm trying to gauge what to expect from TensorBoard's speed. I'm finding that reading in the data is incredibly slow: I'm running TensorFlow on Google Cloud Machine Learning Engine and pointing TensorBoard at the logdir from within the Google Cloud Shell. It takes about 10 minutes to load the event data, even though the model has already finished running. Is this the same issue? Should I be manually refreshing to force the frontend to replot?

jart commented 7 years ago

How big are your event logs and are they stored on GCS?

bw4sz commented 7 years ago


I have identified that the problem is specific to running from the Google Cloud Shell. If I download the whole directory and visualize it locally, it loads in seconds. It felt natural to use TensorBoard within that environment rather than grabbing the whole model directory (200 MB) and copying it locally.

teamdandelion commented 7 years ago

I am guessing we are running into an issue where, on GCS, we load the events without readahead chunks, so we make round-trip requests incredibly frequently, possibly one per individual event. That makes for horrible performance. Filing a new issue for this here: https://github.com/tensorflow/tensorboard/issues/158 (since this thread is more about streaming / synchronization).
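To make the fix concrete, it amounts to reading the remote file in large readahead chunks and serving the many small per-record reads from a local buffer. A rough sketch, where `remote_file.pread(offset, length)` stands in for whatever positional-read API the filesystem layer exposes:

```python
# Sketch of readahead buffering: instead of one GCS round trip per event
# record (often just a few hundred bytes), fetch multi-megabyte chunks and
# serve the small reads locally.
CHUNK_SIZE = 8 * 1024 * 1024  # one 8 MiB request instead of thousands of tiny ones

class ReadaheadReader:
    def __init__(self, remote_file, chunk_size=CHUNK_SIZE):
        self._file = remote_file
        self._chunk_size = chunk_size
        self._buffer = b""
        self._offset = 0  # file position of the first byte in self._buffer

    def read(self, n):
        # Refill the buffer with large remote reads only when it runs dry.
        while len(self._buffer) < n:
            chunk = self._file.pread(self._offset + len(self._buffer),
                                     self._chunk_size)
            if not chunk:  # end of file
                break
            self._buffer += chunk
        result, self._buffer = self._buffer[:n], self._buffer[n:]
        self._offset += len(result)
        return result
```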

geyang commented 7 years ago

It would be ideal if the graph window could refresh instantaneously.

For PyTorch I built a tool that renders the torch graph to a PDF, which I watch in the Skim PDF viewer. This way I can insert a graph(variable_a) call for whichever partial graph I want to take a closer look at, and the feedback is instantaneous (less than 200 ms).

When I first saw TensorBoard I thought this was the key feature, and there must be a lot of people with the same expectation.

The magic happens when it is so fast that you can use it to program your model interactively.

Technically, the challenge is to build the event handling so that it

  1. has low latency, and
  2. throttles a large number of events.

Could we do this via something like @throttle(rising=true, delay=3000)?
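For what it's worth, a leading-edge ("rising") throttle of that shape is easy to sketch in Python; the decorator below is only an illustration, and its name and parameters simply mirror the hypothetical call above.

```python
# Sketch of a leading-edge throttle: the first event in a burst fires the
# handler immediately; further events within `delay` milliseconds are dropped.
# Only rising=True is handled here; a trailing-edge variant (rising=False)
# would additionally need to schedule one deferred call.
import functools
import time

def throttle(rising=True, delay=3000):
    def decorator(fn):
        last_fire_ms = [float("-inf")]  # mutable cell shared by all calls

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            now_ms = time.monotonic() * 1000.0
            if rising and now_ms - last_fire_ms[0] >= delay:
                last_fire_ms[0] = now_ms
                return fn(*args, **kwargs)
            return None  # event dropped by the throttle

        return wrapper
    return decorator

@throttle(rising=True, delay=3000)
def refresh_graph_pane():
    print("re-render the graph with the latest events")
```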