About Tensorboard - Githubissues

Nara0731 commented 5 years ago

Hello, I am not sure how to set up TensorBoard. I have set the environment variable, but I use PyCharm, so I do not know how to modify the --log-dir.

Nara0731 commented 5 years ago

It shows "Logging to /tmp/openai-2018*". However, I cannot find the directory "/tmp".

Nara0731 commented 5 years ago

I have found the directory "/tmp". However, I only find 0.0.monitor.csv, log.txt, and progress.csv. Where should I find the TensorBoard files?

pzhokhov commented 5 years ago

Hi @Nara0731 ! We have recently added this section to the README: https://github.com/openai/baselines/blob/master/README.md#using-baselines-with-tensorboard

Basically, you need to set two env variables: OPENAI_LOGDIR to where you want the tensorboard files to be saved, and OPENAI_LOG_FORMAT to 'stdout,tensorboard' (if you only need output to command line and tensorboard). The tensorboard data should show up in OPENAI_LOGDIR (subfolder tb). You can launch tensorboard via

tensorboard --logdir=$OPENAI_LOGDIR

From the fact that logs are saved to the /tmp/openai-2018* location, I suspect that neither of the environment variables is actually set (at least from the python interpreter's perspective). Could you run

import os; print(os.environ)

in python and paste here the output? If OPENAI_LOGDIR and OPENAI_LOG_FORMAT are not there, you can set them directly from python:

os.environ['OPENAI_LOGDIR'] = ...
os.environ['OPENAI_LOG_FORMAT'] = 'stdout,tensorboard'

(that has to happen before you start training) Hope this helps!
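A minimal sketch of the step described above, assuming the two variable names from the README are all that needs to be set. The helper name `configure_baselines_logging` is hypothetical and not part of baselines; it only sets the environment and reports where the tb subfolder would be expected.

```python
import os


def configure_baselines_logging(logdir, formats=("stdout", "tensorboard")):
    """Set OPENAI_LOGDIR and OPENAI_LOG_FORMAT for the current process.

    Hypothetical helper (not part of baselines). Returns the path where the
    "tb" subfolder mentioned above would be expected to appear.
    """
    logdir = os.path.abspath(logdir)
    os.makedirs(logdir, exist_ok=True)
    os.environ["OPENAI_LOGDIR"] = logdir
    # Formats are comma-separated, as in the export example from the README.
    os.environ["OPENAI_LOG_FORMAT"] = ",".join(formats)
    return os.path.join(logdir, "tb")
```

This would need to be called at the very top of the script, before training starts, so that the logger sees the variables when it is configured.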

Nara0731 commented 5 years ago

Yeah, I have set

export OPENAI_LOG_FORMAT='stdout,log,csv,tensorboard' # formats are comma-separated, but for tensorboard you only really need the last one
export OPENAI_LOGDIR=/tmp

Unfortunately, I did not find any tensorboard-related files in "/tmp".

pzhokhov commented 5 years ago

hm... let's solve it one step at a time. Could you run import os; print(os.environ) in python?
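Since the full environment dump is long, a narrower check may be easier to read. The sketch below prints only the two variables discussed in this thread; the function name `logging_env` is made up for illustration.

```python
import os


def logging_env(environ=os.environ):
    """Return only the two logging variables; a missing one maps to None."""
    names = ("OPENAI_LOGDIR", "OPENAI_LOG_FORMAT")
    return {name: environ.get(name) for name in names}


# Run this inside the same interpreter (for example the same PyCharm run
# configuration) that launches training.
print(logging_env())
```

If either value prints as None, the variable did not reach the Python process, whatever the shell shows.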

Nara0731 commented 5 years ago

Yes, it shows:

/usr/bin/python3.6 /home/ubuntu/baselines/baselines/run.py
Logging to /tmp/openai-2018-09-21-14-19-48-807530
environ({'PATH': '/home/ubuntu/bin:/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin', 'LC_MEASUREMENT': 'zh_CN.UTF-8', 'XAUTHORITY': '/home/ubuntu/.Xauthority', 'XMODIFIERS': '@im=ibus', 'LC_TELEPHONE': 'zh_CN.UTF-8', 'XDG_DATA_DIRS': '/usr/share/ubuntu:/usr/share/gnome:/usr/local/share:/usr/share:/var/lib/snapd/desktop:/var/lib/snapd/desktop', 'GDMSESSION': 'ubuntu', 'MANDATORY_PATH': '/usr/share/gconf/ubuntu.mandatory.path', 'LC_TIME': 'zh_CN.UTF-8', 'GTK_IM_MODULE': 'ibus', 'DBUS_SESSION_BUS_ADDRESS': 'unix:abstract=/tmp/dbus-mKrFfL3RGO', 'DEFAULTS_PATH': '/usr/share/gconf/ubuntu.default.path', 'XDG_CURRENT_DESKTOP': 'Unity', 'LD_LIBRARY_PATH': '/home/ubuntu/.mujoco/mjpro150/bin:/usr/lib/nvidia-384', 'UPSTART_SESSION': 'unix:abstract=/com/ubuntu/upstart-session/1000/1458', 'QT4_IM_MODULE': 'xim', 'LC_PAPER': 'zh_CN.UTF-8', 'SESSION_MANAGER': 'local/ubuntu-pc:@/tmp/.ICE-unix/1708,unix/ubuntu-pc:/tmp/.ICE-unix/1708', 'QT_LINUX_ACCESSIBILITY_ALWAYS_ON': '1', 'LOGNAME': 'ubuntu', 'JOB': 'unity-settings-daemon', 'PWD': '/home/ubuntu/baselines/baselines', 'IM_CONFIG_PHASE': '1', 'PYCHARM_HOSTED': '1', 'LANGUAGE': 'en_US', 'PYTHONPATH': '/home/ubuntu/baselines', 'SHELL': '/bin/bash', 'LC_ADDRESS': 'zh_CN.UTF-8', 'UNITY_HAS_3D_SUPPORT': 'true', 'GIO_LAUNCHED_DESKTOP_FILE': '/usr/share/applications/jetbrains-pycharm-ce.desktop', 'GTK2_MODULES': 'overlay-scrollbar', 'INSTANCE': '', 'OLDPWD': '/home/ubuntu/package/pycharm-community-2018.1.2/bin', 'GNOME_DESKTOP_SESSION_ID': 'this-is-deprecated', 'UPSTART_INSTANCE': '', 'CLUTTER_IM_MODULE': 'xim', 'XDG_SESSION_PATH': '/org/freedesktop/DisplayManager/Session0', 'COMPIZ_BIN_PATH': '/usr/bin/', 'SESSIONTYPE': 'gnome-session', 'XDG_SESSION_DESKTOP': 'ubuntu', 'SHLVL': '0', 'LC_IDENTIFICATION': 'zh_CN.UTF-8', 'LC_MONETARY': 'zh_CN.UTF-8', 'COMPIZ_CONFIG_PROFILE': 'ubuntu', 'QT_IM_MODULE': 'ibus', 'UPSTART_JOB': 'unity7', 'XDG_CONFIG_DIRS': '/etc/xdg/xdg-ubuntu:/usr/share/upstart/xdg:/etc/xdg', 'LANG': 'en_US.UTF-8', 'GNOME_KEYRING_CONTROL': '', 'XDG_SEAT_PATH': '/org/freedesktop/DisplayManager/Seat0', 'XDG_SESSION_ID': 'c2', 'XDG_SESSION_TYPE': 'x11', 'DISPLAY': ':0', 'UNITY_DEFAULT_PROFILE': 'unity', 'LC_NAME': 'zh_CN.UTF-8', 'GDM_LANG': 'en_US', 'PYTHONIOENCODING': 'UTF-8', 'XDG_GREETER_DATA_DIR': '/var/lib/lightdm-data/ubuntu', 'UPSTART_EVENTS': 'xsession started', 'GPG_AGENT_INFO': '/home/ubuntu/.gnupg/S.gpg-agent:0:1', 'DESKTOP_SESSION': 'ubuntu', 'SESSION': 'ubuntu', 'USER': 'ubuntu', 'XDG_MENU_PREFIX': 'gnome-', 'GIO_LAUNCHED_DESKTOP_FILE_PID': '1996', 'QT_ACCESSIBILITY': '1', 'LC_NUMERIC': 'zh_CN.UTF-8', 'SSH_AUTH_SOCK': '/run/user/1000/keyring/ssh', 'XDG_SEAT': 'seat0', 'PYTHONUNBUFFERED': '1', 'QT_QPA_PLATFORMTHEME': 'appmenu-qt5', 'LD_PRELOAD': '/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-384/libGL.so', 'XDG_VTNR': '7', 'XDG_RUNTIME_DIR': '/run/user/1000', 'HOME': '/home/ubuntu', 'GNOME_KEYRING_PID': ''})

pzhokhov commented 5 years ago

Thanks! Yeah, so basically one way or another the OPENAI_LOGDIR and OPENAI_LOG_FORMAT do not make it to the python process environment variables. The fix is really easy - add

import os
os.environ['OPENAI_LOGDIR']='/tmp'
os.environ['OPENAI_LOG_FORMAT']='stdout,tensorboard'

to the very top of your python script, and try running it again. Ideally, tensorboard event files should show up in the /tmp/tb folder. Please let me know if that does not work for you.

Nara0731 commented 5 years ago

I cannot find "/tmp/tb"; I only find "tmp".

pzhokhov commented 5 years ago

okay; could you post here your python code please? Thanks!

smalltingting commented 5 years ago

Hi, I had the same problem as you. I solved it like this: just modify the code at line 209 in run.py:

if MPI is None or MPI.COMM_WORLD.Get_rank() == 0:
    rank = 0
    logger.configure(dir='./log', format_strs=['stdout', 'log', 'csv', 'tensorboard'])

Nara0731 commented 5 years ago

Really? I will try it.

srivatsankrishnan commented 5 years ago

Hi @pzhokhov @smalltingting I configured the logger setting as you have mentioned. I see it created a directory called "tb". However, it is empty. Any idea what is going on?

I am using deepq example but I think it shouldn't matter. This is how I configure it in my code:

def main():
    logger.configure(dir='.log', format_strs=['stdout', 'log', 'csv', 'tensorboard'])

pzhokhov commented 5 years ago

@srivatsankrishnan does logger print anything on the screen / in the log file? Logger only saves data when a logger.dumpkvs() (or logger.dump_tabular()) is called, which by default happens fairly rarely in deepq. Could you try with --print_freq=1 option?
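To illustrate why an output folder can stay empty until a dump happens, here is a toy stand-in for a buffered key/value logger. It is not the baselines logger and makes no claim about its internals; the class `BufferedLogger` is hypothetical, and its method names merely mirror the calls mentioned above.

```python
class BufferedLogger:
    """Toy illustration only; not the baselines implementation."""

    def __init__(self):
        self.pending = {}  # key/value pairs recorded since the last dump
        self.written = []  # rows that have actually been emitted

    def logkv(self, key, value):
        # Recording a value only buffers it; nothing is written yet.
        self.pending[key] = value

    def dumpkvs(self):
        # Only a dump turns the buffered values into an output row.
        if self.pending:
            self.written.append(dict(self.pending))
            self.pending.clear()


log = BufferedLogger()
log.logkv("episodes", 1)
log.logkv("steps", 100)
print(len(log.written))  # 0: values are buffered, nothing emitted
log.dumpkvs()
print(log.written)       # [{'episodes': 1, 'steps': 100}]
```

Under this model, a run that ends before the first dump leaves every output empty, which matches the symptom of an empty tb folder.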

srivatsankrishnan commented 5 years ago

Hi @pzhokhov, the only thing the logger prints on the screen is this message: "Logging to .log"

It creates the following folder structure in .log:

/logs
|------tb
|------log
|------progress

The tb folder is empty. The progress.csv is also empty. The "log" file (the file that gets created inside the directory) basically has the same message that was printed in the console ("Logging to .log"). I tried setting --print_freq=1, but the results are the same.

I tried to hack the code where I create my model (models.py) to explicitly export my graph so I can visualize it in TensorBoard. This is what I use:

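import tensorflow as tf  # TensorFlow 1.x API
# LOGDIR is the directory the graph event file is written to
tf_writer = tf.summary.FileWriter(LOGDIR)
tf_writer.add_graph(tf.get_default_session().graph)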

But the graph is too complex and I can't trace it to my input and output nodes (honestly, I am still trying to make sense of it and have not given up on that yet). I assume the functionality that you guys enable with logger for tensorboard will be a more structured or methodical way to visualize it in tensorboard.

pzhokhov commented 5 years ago

Hi @srivatsankrishnan ! Sorry about the lag. If progress.csv is empty, the tb/ subfolder is empty, and nothing interesting is printed on the screen, it means that either 1) the training did not progress to the point where it would save anything (call logger.dump_tabular()), or 2) something bad happened to the logger module.

Could you try running a simple test with deepq, for instance:

export OPENAI_LOG_FORMAT=stdout,csv,tensorboard
export OPENAI_LOGDIR=.log
python -m baselines.run --alg=deepq --env=CartPole-v0 --print_freq=1 --num_timesteps=1e5

If everything works correctly, this should generate a long output that looks like:

-----------------------------------
| % time spent exploring  | 2     |
| episodes                | 843   |
| mean 100 episode reward | 190.8 |
| steps                   | 99081 |
-----------------------------------

and files progress.csv, 0.0.monitor.csv, log.txt, and subfolder tb in .log. If that works, but your case still does not, it probably means that logger / logger configuration are messed up. If the test above does not work, then something in your python environment is not quite right; and in that case, I'd recommend installing baselines in a clean virtualenv, and trying again.
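The check on the log directory can be scripted. The sketch below takes the file names from the comment above as defaults, and it assumes that TensorBoard event files contain "tfevents" in their names, which is TensorFlow's usual naming. Both function names are hypothetical.

```python
import os


def missing_outputs(logdir, expected=("progress.csv", "0.0.monitor.csv", "log.txt", "tb")):
    """Return the expected entries that do not exist in logdir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(logdir, name))]


def tensorboard_event_files(logdir):
    """List event files in logdir/tb (assumes 'tfevents' appears in the name)."""
    tb_dir = os.path.join(logdir, "tb")
    if not os.path.isdir(tb_dir):
        return []
    return sorted(name for name in os.listdir(tb_dir) if "tfevents" in name)
```

An empty list from `missing_outputs(".log")` together with a non-empty list from `tensorboard_event_files(".log")` would suggest the test run logged as expected.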

srivatsankrishnan commented 5 years ago

Hi @pzhokhov! No worries. This one works, and I see the logs and event file getting generated. When I open tensorboard, it only has the scalars (% time spent exploring, episodes, rewards, etc.). I don't see a graph in tensorboard. I was interested in seeing the graph for the neural net model to determine the input and output nodes. I just hacked the code where I define the model to capture the graph. So in a way, I was able to get what I wanted.

As you put it, in my case I just used 100 steps for my environment and --print_freq=1 to quickly capture the graph. Maybe it didn't get to a point where logger.dump_tabular() was getting called.

On a different note, is there a plan to support saving the model in native tensorflow format along with the graph (.pb)? The reason is that there are lots of interesting tools in tensorflow, and they basically require the model to be in one of these formats.

pzhokhov commented 5 years ago

oh now I see :) Yeah, the logger only saves scalars. As for long-term support of saving entire models in tensorflow / tensorboard support and serialization in general - this has been a subject of quite a bit of debate. We will likely support custom serialization functions (so that every use case can pick its poison), but I don't have a timeline for that. If you could provide an example of useful functionality that is missing by not saving data in tensorflow format, we can speed it up somewhat :)

srivatsankrishnan commented 5 years ago

Hi @pzhokhov, thanks for your reply. There are lots of tools in tensorflow to fine-tune inference performance: (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/tools) (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md) The basic requirement for using these tools is to have models saved in a native tensorflow format (checkpoints, .pb, etc.). I am particularly interested in using these tools and was able to hack the code to save the model in the native tensorflow format. I am currently facing some tensorflow-related issues but will be able to test it out once I resolve those.

I have one more useful piece of functionality in mind, but it's orthogonal to this discussion. Maybe I will open a new issue for it to avoid mixing it up with this one.

ryanmaxwell96 commented 4 years ago

Was there any resolution to this issue? I've tried the same suggestions that have been listed so far (os.environ['OPENAI_LOGDIR'] = ... and os.environ['OPENAI_LOG_FORMAT'] = 'stdout,tensorboard') and I can get those to be listed on print(os.environ), but I am not getting any file outputs. Any ideas?

KomputerMaster64 commented 2 years ago

I just wanted to know how to move the logs from the /tmp directory to a directory of my choice, as I have to manually save the /tmp/openai-2022..... files to get the checkpoints for training.

PS: I am using multiple GPUs for training; I hope the suggested methods work for multi-GPU training.
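Earlier in this thread, setting OPENAI_LOGDIR before training was suggested as the way to choose the log location, which would avoid copying altogether. For runs that have already been written, a copy step could be sketched as follows. The function name `copy_run_dirs` is hypothetical, the "openai-*" pattern is taken from the directory names quoted in this thread, and whether this fits a multi-GPU setup is not confirmed here.

```python
import glob
import os
import shutil
import tempfile


def copy_run_dirs(dest, src_root=None, pattern="openai-*"):
    """Copy run directories matching pattern from src_root into dest.

    src_root defaults to the system temp directory. Directories already
    present in dest are skipped rather than overwritten.
    """
    src_root = src_root or tempfile.gettempdir()
    os.makedirs(dest, exist_ok=True)
    copied = []
    for run_dir in sorted(glob.glob(os.path.join(src_root, pattern))):
        if not os.path.isdir(run_dir):
            continue
        target = os.path.join(dest, os.path.basename(run_dir))
        if os.path.exists(target):
            continue  # keep the earlier copy untouched
        shutil.copytree(run_dir, target)
        copied.append(target)
    return copied
```

The function returns the list of newly created directories, so calling it twice in a row copies nothing the second time.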

Nara0731 commented 2 years ago

Hello, I have received your message. If it is something important, please let me know by text message!

openai / baselines

About Tensorboard #596