janosh / tensorboard-reducer

Reduce multiple PyTorch TensorBoard runs to new event (or CSV) files.
https://pypi.org/project/tensorboard-reducer
MIT License
68 stars, 4 forks

Some tags only contain one data point #31

Open Seraphli opened 1 year ago

Seraphli commented 1 year ago

tb-reducer ant_sac/*seed*/log/serial/ -o out -r mean,std,min,max --lax-steps

Although I used the --lax-steps flag, some tags contain only one data point. Others are fine.

[two screenshots]

collector_step/reward_mean actually contains more data points than evaluator_step/reward_mean.

janosh commented 1 year ago

Can you share the log files so I can take a closer look?

Seraphli commented 1 year ago

Yes, you can download the logs from here. https://transfer.sh/JDnbtA/ant_sac.tar.gz

janosh commented 1 year ago

Hmmm... Could you post your versions of TensorBoard Reducer, TensorBoard, and PyTorch/TensorFlow? Running the command you posted above

tb-reducer ant_sac/*seed*/log/serial/ -o out -r mean,std,min,max --lax-steps

on the data you shared, my TB dashboard looks very different.

Or maybe try reloading the dashboard a few times. I've known it to be a bit laggy in displaying all available data.

Screenshot 2023-03-10 at 07 17 56

Seraphli commented 1 year ago

As I said, the problem is with the tag collector_step/reward_mean, not with evaluator_step/reward_mean.

torch 1.13.1
tensorboard 2.12.0
tensorboard-data-server 0.7.0
tensorboard-plugin-wit 1.8.1
tensorboard-reducer 0.3.0

janosh commented 1 year ago

Oops, my bad. Didn't read carefully enough. I'm not fully awake until I've had my 2nd tea or coffee. 😄

I took another look and I think the problem is some of the runs are empty, resulting in those steps being filtered out.

If you re-run your command with --min-runs-per-step 1

tb-reducer ant_sac/*seed*/log/serial/ -o out -r mean,std,min,max --lax-steps --min-runs-per-step 1 --overwrite

it does include many more steps in the reduction but it gives this jumbled mess which isn't very helpful either.

Screenshot 2023-03-10 at 09 59 22

I'll have to dig deeper to see why that happens, but maybe you can try removing/excluding the empty event files as a workaround in the meantime.

Seraphli commented 1 year ago

Actually, none of the logs matching ant_sac/*seed*/log/serial/ are empty. You can remove all logs in the buffer subfolder and get the same results.

Thinking about this issue again, it might be caused by the different logging time steps of the tag collector_step/reward_mean. The data points of the tag evaluator_step/reward_mean are always logged at the same time steps, while the data points of the tag collector_step/reward_mean are logged at random time steps.

Is there a way to use interpolation, such as linear or spline interpolation, when reducing the logs?

janosh commented 1 year ago

Sorry for the long radio silence.

> The data points of the tag evaluator_step/reward_mean are always logged at the same time steps, while the data points of the tag collector_step/reward_mean are logged at random time steps.

Ah, that could be it. Glad you figured it out.
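To illustrate why misaligned steps could leave a tag with a single data point, here is a minimal sketch in plain Python. It assumes that a step is kept only if at least a given number of runs recorded it, which is what the --min-runs-per-step flag suggests; it is not the library's actual code. The step values and the helper `steps_kept` are hypothetical and not taken from the shared logs.

```python
from collections import Counter

# Hypothetical step indices for one tag across three runs
# (made up for illustration, not read from the ant_sac logs).
evaluator_steps = [[0, 1000, 2000, 3000]] * 3  # identical steps in every run
collector_steps = [
    [0, 137, 1012, 2950],
    [0, 201, 998, 3105],
    [0, 88, 1120, 2874],
]


def steps_kept(runs, min_runs_per_step):
    """Return the sorted steps recorded by at least min_runs_per_step runs."""
    counts = Counter(step for run in runs for step in set(run))
    return sorted(step for step, n in counts.items() if n >= min_runs_per_step)


# Requiring every run to share a step keeps all evaluator steps ...
evaluator_kept = steps_kept(evaluator_steps, min_runs_per_step=3)
# ... but only the single step that all collector runs have in common.
collector_kept = steps_kept(collector_steps, min_runs_per_step=3)
# Lowering the threshold to 1 keeps every step, each backed by few runs.
collector_kept_lax = steps_kept(collector_steps, min_runs_per_step=1)
```

Under this assumption, the lax case keeps many steps, but most of them carry a value from only one run, which may be why the reduced curve looks jumbled.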

> Is there a way to use interpolation, such as linear or spline interpolation, when reducing the logs?

I haven't looked into that but happy to take a PR for this feature. You may be able to use the pandas interpolate method for this.
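As a rough starting point for such a PR, the sketch below uses pandas to interpolate each run linearly onto a shared step grid before reducing. It is not part of tensorboard-reducer; the helper `interpolate_runs`, the run names, and the data are hypothetical, and it assumes each run has unique step values for the tag.

```python
import pandas as pd


def interpolate_runs(runs, grid):
    """Linearly interpolate each run onto a shared step grid.

    runs: dict mapping run name -> pd.Series of values indexed by step.
    grid: iterable of steps at which every run should be evaluated.
    """
    grid = pd.Index(sorted(grid), name="step")
    columns = {}
    for name, series in runs.items():
        # Insert the grid steps as NaN rows next to the recorded steps.
        combined = series.reindex(series.index.union(grid)).sort_index()
        # method="index" weights by step distance; limit_area="inside"
        # avoids extrapolating beyond a run's first and last recorded step.
        filled = combined.interpolate(method="index", limit_area="inside")
        columns[name] = filled.reindex(grid)
    return pd.DataFrame(columns)


# Hypothetical runs logged at different steps.
runs = {
    "run_a": pd.Series([0.0, 10.0, 20.0], index=[0, 100, 200]),
    "run_b": pd.Series([0.0, 30.0], index=[50, 150]),
}
aligned = interpolate_runs(runs, grid=[0, 50, 100, 150, 200])

# Reduce across runs; NaNs (steps outside a run's range) are skipped.
mean = aligned.mean(axis=1)
std = aligned.std(axis=1)
```

Steps outside a run's recorded range stay NaN here, so the reduction at those steps uses fewer runs. pandas also accepts other `method` values such as "spline" (which needs SciPy and an `order` argument), so the same structure may work for spline interpolation.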