tensorflow / tensorboard

TensorFlow's Visualization Toolkit

Histogram binning and plotting useless and deceptive in some cases #1803

Open mkillinger opened 5 years ago

mkillinger commented 5 years ago

First of all, I think TensorBoard is a really great and important UI for real-time feedback on model training, but I've run into a few issues with the accuracy of the displayed information.

There are two related "sub-issues"; the first is much more important:

  1. It seems there are distributions for which the resulting histogram bins no longer show any discriminating information. Example: true distribution, computed offline: (image: true_dist). TensorBoard histogram: (image: tensorboard_hist). The visualization is exactly flat/uniform, which would indicate a serious problem with training the model. However, the actual distribution is more or less Gaussian around 0.97, so the TensorBoard histogram is deceptive and shows no useful information. I could imagine that this is caused by the variance being small compared to the mean, but a reasonable scaling/binning method should be robust against that; the absolute values in this example are not ill-posed at all. (A numeric sketch of this failure mode follows this list.)

  2. This issue concerns the "style" of the visualization. If it is supposed to be a histogram, why is it drawn as piecewise-linear connected lines and not as a bar plot? If a continuous representation is preferred for aesthetic or other reasons, the right choice would be a kernel density estimate (which for scalar data should not be expensive). A piecewise-linear visualization is confusing and introduces strange artifacts. One of these artifacts can be seen when using the "overlay" mode to show the time evolution of histograms: (image: overlay). The alternation of constant/horizontal sections and oblique interpolation lines creates a lot of visual clutter that makes it hard to follow the actual evolution from one histogram to the next. A smooth KDE visualization would presumably be much easier to parse visually, and a plain bar plot might also be easier to understand due to its simplicity. (A minimal KDE sketch follows below.)
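
To make sub-issue 1 concrete, here is a minimal numeric sketch. The 10%-growth exponential bucket edges are an illustrative assumption in the spirit of the legacy histogram summary op, not TensorBoard's exact algorithm; the point is that a distribution tightly concentrated around 0.97 collapses into one or two wide buckets, after which no rendering can recover its shape:

```python
import numpy as np

# A tight Gaussian like the one in the report: mean ~0.97, tiny variance.
rng = np.random.default_rng(0)
values = rng.normal(loc=0.97, scale=0.005, size=100_000)

# Exponential bucket edges growing by 10% per bucket (an illustrative
# guess at the legacy bucketing scheme).
edges = [1e-12]
while edges[-1] < 2.0:
    edges.append(edges[-1] * 1.1)
edges = np.array(edges)

exp_counts, _ = np.histogram(values, bins=edges)
lin_counts, _ = np.histogram(values, bins=30)

# Essentially all of the mass lands in one or two wide exponential
# buckets, so the stored histogram carries no shape information, and
# any read-time resampling across such a wide bucket looks flat.
# Linear binning resolves the Gaussian shape.
print("nonzero exponential buckets:", np.count_nonzero(exp_counts))
print("nonzero linear buckets:", np.count_nonzero(lin_counts))
```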

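And for sub-issue 2, a sketch of the suggested KDE alternative using `scipy.stats.gaussian_kde` (the use of SciPy here is an assumption for illustration; TensorBoard would presumably want its own lightweight estimator):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
values = rng.normal(loc=0.97, scale=0.005, size=10_000)

# A kernel density estimate yields one smooth curve per histogram,
# with none of the horizontal-segment-plus-oblique-line artifacts
# that come from connecting bin counts with straight lines.
kde = gaussian_kde(values)
xs = np.linspace(values.min(), values.max(), 200)
density = kde(xs)  # evaluate the smooth density on a fixed grid
```
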
A side note, unrelated to histograms: the moving-average smoothing for scalar data should convert the user-provided raw smoothingWeight into a per-time-series parameter that takes the sampling rate of each displayed time series into account. The training-loss sampling rate is often orders of magnitude higher than the sampling rate of evaluator jobs, so it is absolutely counter-intuitive to apply the point-to-point moving average with the same smoothingWeight to those different time series. The result of doing that is that when smoothingWeight is set high enough to sufficiently suppress noise/variance in the training loss, the smoothed evaluation loss lags so far behind that its end value is way off its true value: (image: smoothing). The smoothed final value of the evaluation loss at 20M training steps corresponds to the "instantaneous" value of the evaluation loss at ~6M steps; I think this is not the information one wants to see at 20M steps.
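
For reference, a minimal sketch of the requested per-time-series conversion (the function name, the `ref_interval` parameter, and the power-rescaling rule are illustrative assumptions, not TensorBoard's actual smoothing code): scale the decay by the step gap between consecutive points, so a sparsely sampled series decays proportionally faster.

```python
def smooth(steps, values, weight, ref_interval=1.0):
    """Exponential moving average whose decay is scaled by the step gap
    between consecutive points, so sparsely sampled series (e.g. eval
    loss) are not over-smoothed relative to densely sampled ones.

    `weight` is the usual TensorBoard-style smoothing factor in [0, 1);
    `ref_interval` is the step gap at which `weight` applies exactly.
    """
    smoothed, last, prev_step = [], values[0], steps[0]
    for step, value in zip(steps, values):
        # A gap k times larger than ref_interval decays the average
        # k times faster: weight ** k instead of weight.
        gap = step - prev_step
        w = weight ** (gap / ref_interval) if gap > 0 else weight
        last = last * w + (1.0 - w) * value
        smoothed.append(last)
        prev_step = step
    return smoothed


# E.g. with weight=0.99 and ref_interval=1_000 (the training-loss gap),
# an eval loss logged every 100_000 steps gets an effective per-point
# weight of 0.99 ** 100 ~= 0.37 and barely lags at all.
```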

Context:
TensorBoard version: 1.13.0a0
TensorFlow version: built from CL/231841759

manivaradarajan commented 5 years ago

Thanks for the bug report. We'll investigate and report back.

mkillinger commented 5 years ago

Any progress? I think a bar plot or KDE visualization would be really advantageous - as of now these histograms are pretty unscientific.

nfelt commented 4 years ago

@mkillinger No progress on this yet I'm afraid, but very much agree that the histogram visualization's binning is a problem (the TF 2.0 histogram op uses linear binning so it shouldn't have quite the same issue as in your post, but can still produce artifacts), and that the plot style is a bit misleading as well.

For the smoothing issue, we know the smoothing algorithm could also use improvement (though it has historically been tricky to get right). If you want to split that out as a separate GitHub issue, it would be a bit easier to track that way; alternatively, I'm happy to repost that part of your comment separately if you'd like.

nfelt commented 4 years ago

A related issue is #1015, about binary histogram binning in particular, which I think we can consider a sub-issue of this problem.

wchargin commented 4 years ago

(Since I don't see it noted here:) One option that we had floated in the past was letting tf.summary.histogram actually just sample something like 10,000 points from your tensor and store those in the data file. It's lossy, sure, but so is any particular binning strategy (linear, exponential), and at least this would be unbiased. From this sample we can always re-bin at read time to show a traditional histogram.
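
A rough sketch of what that write-time sampling could look like (`sample_for_summary` and its signature are hypothetical, purely to illustrate the proposal):

```python
import numpy as np

def sample_for_summary(tensor, k=10_000, rng=None):
    """Uniformly sample up to k values from a tensor at write time.
    Storing the raw sample is lossy but unbiased, and it can be
    re-binned with any strategy at read time."""
    rng = rng or np.random.default_rng()
    flat = np.asarray(tensor).ravel()
    if flat.size <= k:
        return flat
    return rng.choice(flat, size=k, replace=False)

# Read time: choose any binning after the fact.
sample = sample_for_summary(np.random.normal(0.97, 0.005, 1_000_000))
counts, edges = np.histogram(sample, bins=30)
```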