Open lespeholt opened 6 years ago
Images are PNG-encoded, which actually seems fine to me.
Audio, on the other hand, uses raw WAV, which is an enormous waste of space. Our plan is to switch that to FLAC, but to do that we need to get a FLAC encoder into TensorFlow core, and we just haven't gotten around to that.
Regarding columnarity: we're working on that, too. @jart has been working on a project to allow TensorBoard to use a SQL datastore instead of an in-memory datastore; you can read about it in the description of #293. This should scale very well. (See also #92.)
Thank you for the feedback @lespeholt. I'm actively working on a SQL database that does all the things you mentioned. Most of the space saving is probably going to come from reservoir sampling. While a normalized SQL format is able to save on things like tag strings, it does introduce new types of storage overhead that the proto event logs don't have.
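For readers unfamiliar with reservoir sampling, here is a minimal sketch of the classic "Algorithm R" in Python. It is only an illustration of the general technique, not the code TensorBoard or the SQL backend actually uses; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    Hypothetical helper for illustration only (classic Algorithm R).
    Memory use is O(k) no matter how long the stream is.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: keep 10 of 1000 (step, value) scalar events.
events = ((step, step * 0.5) for step in range(1000))
sample = sorted(reservoir_sample(events, 10, random.Random(0)))
```

Because replaced slots break the original ordering, the example sorts the sample by step afterwards; a real event store would presumably need to do something similar to keep time series in order.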
The current format becomes slow in many circumstances:
I suggest that the format be replaced by something that: