Voice Notes, Thumbnail Photos, and Large Binary Values

Devices targeted at active recording typically allow the user to record an audible-range voice note to accompany the ultrasonic bat recording. The spec currently doesn't define a top-level field for a voice note; should we do so?

While voice notes are likely to use a lower sample rate (e.g. 44.1 kHz), they may be longer in duration than the actual bat recording, so a voice note could easily exceed a few MB in size (60 seconds of 16-bit 44.1 kHz mono .WAV is ~5 MB).
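The size estimate above can be checked with a little arithmetic. This is a minimal sketch, assuming uncompressed PCM and a canonical 44-byte WAV header; the function and constant names are illustrative only.

```python
# Assumption: canonical RIFF/WAVE header with one fmt chunk and one data chunk.
WAV_HEADER_BYTES = 44


def wav_size_bytes(sample_rate_hz, bits_per_sample, channels, duration_s):
    """Approximate size of an uncompressed PCM .WAV file in bytes."""
    pcm_bytes = sample_rate_hz * (bits_per_sample // 8) * channels * duration_s
    return WAV_HEADER_BYTES + pcm_bytes


# 60 seconds of 16-bit 44.1 kHz mono audio, as in the example above.
size = wav_size_bytes(44_100, 16, 1, 60)
print(f"{size} bytes (~{size / 1e6:.1f} MB)")
```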

Should voice notes be embedded as a base64 binary field value?
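To put a number on the cost of this option: base64 emits 4 output bytes for every 3 input bytes, so an embedded value grows by roughly one third. A rough sketch, using a zero-filled placeholder in place of real audio:

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Placeholder standing in for a ~5 MB voice note (all zero bytes).
voice_note = bytes(5_292_044)
encoded = base64.b64encode(voice_note)

overhead = len(encoded) - len(voice_note)
print(f"raw: {len(voice_note)} bytes, base64: {len(encoded)} bytes, "
      f"overhead: {overhead} bytes")
```

A reader would also have to hold the encoded text in memory and decode it before use, which is part of the motivation for the alternative discussed next.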

Should we instead define a second chunk, gbin, for housing large binary "attachments", then reference them with a "pointer" inside the main guan chunk? With this strategy, reading implementations won't need to allocate memory and resources for these potentially large attachments unless they actually want them. Additionally, by storing pure binary data in gbin, a writing implementation won't need to base64 encode the data.
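To make the proposal concrete, here is one way it could look. Everything in this sketch is an assumption for discussion: the `gbin:offset:length` pointer syntax, the `Voice Note` field name, and the flat layout of the gbin payload are all hypothetical and not part of the spec. It also omits the enclosing RIFF/WAVE header and assumes the guan chunk holds UTF-8 `key: value` lines.

```python
import io
import struct


def riff_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build a RIFF chunk: 4-byte id, little-endian size, payload, pad to even."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def iter_chunks(stream):
    """Yield (chunk_id, payload_offset, size) without reading any payload."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        chunk_id, size = struct.unpack("<4sI", header)
        offset = stream.tell()
        yield chunk_id, offset, size
        stream.seek(offset + size + (size % 2))  # skip payload and pad byte


# --- Writer side: raw bytes go into gbin, only a pointer goes into guan ---
voice_note = b"\x01\x02\x03\x04\x05"  # stand-in for real audio bytes
pointer = f"gbin:0:{len(voice_note)}"  # hypothetical chunk:offset:length syntax
guan_text = f"GUANO|Version: 1.0\nVoice Note: {pointer}\n".encode("utf-8")
body = riff_chunk(b"guan", guan_text) + riff_chunk(b"gbin", voice_note)

# --- Reader side: index the chunks, parse guan, fetch gbin only on demand ---
stream = io.BytesIO(body)
index = {cid: (off, size) for cid, off, size in iter_chunks(stream)}

guan_off, guan_size = index[b"guan"]
stream.seek(guan_off)
lines = stream.read(guan_size).decode("utf-8").splitlines()
fields = dict(line.split(": ", 1) for line in lines if ": " in line)

# Only a reader that cares about the voice note follows the pointer.
_, rel_offset, length = fields["Voice Note"].split(":")
gbin_off, _ = index[b"gbin"]
stream.seek(gbin_off + int(rel_offset))
attachment = stream.read(int(length))
```

A reader that ignores voice notes never touches the gbin payload; it only seeks past it while indexing chunks. Multiple attachments could share one gbin chunk by using different offsets.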

This issue applies not only to voice notes, but also to thumbnail images (a rendered spectrogram, etc.) and to any other "large" metadata value.
