I found the explanation of the coded number unclear; I had to read the reference implementation to understand it. Some things that would make the specification clearer to me:
- Clarify that the sample/frame number takes the place of the code point in UTF-8: the integer itself is encoded, and we aren't storing a textual representation of the sample/frame number.
- Give an example of how an unencoded number maps to coded bytes, so the reader can see what the "extended" nature of this coding scheme actually looks like. I suggest the all-ones case, the "maximum of 36 bits unencoded".
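
To illustrate the kind of example I mean, here is a minimal Python sketch. It is not taken from the reference implementation; it assumes the coded number follows the UTF-8 bit layout (leading byte with a run of ones, then continuation bytes of the form `10xxxxxx`) extended to a 7-byte form with lead byte `0xFE`, which gives 36 payload bits. The function names `encode_coded_number` and `decode_coded_number` are made up for this illustration.

```python
def encode_coded_number(value: int) -> bytes:
    """Encode an integer of up to 36 bits using a UTF-8-style layout (assumed scheme)."""
    if value < 0 or value >= 1 << 36:
        raise ValueError("value must fit in 36 bits")
    if value < 0x80:
        return bytes([value])  # single byte: 0xxxxxxx
    # Pick the smallest length n (2..7). An n-byte sequence carries
    # (7 - n) bits in the leading byte plus 6 bits per continuation byte.
    for n in range(2, 8):
        if value < 1 << ((7 - n) + 6 * (n - 1)):
            break
    # Leading byte: n one-bits, a zero bit, then the top payload bits.
    lead = ((0xFF << (8 - n)) & 0xFF) | (value >> (6 * (n - 1)))
    # Continuation bytes: 10xxxxxx, most significant 6-bit group first.
    tail = [0x80 | ((value >> (6 * i)) & 0x3F) for i in range(n - 2, -1, -1)]
    return bytes([lead] + tail)


def decode_coded_number(data: bytes) -> int:
    """Decode one coded number from the start of `data` (assumed scheme)."""
    first = data[0]
    if first < 0x80:
        return first
    # Count the leading one-bits; that count is the total sequence length.
    n = 0
    while n < 8 and first & (0x80 >> n):
        n += 1
    if n < 2 or n > 7 or len(data) < n:
        raise ValueError("invalid or truncated coded number")
    value = first & (0x7F >> n)  # payload bits of the leading byte
    for b in data[1:n]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value


if __name__ == "__main__":
    all_ones = (1 << 36) - 1
    print(encode_coded_number(all_ones).hex(" "))
```

Under those assumptions, the all-ones 36-bit value `0xFFFFFFFFF` maps to the seven bytes `fe bf bf bf bf bf bf`: the lead byte carries no payload bits and each of the six continuation bytes carries six one-bits. For values in the normal Unicode range the output matches ordinary UTF-8, e.g. 233 (U+00E9) becomes `c3 a9`.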
Upon having a closer look, it seems a reference to UTF-8 is missing altogether. Probably a good idea to reference ~RFC 2277 (BCP 18)~ RFC 2279 and RFC 3629. I will take a look at this.