xiph / opus-tools

A set of tools to encode, inspect, and decode audio in the Opus format.
https://opus-codec.org/

how to read the original data of opus file into python/numpy array? #76

Open yongxuUSTC opened 1 year ago

yongxuUSTC commented 1 year ago

I am not familiar with the format of Opus files (I created out.opus by running opusenc in.wav out.opus). How many bytes are in the header of out.opus? Is everything after the header actual encoded audio data (2 bytes per value)? Are there any other non-data bytes at the end of the file?

Could anyone post sample code to help me read out.opus into a Python/NumPy array in uint16 format? I do not need to decode out.opus; I just need the original encoded data in the file.

Thank you very much

rillian commented 1 year ago

It's not as simple as header and tail bytes. The compressed Opus audio packets are split into segments, which are grouped into pages, each with its own header that includes a timestamp for seeking. The format is documented in

I looked briefly and didn't find a pure python library that mentioned access to the raw encoded data, although several will decode opus files to pcm audio in numpy arrays.

You could however use the pyogg.ogg ctypes wrapper to access the libogg C implementation directly and pull the data out that way. It's also possible some of the higher-level python libraries have accessible decapsulation functions.
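As an alternative to going through libogg, the page structure is simple enough to walk by hand. The following is only a rough sketch using the standard library, not what pyogg or libogg does internally. It assumes the file contains a single logical Ogg stream (the usual opusenc output), it does not verify page checksums, and read_ogg_packets is a name made up for this example.

```python
import struct


def read_ogg_packets(data: bytes) -> list:
    """Reassemble the packets of a single-stream Ogg file from its pages.

    Assumptions of this sketch: one logical stream, no CRC verification.
    """
    packets = []
    partial = bytearray()  # packet being assembled; it may span several pages
    pos = 0
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError(f"no Ogg page at offset {pos}")
        # 27-byte fixed page header: capture pattern, version, flags,
        # granule position, serial number, page sequence number, CRC,
        # and the number of entries in the segment (lacing) table.
        (_magic, _version, _flags, _granule, _serial, _seq, _crc,
         n_segments) = struct.unpack_from("<4sBBqIIIB", data, pos)
        lacing = data[pos + 27:pos + 27 + n_segments]
        body = pos + 27 + n_segments
        for size in lacing:
            partial += data[body:body + size]
            body += size
            if size < 255:  # a lacing value below 255 ends the packet
                packets.append(bytes(partial))
                partial = bytearray()
        pos = body
    return packets
```

Reading out.opus in binary mode and passing the bytes to read_ogg_packets should return a list of bytes objects. In an Ogg Opus file the first two packets are the OpusHead and OpusTags headers, and the remaining ones are the compressed audio packets. Each can be viewed as a NumPy array with np.frombuffer(packet, dtype=np.uint8).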

But you mention Uint16 which is confusing. That doesn't make a lot of sense for the compressed opus-encoded data, which is a complex, entropy-coded data structure packed into bytes.
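To illustrate why bytes rather than 16-bit words are the natural unit, here is a small sketch that reads the first byte of an Opus audio packet, the TOC byte described in RFC 6716. It only unpacks the bit fields (5-bit configuration, 1-bit stereo flag, 2-bit frame count code); everything after that byte is entropy-coded and cannot be read as fixed-width integers. The function name parse_toc and the returned dictionary keys are made up for this example, and the packet is assumed to be an audio packet, not one of the OpusHead/OpusTags headers.

```python
def parse_toc(packet: bytes) -> dict:
    """Unpack the bit fields of the TOC byte of one Opus audio packet."""
    if not packet:
        raise ValueError("empty packet")
    toc = packet[0]
    config = toc >> 3  # top 5 bits: mode, bandwidth and frame size
    if config < 12:
        mode = "SILK"
    elif config < 16:
        mode = "Hybrid"
    else:
        mode = "CELT"
    return {
        "config": config,
        "mode": mode,
        "stereo": bool(toc & 0x04),      # 1 bit: mono or stereo
        "frame_count_code": toc & 0x03,  # 2 bits: how frames are packed
    }
```

For example, a packet starting with the byte 0xFC would give configuration 31 (CELT), stereo, frame count code 0.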

yongxuUSTC commented 1 year ago

Thank you very much for your reply. Is it possible to get a discrete representation (indices or integers) of the codes in the Opus file? For comparison, current neural-network-based codecs (e.g., SoundStream, https://arxiv.org/abs/2107.03312) produce a discrete representation through RVQ.