xiph / opus-tools

A set of tools to encode, inspect, and decode audio in the Opus format.
https://opus-codec.org/

opusenc: add 4x opusenc options #64

Open chemag opened 2 years ago

chemag commented 2 years ago

I also have a patch to force the mode (silk, hybrid, celt, auto), but the OPUS_SET_FORCE_MODE API is private. I can just copy the values, but I assume there's a reason for having it private, and anyway the right solution would be to make it public.

chemag commented 2 years ago

> I also have a patch to force the mode (silk, hybrid, celt, auto), but the OPUS_SET_FORCE_MODE API is private. I can just copy the values, but I assume there's a reason for having it private, and anyway the right solution would be to make it public.

I managed to do this the dirty way, by hard-coding the FORCE_MODE values, but it required changes in both opus-tools and libopusenc. A clean approach would also require moving the FORCE_MODE values in the opus repo from src/opus_private.h to include/opus.h.

mark4o commented 2 years ago

Thanks. I'm not sure that these options make sense for opusenc. For example FEC is for handling packet loss, which cannot happen in an Ogg container since it is stream based; packets are not sent individually. Similarly DTX allows for not transmitting some packets, which is also not possible in Ogg. Options that don't make sense for opusenc just confuse users and make the program more difficult to use. Can you explain the use case for adding these to opusenc?

As for choosing the encoding mode, the --music and --speech options are much better ways to do that, because they will consider the other settings and which settings are supported by each mode. For example only CELT mode supports frame sizes smaller than 10 ms, so when smaller frames are required it will automatically use CELT mode. The internal force mode setting is used internally by code that has already considered the other settings and chosen a mode that is valid for those settings.
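The frame-size constraint described above can be illustrated with a small lookup. This is only a sketch in Python, not libopus code: the per-mode frame durations are taken from the configuration table in RFC 6716, and the name `modes_for_frame_size` is hypothetical.

```python
# Frame durations (in ms) that each Opus mode can code in a single frame,
# per the TOC configuration table in RFC 6716, section 3.1.
FRAME_SIZES_MS = {
    "silk": (10, 20, 40, 60),
    "hybrid": (10, 20),
    "celt": (2.5, 5, 10, 20),
}


def modes_for_frame_size(frame_ms):
    """Return the sorted list of modes able to code one frame of this duration."""
    return sorted(
        mode for mode, sizes in FRAME_SIZES_MS.items() if frame_ms in sizes
    )


if __name__ == "__main__":
    for ms in (2.5, 5, 10, 20, 40, 60):
        print(ms, modes_for_frame_size(ms))
```

For frames shorter than 10 ms the lookup leaves CELT as the only candidate, which is why forcing SILK together with a small frame size would be an invalid combination.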

chemag commented 2 years ago

Hi, mark4o, thanks for the review.

> Thanks. I'm not sure that these options make sense for opusenc. For example FEC is for handling packet loss, which cannot happen in an Ogg container since it is stream based; packets are not sent individually. Similarly DTX allows for not transmitting some packets, which is also not possible in Ogg. Options that don't make sense for opusenc just confuse users and make the program more difficult to use. Can you explain the use case for adding these to opusenc?

I'm running some experiments to measure the effect of Opus settings on bitrate. I saw the disparity between the settings available in opusenc and in opus_demo, and I tried to make them the same. As you mention, some of the settings (FEC, DTX) make no sense once you encapsulate the Opus output in Ogg. The encoding mode patch is probably too hairy right now. Does it make sense to add the other two settings (application and bandwidth) to opusenc?

Also, I added signal control to opus_demo (see https://github.com/xiph/opus/pull/233).

vadimkantorov commented 1 year ago

Sorry, this is a novice question: will forcing the bandwidth in all packets cause decoding at the corresponding sample rate? The use case is storing a speech recognition dataset as Opus files to save space, in which case it is useful to be able to decode at the original sample rate.

mark4o commented 1 year ago

@vadimkantorov The encoder bandwidth does not affect the decoding sample rate. Using the opusdec program you can decode at any sample rate using the option --rate n.

vadimkantorov commented 1 year ago

I guess the reasonable feature request is then an option that takes the decoding sample rate from the informational OpusHead packet in one go and, if it doesn't exist yet (I'll check first :)), an API function that quickly reads only this packet. This would be useful because the file has no meaningful frequency content beyond the original input sample rate (with the input bandwidth then set by the option from this PR).

I’ll create a new issue for this.

Thank you!

mark4o commented 1 year ago

The original sample rate in the header is already the default decoding sample rate in opusdec.
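For readers who want to inspect that header field directly, here is a minimal sketch of parsing the fixed part of an OpusHead packet, assuming the layout given in RFC 7845, section 5.1. The function name `parse_opus_head` is hypothetical, and locating the packet inside the first Ogg page is left out.

```python
import struct


def parse_opus_head(packet: bytes) -> dict:
    """Parse the fixed 19-byte part of an OpusHead identification header."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # Little-endian fields that follow the 8-byte magic signature.
    version, channels, pre_skip, input_rate, gain_q8, mapping = struct.unpack_from(
        "<BBHIhB", packet, 8
    )
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,              # in samples at 48 kHz
        "input_sample_rate": input_rate,   # informational: rate of the original input
        "output_gain_db": gain_q8 / 256.0, # stored as Q7.8 fixed point
        "mapping_family": mapping,
    }


if __name__ == "__main__":
    # Synthetic header for illustration: mono, 16 kHz original input.
    example = struct.pack("<8sBBHIhB", b"OpusHead", 1, 1, 312, 16000, 0, 0)
    print(parse_opus_head(example))
```

The input sample rate is metadata only; it records the rate of the source audio and does not restrict the rate a decoder may output.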

vadimkantorov commented 1 year ago

Some way of forcing SILK or CELT is useful when evaluating Opus, be it --speech or some other option. Is there currently any way to force SILK or CELT in the released opusenc? I've tried passing --set-ctl-int 4000=2048 (OPUS_SET_APPLICATION_REQUEST=OPUS_APPLICATION_VOIP, following the defines in https://github.com/xiph/opus/blob/master/include/opus_defines.h), but I'm not sure whether it had any effect. Should I also add --set-ctl-int 4024=3001 (OPUS_SET_SIGNAL_REQUEST=OPUS_SIGNAL_VOICE)? Or also --set-ctl-int 11002=1000 (OPUS_SET_FORCE_MODE_REQUEST=MODE_SILK_ONLY, from https://github.com/xiph/opus/blob/master/src/opus_private.h), if the frontend does not reject it?
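To keep the numeric pairs above readable, the arguments can be assembled from named constants. This is a sketch only: the values are the ones quoted in this comment, the helper `ctl_int_args` is hypothetical, and nothing here establishes whether opusenc accepts each request or whether the encoder honors it.

```python
# Values as defined in include/opus_defines.h of the opus repository.
OPUS_SET_APPLICATION_REQUEST = 4000
OPUS_SET_SIGNAL_REQUEST = 4024
OPUS_APPLICATION_VOIP = 2048
OPUS_SIGNAL_VOICE = 3001

# Values copied from src/opus_private.h; these are private and may change.
OPUS_SET_FORCE_MODE_REQUEST = 11002
MODE_SILK_ONLY = 1000


def ctl_int_args(pairs):
    """Turn (request, value) pairs into a flat list of --set-ctl-int arguments."""
    args = []
    for request, value in pairs:
        args += ["--set-ctl-int", f"{request}={value}"]
    return args


if __name__ == "__main__":
    speech_args = ctl_int_args([
        (OPUS_SET_APPLICATION_REQUEST, OPUS_APPLICATION_VOIP),
        (OPUS_SET_SIGNAL_REQUEST, OPUS_SIGNAL_VOICE),
    ])
    print(" ".join(speech_args))
```

The resulting list can be appended to an opusenc command line; the force-mode pair is deliberately left out of the demo because it relies on a private request number.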

Along the same lines, an option for debug/verbose output showing which codec mode is being chosen, and maybe some other details, would be useful too.
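Until such an option exists, the mode of an already encoded packet can be read from its first byte. The sketch below decodes the TOC byte following the configuration table in RFC 6716, section 3.1; the function name `describe_toc` is hypothetical, and extracting packets from the Ogg stream is assumed to be done elsewhere.

```python
def describe_toc(toc: int) -> dict:
    """Decode an Opus TOC byte into mode, bandwidth, frame size and flags."""
    if not 0 <= toc <= 255:
        raise ValueError("TOC must be a single byte")
    config = toc >> 3          # top five bits select mode, bandwidth, frame size
    stereo = bool(toc & 0x04)  # 's' bit
    code = toc & 0x03          # frame count code 'c'
    if config < 12:
        mode = "SILK-only"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:
        mode = "CELT-only"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]
    return {
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": frame_ms,
        "stereo": stereo,
        "frame_count_code": code,
    }


if __name__ == "__main__":
    for toc in (0x08, 0x78, 0xFC):
        print(hex(toc), describe_toc(toc))
```

Counting the modes across all packets of a file gives a rough view of what the encoder chose for a given set of options.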

vadimkantorov commented 7 months ago

@mark4o For DTX, is skipping the transmission of silence frames possible with other containers such as WebM or MKV? https://github.com/xiph/opus-tools/issues/49

Is there any other container that you can recommend for saving space by skipping long silences?

Do I understand correctly that the standard .ogg container and its demuxers can't handle missing silence/DTX frames in non-realtime mode? From the RFC it seems that they might (https://datatracker.ietf.org/doc/html/rfc7845#section-4.1), but I'm probably not understanding it :(