Open gavv opened 3 months ago
Hi, I am interested in solving this issue.
Welcome, thank you!
Let me know if you'll need help/hints with the build system.
Also, if libogg dependency can be disabled, I suppose we should do it, because all we need is vorbis encoder/decoder, we don't need to work with ogg files.
Hi @gavv ,
I tried without the libogg, but it fails when installing vorbis. According to the README, "You'll need libogg (distributed separately) to compile this library.", which is why I tried to install it before libvorbis.
I managed to install vorbis with the build script after installing libogg. But I'm having trouble in installing the libogg with the script, since I used "make install" on the terminal. Using "execute(ctx, 'make install')" on the script need permission on CI, as it creates the files at " /usr/local/lib". So how can I install libogg within the same folder and utilize the generated files for the libvorbis installation?
Oh, I see. I assumed that libogg should support using libvorbis as codec, but instead, libvorbis supports using libogg for muxing.
For rfc5215, we'll need to disable ogg muxing, which seems to be feasible. libvorbis encodes packets into a struct from libogg (ogg_packet
), however this struct does not yet have OGG headers, only opaque codec-specific data. OGG headers are added later when you use libogg API, and we can just don't use it and use data from ogg_packet directly.
When build-3rdparty.py
builds a package, it never runs make install
. This script is for building dependencies locally, not installing them system wide. Instead, in the end of building every package, there are calls to install_tree()
and install_files()
functions that install headers and libraries into a local directory specific for this package.
For example, here is what is "installed" for json-c:
tree -L 2 build/3rdparty/x86_64-pc-linux-gnu/clang-14.0.6-release/json-c-0.12-20140410/
build/3rdparty/x86_64-pc-linux-gnu/clang-14.0.6-release/json-c-0.12-20140410/
├── include
│ ├── doc
│ ├── tests
│ ├── arraylist.h
│ ├── bits.h
│ ├── config.h
│ ├── debug.h
│ ├── json_config.h
│ ├── json_c_version.h
│ ├── json.h
│ ├── json_inttypes.h
│ ├── json_object.h
│ ├── json_object_iterator.h
│ ├── json_object_private.h
│ ├── json_tokener.h
│ ├── json_util.h
│ ├── linkhash.h
│ ├── math_compat.h
│ ├── printbuf.h
│ └── random_seed.h
├── lib
│ └── libjson-c.a
├── src
│ ├── json-c-json-c-0.12-20140410
│ └── json-c-0.12-20140410.tar.gz
├── build.log
└── commit
These include
and lib
directories are added to the include and library search path in scons, so we can use the dependency in our code.
OK, so what we need to do if we need to build libogg and then pass it as a dependency to libvorbis?
Here is an example of similar situation:
Here, ltdl and json-c are dependencies of pulseaudio. We first build dependencies, and then pass them via deps
argument to BuildThirdParty(). BuildThirdParty, in turn, passes them to build-3rdparty.py
script, where those dependencies are stored in ctx.pkg_deps
.
When we call format_flags(), it iterates all those dependencies (if they're present) and add their include and lib directories to the flags.
Summary:
deps
argFeel free to ping me if you'll have troubles with those scripts, I know it's not the funniest part :)
Hi @gavv , I am working on the handling of Vorbis stream headers. When encoding with vorbis, we should have 3 headers: the identification, comments and codebook headers. And have a few doubts on how to deal with this:
Should we have a dedicated frame for headers or should we split the header across multiple frames together with the data? Or do you have another idea?
My idea would be to create a function uint8_t* create_header_frame();
to handle the header separately, and execute this before the begin_frame
How should we handle cases where the header size exceeds the frame size? Could split the header through multiple frame cause any problems?
Does the encoded_byte_count
function need to include the size of the header, or should it only count the encoded data?
I would need to adjust all the unit tests, as well as the packetizer uses the IFrameEncoder
. Does this affect any other part of the code?
Hi, sorry for delay! I was busy relocating to another country. Here are my thoughts, after reading your comments and checking out RFC and libvorbis.
First of all, in your current PR you're using ogg stream and serializing ogg pages. As mentioned above, we shouldn't do it, as specified in rfc5215. We must only send vorbis packets, and don't add ogg encapsulation.
In my understanding, we'll need to use ogg_packet
struct, but not ogg_stream
and ogg_page
and not any of the ogg_*()
functions. Just send bytes from ogg_packet
.
Regarding the 3 headers, rfc5215 states the following:
In order to set up an initial state for the client application, the configuration MUST be conveyed via the signalling channel used to set up the session. One example of such signalling is SDP [RFC4566] with the Offer/Answer Model [RFC3264]. Changes to the configuration MAY be communicated via a re-invite, conveying a new SDP, or sent in-band in the RTP channel. Implementations MUST support an in-band delivery of updated codebooks, and SHOULD support out-of-band codebook update using a new SDP file. The changes may be due to different codebooks as well as different bitrates of the RTP stream.
In other words there are 3 strict requirements:
So I see the algorithm like this:
Sender:
1. Creates encoder
2. Obtains configuration headers (3 ogg_packet structs) from encoder
3. Serialize them according to section 3.2.1
4. Add serialized configuration to SDP
5. Passes samples to encoder, gets packets, and sends them
Receiver:
1. Creates decoder
2. Gets seriailized configuration from SDP
3. Deserializes it into 3 ogg_packet structs
4. Passes headers to decoder
5. Receives packets, passes to decoder, and extracts samples
In addition, if received packet is in-band configuration header update (section 3.1.1), we should parse it and apply configuration to encoder. I'm not 100% sure how to do it, but likely vorbis_synthesis_idheader() and vorbis_synthesis_restart() may help here.
Note: our own sender won't actually produce in-band configuration header update for now (we don't have a use case for it yet), but to conform RFC and be compatible with non-roc senders, we still should be able to handle them.
The approach above is very close to your idea with get_headers_frame()/initialize_headers(), but instead of byte buffers, we need to work with SDP (since we need codec-agnostic solution).
We can pass sdp::SessionDescription to IFrameEncoder, so that it can add its codec-specific configuration to the SDP. Then we can pass sdp::SessionDescription to IFrameDecoder, and it can read configuration from it. Same as with your approach, this should be done before encoding/decoding first packet.
Note: currently SDP is (partially) implemented in roc_sdp, but not actually used. In the release when we'll enable Vorbis, we'll also add CLI and API options to get/set SDP offer/answer. In later releases we'll add automatic SDP exchange via SAP/SDP and RTSP.
As far as I can tell from docs & sources, libvorbis works this way:
This design places us before a few challenges which I missed before. (BTW changes that I describe below will be useful not just for vorbis, but for many other codecs like Opus).
Please let me know if I'm missing something. Bellow I assume that this is true.
encoded_byte_count() was intended to return exact size of encoded payload (i.e. everything after RTP header, including Vorbis headers and compressed audio). It works fine for PCM, but apparently it doesn't fit codecs like Vorbis with variable-size packets. I guess we should rework encoder and composer interfaces. With libvorbis, we can't know packet size before we actually encode it.
Currently the algorithm is:
We could change it to something like this:
To make it work, we'll need a few changes:
vorbis_info_blocksize() * n_chans * some_coeff
.I suppose we'll also need similar changes in IFrameDecoder.
Since libvorbis is allowed to produce multiple packets for one block, we should reflect it in IFrameEncoder interface as well.
Current approach is:
We could use the following approach:
Note: the reason why we pass packet's buffer to encoder as the first step is to allow simple buffer-less encoders like PCM to use packet's buffer directly without doing extra copy.
Currently audio::Packetizer produces packets which all have exactly same size (defines by encoded_byte_count(samples_per_packet)
). If a packet is going to be shorter than this value because of flush() call, packetizer pads it to have needed size.
This is done so because FECFRAME (the protocol which we use for loss repair) requires that all packets within FEC blocks should have same size. This requirements conflicts with vorbis (and similar codecs) where we don't control packet size.
Unfortunately I don't see a clean solution here, so I suggest the following work around:
We'll need to test this approach to see how stable the packet size would be in practice and how often would it cause receiver to skip loss repair. At the first glance it seems to be a rare event.
Also let me know if you have better ideas!
IFrameEncoder is used only from Packetizer, and IFrameDecoder from Depacketizer. IFrameEncoder and IFrameDecoder currently are implemented only for PCM.
IComposer is used more widely, and have a few implementations (rtp::Composer, fec::Composer, rtcp::Composer). Though, changes in its interface and implementations are much smaller.
See also this page: https://roc-streaming.org/toolkit/docs/internals/packets_frames.html
We have two interfaces IFrameEncoder and IFrameDecoder that are used to encode audio samples into packet payload and decode it back. Currently they're implemented using PcmEncoder and PcmDecoder (for uncompressed PCM).
Now we need to add two more implementations: VorbisEncoder and VorbisDecoder, that will use libvorbis library to do the job.
We also need to add new encoder & decoder to the list of tested codecs in test_frame_encoder_decoder.cpp.