Add C++ plugins framework

ggarber commented 10 months ago

Feature Request (more like a feedback request)

Protoype: https://github.com/ggarber/mediasoup/commit/14a3ba052aad4a19a2d34533573e144afec7dbef

Some use cases require custom processing of Media and Data packets and because of the performance implications it is not always possible to forward to those packets in real time from C++ to JS and then back from JS to C++ using DirectTransport capability.

Some examples of those use cases could be:

Store RTP packets for all the producers in some pcap like format like Janus MJR format.
Implement a custom datachannels protocol where the destination of each message is included in the message itself.

Nowadays the solution in these cases is to use an additional server and forward those messages from mediasoup to this server but that adds some extra complexity in terms of orchestration and also maintaining that additional server. The other solution today is to use Rust instead of JS library but that's not an option for all the teams.

The alternative is to provide support to extend mediasoup C++ layer with some custom code (let's call it plugins for now) so that developers can extend the standard capabilities of mediasoup with their use case specific logic.

The main components of that approach would be: A) The API exposed by mediasoup to those plugins. B) How those plugins are loaded.

API exposed to those plugins

It needs to be a mix of methods and events.

Methods: Initially all the public interface of mediasoup objects (Transports, Consumers, Producers... ) could be used from plugins but some interfaces could be extracted (TransportApi, ConsumerApi, ProducerApi...) so that plugins don't see the whole interface.

Events: mediasoup needs to expose a mechanism for plugins to listen for events. This can be in the form of events that plugins can be attached for (see prototype).

Loading of plugins

There are different alternatives:

Fork mediasoup and add plugins to the repository. This is far from ideal but a potential solution for the short term and initial development and testing of the API.
Use mediasoup as a library and generate your own mediasoup worker binary with that library and their own plugins. This looks too complex for most of developers but maybe it is reasonable in some cases.
Add some simple dynamic loading capabilities. Mediasoup JS API could add a method worker.loadPlugins('xxxx.so', 'yyyy.so').

jmillan commented 10 months ago

I like the approach. Something that worries me is the synchronous nature of the implementation. Plugins processing impacts directly on worker performance (not in this specific 'ping' example but potentially in any others). I would prefer an async framework where the communication between the plugin and the worker is made by libuv communication means. Ie: the same way Rust communicates with the worker.

NOTE: I know how making this async would complicate things.., so I wonder if it would be acceptable to let the plugin user assume the performance cost of the plugin.

NOTE2: Another worrying point is the fact that everything runs in the same process, meaning a crash in a plugin would crash the worker.

ggarber commented 10 months ago

Thx a lot for taking the time to look at this.

NOTE1: Makes sense. I was considering that any plugin that could affect the performance of the main event loop would create its own threads (f.e. a plugin doing audio decoding). So maybe the plugin API doesn't need to be more complex if most of plugins would be simple and rest are able to create their own processing threads. I'm not sure.

There is a feature that it is hard to achieve with asynchronous processing that is having plugins that intercept the processing of a packet so that it doesn't continue normally. For example a plugin throttling the RTP packets received so that nobody can send more than X pkts/sec to avoid DoS attacks.

NOTE2: Correct, but I think that's common to most of the plugins systems, right? (nginx, apache, janus, gstreamer...). Given that the developer would decide what plugins to use I think it could be reasonable.

vpalmisano commented 10 months ago

Maybe the plugin itself could handle the events in an asynchronous way if the processing affects the main loop performance.

ibc commented 10 months ago

I cannot comment much about this yet since I don't know anything about how a native plug-in can be "installed", but from architectural point of view I think that every plug-in should have:

An interface for mediasoup node to dynamically enable/disable it for specific producerIds and/or dataProducerIds.
The plug-in should receive the whole payload (RtpPacket or SCTP message) that the worker receives on those proeucerIds or dataProducerIds and the worker should then DISCARS those messages, this is, if the payload is passed to a plug-in then the worker should drop those RtpPackets or SCTP messages instead of performing normal routing with them (unless the plug-in discards that message, see last bullet).
The plug-in processes the message and, later, it may decide to tell the worker "ey send this RtpPacket or this SCTP message to those consumers ids or dataConsumer ids". This is, an absurd example: a plug-in receives a RtpPacket from producerId 1111 and tells the Worker to send a SCTP message to dataConsumerIds 2222 and 3333.
In case multiple plugins are enabled for the same Producer or DataProducer, we need to run them serially, so first plug-in receives the message first from the worker and decides to handle it or ignore it. The message from the worker to the plug-in contains the producer id and the whole payload and it's a request, and the response from the plugin (true/false) tells the worker whether the plug-in will take care of the message or not. If not, the worker passes the payload to next plug-in or just routes it normally if no plug-in handles the payload. And this is SUPER problematic because it means that received RtpPackets and SCTP messages must remain in memory until that async decision is taken, which means that the C++ Producer or DataProducer class instance cannot call "delete packet" immediately after passing the packet to the Router as we do know. So smart pointers or whatever needed here.

nazar-pc commented 10 months ago

Worker was supposed to be hidden behind TypeScript/JavaScript library, not sure I like the idea of adding C++ plugins. You should really consider Rust if you want performance, then you can still treat worker as opaque implementation, yet achieve higher performance.

ggarber commented 10 months ago

Thank you for your time, see inline.

I don't know anything about how a native plug-in can be "installed"

Check the three alternatives mentioned in the description (Loading of plugins)

An interface for mediasoup node to dynamically enable/disable it for specific producerIds and/or dataProducerIds.

Consider the option of allowing plugins to decide what producers/routers/workers they are interested in, like in the prototype. Potentially a plugin could be used for other logic that is not related to producers or consumers IMO.

The plug-in should receive the whole payload (RtpPacket or SCTP message) that the worker receives on those proeucerIds or dataProducerIds and the worker should then DISCARS those messages, this is, if the payload is passed to a plug-in then the worker should drop those RtpPackets or SCTP messages instead of performing normal routing with them (unless the plug-in discards that message, see last bullet).

The way I have implemented this in other SFUs is with the plugins returning a flag (CONTINUE, BREAK...) to tell the next plugin or logic in the chain if the processing should continue or not and the default is to continue with normal processing.

The plug-in processes the message and, later, it may decide to tell the worker "ey send this RtpPacket or this SCTP message to those consumers ids or dataConsumer ids". This is, an absurd example: a plug-in receives a RtpPacket from producerId 1111 and tells the Worker to send a SCTP message to dataConsumerIds 2222 and 3333.

This flow is covered in the prototype, right?

In case multiple plugins are enabled for the same Producer or DataProducer, we need to run them serially, so first plug-in receives the message first from the worker and decides to handle it or ignore it. The message from the worker to the plug-in contains the producer id and the whole payload and it's a request, and the response from the plugin (true/false) tells the worker whether the plug-in will take care of the message or not. If not, the worker passes the payload to next plug-in or just routes it normally if no plug-in handles the payload. And this is SUPER problematic because it means that received RtpPackets and SCTP messages must remain in memory until that async decision is taken, which means that the C++ Producer or DataProducer class instance cannot call "delete packet" immediately after passing the packet to the Router as we do know. So smart pointers or whatever needed here.

This could be another reason to follow the sync approach instead of the async one, right?

ggarber commented 10 months ago

Worker was supposed to be hidden behind TypeScript/JavaScript library, not sure I like the idea of adding C++ plugins. You should really consider Rust if you want performance, then you can still treat worker as opaque implementation, yet achieve higher performance.

I agree it is a paradigm change where the mediasoup API is not only the JS API but also the C++ API. I like that and ideally would allow me to run mediasoup in my own C++ process without any Javascript :) but I understand it is a big change and makes the API surface much bigger.

I included Rust as an option in the Issue description and I think it is the best approach in many cases. My "problem" is that for many teams rust is not the most convenience language and is an entry barrier if to create a simple 100 lines plugin they need to rewrite 20.000 lines of javascript code.

nazar-pc commented 10 months ago

For context I was hoping we can eventually have more Rust in mediasoup and use https://github.com/neon-bindings/neon or something similar to create a Node.js module. Having C++ plugins support would mean more breaking changes to downstream users or more work for us to continue supporting it.

Not saying it is going to happen any time soon, but I'm hopeful that it might happen eventually.

ibc commented 10 months ago

Check the three alternatives mentioned in the description (Loading of plugins)

Of course I only consider the "add some simple dynamic loading capabilities" approach. Others are not plug-ins at all and we need something that should work with npm i mediasoup installation approach. So unless I miss something let's please ignore any other alternative.

Can your prototype be used as follows?

You have /Users/vox/src/myMediasoupFooPlugin.cpp.
I assume you must compile it and get /Users/vox/src/myMediasoupFooPlugin.so file.
Then you have /Users/vox/src/myMediaServer Node project in which you install mediasoup npm i mediasoup as usual.

In myMediaServer/src/index.ts you use the mediasoup Node api to create a Worker and enable your plugin on it:

import { createWorker } from 'mediasoup';

const worker = await createWorker({
plugins: [
  { id: 'foo', path: '/Users/vox/src/myMediasoupFooPlugin.so' }
],
etc
});

Then you create a Router in which you want to enable your plugin:

const router = await worker.createRouter({
plugins: [ 'foo' ],
etc
});

Then you create a Router in which you want to enable the Router related stuff your plugin:
```
const router = await worker.createRouter({
plugins: [ 'foo' ],
etc
});
```
Then you create a WebRtcTransport in which you want to enable the Transport related stuff your plugin:
```
const transport = await router.createWebRtcTransport({
plugins: [ 'foo' ],
etc
});
```

Then you dynamically enable/disable your plugin:

transport.enablePlugin('foo');
transport.disablePlugin('foo');

Then you launch your app and things just work without having to do anything else.
```
node /Users/vox/lib/myMediaServer/index.js
```

An interface for mediasoup node to dynamically enable/disable it for specific producerIds and/or dataProducerIds.

Consider the option of allowing plugins to decide what producers/routers/workers they are interested in, like in the prototype. Potentially a plugin could be used for other logic that is not related to producers or consumers IMO.

Not against this.

The plug-in should receive the whole payload (RtpPacket or SCTP message) that the worker receives on those proeucerIds or dataProducerIds and the worker should then DISCARS those messages, this is, if the payload is passed to a plug-in then the worker should drop those RtpPackets or SCTP messages instead of performing normal routing with them (unless the plug-in discards that message, see last bullet).

The way I have implemented this in other SFUs is with the plugins returning a flag (CONTINUE, BREAK...) to tell the next plugin or logic in the chain if the processing should continue or not and the default is to continue with normal processing.

Yes, that's basically what I meant.

The plug-in processes the message and, later, it may decide to tell the worker "ey send this RtpPacket or this SCTP message to those consumers ids or dataConsumer ids". This is, an absurd example: a plug-in receives a RtpPacket from producerId 1111 and tells the Worker to send a SCTP message to dataConsumerIds 2222 and 3333.

This flow is covered in the prototype, right?

Yes.

In case multiple plugins are enabled for the same Producer or DataProducer, we need to run them serially, so first plug-in receives the message first from the worker and decides to handle it or ignore it. The message from the worker to the plug-in contains the producer id and the whole payload and it's a request, and the response from the plugin (true/false) tells the worker whether the plug-in will take care of the message or not. If not, the worker passes the payload to next plug-in or just routes it normally if no plug-in handles the payload. And this is SUPER problematic because it means that received RtpPackets and SCTP messages must remain in memory until that async decision is taken, which means that the C++ Producer or DataProducer class instance cannot call "delete packet" immediately after passing the packet to the Router as we do know. So smart pointers or whatever needed here.

This could be another reason to follow the sync approach instead of the async one, right?

Imagine you want to apply a lot of IA into every RTP packet. This is probably async stuff. I wouldn't assume sync comm.

ggarber commented 10 months ago

Imagine you want to apply a lot of IA into every RTP packet. This is probably async stuff. I wouldn't assume sync comm.

My example was decoding audio and it is similar to yours. In that case the plugin can create any threads that it needs to process the rtp packets and not block the mediasoup event loop.

ggarber commented 10 months ago

For context I was hoping we can eventually have more Rust in mediasoup and use https://github.com/neon-bindings/neon or something similar to create a Node.js module. Having C++ plugins support would mean more breaking changes to downstream users or more work for us to continue supporting it.

I agree it would mean more breaking changes. I think at least initially the plugins feature should be used only for very advanced use cases / developers willing to pay the extra effort and risk of breaking changes.

It could potentially be enabled during builds (#ifdef PLUGINS_SUPPORT) and only people really needing it builds its own version of mediasou pwith that support enabled.

jmillan commented 10 months ago

Something that worries me about the approach in your example is that there are no boundaries about what a plugin can do as it receives the core instances. It could, for example, call transport.CloseProducersAndConsumers() and leave the Node and Rust side in an inconsistent state.

We already have a communication framework between the core (Worker) and Node and Rust. Such framework provides a control and data message format (JSON in v3, flatbuffers very soon in v3) as well as the means to handle those in a secure way.

There are currently 2 ways we exchange those control and data messages (defined in ChannelSocket):

UnixSocket in Node
As a function call, via FFI in Rust using uv_async_send()

IMHO we should use this very same framework (reusing the same message formats) by adding a third way to exchange these messages between the core and, now, C++:

As a direct function that the worker uses to send the message to the plugin manager, sync.

If there was the need to send async messages from plugins to the core it could use uv_async_send() since, obviously it cannot just run stuff out of the loop.

[*]: New messages would be needed to be created as for example notifications to tell the plugin manager that a new transport, [data]consumer, [data]producer, etc has been created, etc.

ibc commented 10 months ago

@jmillan the thing here is that worker plugins should just run on C++ without interacting with the Node/Rust layer. I agree that they should NOT be able to close transports or producers or consumers since that would ruin the whole state, but this is about limiting the API exposed to them. Note that, even if we force plugins to communicate with the worker via uv_async, same problem would happen if we allow the plug-in to close a producer directly in C++, so I don't see any benefit on forcing plugins to use flatbuffers (AKA a Channel) to communicate with worker. We use a Channel to communicate Node and Worker because those are separate processes, but why should make plugins run into a separate process? They could be native threads instead and hence they share memory with the Worker.

Imagine a plug-in that receives RTP packets and tells the worker to NOT relay original packets. Instead, the plug-in collects some received packets, transcodes them and eventually tells the worker to send a new packet to X consumer. We need av_async since the plug-in may need to send that new packet at any time (and not as a result of a packet received from the worker), and we don't want the plug-in to have direct access to the entire consumer. So we can just provide the plug-in with some limited API such as sendRtpPacket(packet, consumerId).

jmillan commented 10 months ago

the thing here is that worker plugins should just run on C++ without interacting with the Node/Rust layer.

Of course, this communication is about worker and plugins only. I was exposing a way to communicate both: We already expose a command (request/notification) based framework to do anything, including sending RTP, RTCP, DATA, etc. Right now only Node and Rust use it. Such framework is implemented in HandleRequest() and HandleNotification() in every class that supports such controls. This way Node/Rust or a hypothetical C++ plugin do not have access to the internal data, but refers to it via IDs and other parameters, assuring the overall consistency. Of course sending data from worker to the plugin would be sync, by simply calling a method on the extension manager.

I agree though that there is no need to serialize data we are passing to the same process, but I want to reiterate the importance of limiting the access of a plugin to the overall instances.

ggarber commented 10 months ago

Something that worries me about the approach in your example is that there are no boundaries about what a plugin can do as it receives the core instances. It could, for example, call transport.CloseProducersAndConsumers() and leave the Node and Rust side in an inconsistent state.

Is the proposal in the feature description enough to address this?:

"Methods: Initially all the public interface of mediasoup objects (Transports, Consumers, Producers... ) could be used from plugins but some interfaces could be extracted (TransportApi, ConsumerApi, ProducerApi...) so that plugins don't see the whole interface"

jmillan commented 10 months ago

Is the proposal in the feature description enough to address this?:

Definitely a reduced interface should be exposed, yes.

ibc commented 10 months ago

I agree though that there is no need to serialize data we are passing to the same process, but I want to reiterate the importance of limiting the access of a plugin to the overall instances.

But this is not in sync with what I said, which was:

Even if we use a channel to communicate worker and plug-in, if we let the plug-in close a Consumer the app state will be ruined.
We don't want to serialize or copy memory because those plugins can run in same process in separate threads.

So how does your comment match those requirements that I thought we had already agreed on?

I insist: I don't want to pass Consumer instances to plug-ins. So we don't pass them, that's all. It doesn't mean that the only approach to satisfy it is by using a channel between plugins and worker.

And BTW, if the communication between plug-in and worker is async (it cannot be sync at all) then communication between worker and plugin must also be async, otherwise the plug-in may run two block of code at the same time, something that we probably don't want. But this just means that we use uv_async in both directions and we make plugins work within a separate libuv loop, that's all.

jmillan commented 10 months ago

There is a misunderstanding:

Even if we use a channel to communicate worker and plug-in, if we let the plug-in close a Consumer the app state will be ruined.

Of course a plugin would have a subset of requests/notifications, as it cannot close a consumer for consistency.

We don't want to serialize or copy memory because those plugins can run in same process in separate threads.

I already agreed.

It doesn't mean that the only approach to satisfy it is by using a channel between plugins and worker.

I didn't said that this was the only approach. I was justifying why I did propose it in the first place.

To be clear. I'm not advocating for serializing data into any format or using a specific communication channel.

ggarber commented 10 months ago

I would suggest to pick 3 relevant use cases and write the plugins (it can be pseudocode) to see how it looks like with each approach. Some examples that could cover a large spectrum of use cases could be:

Custom datachannels protocol where each message includes the destination in its payload. (interesting because requires coordination of multiple peers/transports)
RTP throttling where packets are discarded if too many are received. (interesting because requires discarding packets)
Applying audio effects to audio streams. (interesting because it is too expensive to do it synchronously)

There are a couple of things suggested by @ibc (async communication or no access to consumers) that I'm not sure how compatible they are with those use cases and I think that could help to clarify it.

I'm happy to write the pseudocode for those selected use cases using my proposed approach if that helps.

jmillan commented 10 months ago

While I see the benefit of having a plugin system in mediasoup I need to add some points around it:

We do offer a control layer on top of the worker in TS, Rust (and there is also a Go implementation).

All those communicate with the same C++ worker layer.
Meaning mediasoup is used in TS, Rust and Go projects.
- TS layer communicates with C++ via unix socket, it's a bidirectional async communication.
- Rust layer:
Initially communicated with C++ via unix socket, in async way.
Currently communicates with C++ via Rust FFI. Calls the worker in async mode (uv_async_send) and it's called by worker in a sync way (the worker calls a function provided by Rust).
- This means all the plugins exposed here could already be written in Rust since the communication method worker->rust could return a value indicating whether the packet must be dropped, or change the given buffer, etc.

What would make the worker agnostic of the language the plugins are written and hence do not discard the possibility to write the plugins in Rust in a future?

If a project using TS mediasoup wanted to write a Native plugin we would first need to change how we launch the worker and communicate with it in a similar way we do in Rust. Not by spawning a process as we do in TS but by creating a thread. This would be done from a native Node module written in C++. Once the thread is created the communication from the Node module to the worker is done via uv_send_async and the communication from the worker to the native module would be via FFI having the worker call a method that the native module has previously registered. I'm just describing what we do in Rust.

In essence, having the C++ worker launched from a native C++ Node module would have the following benefits:

Communication would be times faster than using a unix socket.
- Indeed worker->TS would no need a memcpy since the native module would receive a pointer to the buffer owned by the worker.
The plugins we are talking about could be made in C++, natively, but in the Node native module land so would not be implemented in the Worker, hence keeping the door open to Rust based products to write their plugins in that language.
Everything is ready in the Worker since it was prepared in order to make the Rust layer work!

Of course that requires a previous work of launching and communicating with the Worker via a native module in Node. Let's keep this in mind, and if for some reason, we implement the plugin in the worker-land now, let's be ready to remove it once we are able to do it the other way as it seems to be the way to go in my opinion. Am I alone in this thinking?

ggarber commented 9 months ago

Thx @jmillan for taking the time. Some questions for me to understand better the proposal:

1/ With that approach how is my C++ plugin class going to connect to the rest of mediasoup? What would be those methods "that the native module has previously registered"? Do they exist already? For example if my plugin needs to receive some datachannels messages how would that code look like?

2/ I guess the mediasoup-worker (not the libmediasoup-worker) is the equivalent of that node native module today, right? Why can't it be integrated there then?

FWIW I would consider the difference between languages that can "control" mediasoup (like Go, Rust or Javascript) and languages that you can use to extend mediasoup (like C++ and Rust). For example in Redis you can only write modules in C even if you have redis libraries/connectors for any language.

Note: We couldn't wait so we are already using the plugins approach described initially so happy to provide feedback from real usage if needed at some point.

ibc commented 9 months ago

2/ I guess the mediasoup-worker (not the libmediasoup-worker) is the equivalent of that node native module today, right? Why can't it be integrated there then?

https://mediasoup.org/faq/#is-mediasoup-a-native-addon

jmillan commented 9 months ago

I've been giving this a second thought and it seems perfectly fine having a plugin system as it is suggested here[*], which is how, AFAIK, they are typically built.

[*]: I mean having a plugin system in the core of the worker that calls the methods of the loaded plugins.

I've probably been distracted by the fact that the example plugin given here adds code inside the worker, while a real plugin should be able to be loaded as a .so library. Such library can be implemented in C++, Rust or other language of choose that allows it.

Apart from that, the plugins should have a way to be controlled, ie: enable for Producer X only, and such, and that was already commented here in this comment by @ibc https://github.com/versatica/mediasoup/issues/1175#issuecomment-1764611913.

I like that and ideally would allow me to run mediasoup in my own C++ process without any Javascript

Is this your ultimate goal for the plugin system?

versatica / mediasoup