Hi @xtianus79, let me try to answer your questions.
Currently, I have a setup where the device requests a "session" and calls an API to set up an RTMP stream from a service. The ingest URL is sent back to the device so that it can begin streaming to the event. When that is finished, I have another call the device can make to locate the encoded asset via the event and provide a traditional LL-HLS stream endpoint to an end user.
So you have something that works; are you really sure you want to change it? Are you absolutely sure there's no other way to reach your efficiency/scalability goals than going full-scale WebRTC?
Why I'm saying this: don't use WebRTC unless your number one goal is latency. If no one is going to watch the media stream produced by your IoT device within, say, 100-500 msec of it being produced, then it's not worth going down the WebRTC rabbit hole. HTTP streaming is so much simpler and, being HTTP-based, it's fully Kubernetes compatible, which you can't quite say about the RTP/UDP used by WebRTC (STUNner's main goal is exactly to make dealing with WebRTC media encapsulations in Kubernetes as simple as it is with HTTP). Here is a super-informative post on your options in this problem space.
That being said, if you decide to convert to WebRTC and you want to keep Kubernetes, then STUNner will definitely be part of your eventual architecture. For that, however, you need to be aware that STUNner is not a media server that can accept streams on its own: it's a modest media gateway that lets you ingest media streams into Kubernetes that would otherwise be fairly difficult to get in, due to the weird media protocol encapsulations used by WebRTC; see more on this here. To put it another way, you will have to put something behind STUNner to process your streams. Think of STUNner as serving the same purpose as a Kubernetes Ingress, like nginx or Envoy.
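To make the Ingress analogy a bit more concrete, a minimal STUNner setup is usually just a Gateway that opens a TURN listener plus a UDPRoute that points at the media server Service sitting behind it. The sketch below is illustrative only: the exact API versions and field names depend on the STUNner release you run, and the `media-server` backend is a placeholder for whatever media server you deploy.

```yaml
# Illustrative sketch only: a TURN-UDP Gateway plus a UDPRoute that forwards
# ingested media to a hypothetical "media-server" Service. Check the STUNner
# docs for the exact API versions and fields of your release.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: udp-gateway
  namespace: stunner
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: TURN-UDP
---
apiVersion: stunner.l7mp.io/v1
kind: UDPRoute
metadata:
  name: media-plane
  namespace: stunner
spec:
  parentRefs:
    - name: udp-gateway
  rules:
    - backendRefs:
        - name: media-server   # your Janus/mediasoup/Pion Service goes here
```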
Can I use GStreamer to create an RTP stream and pass it directly into STUNner,
You can definitely use GStreamer to generate an RTP stream, and you can use STUNner to ingest that stream into Kubernetes. What you put behind STUNner is up to you. I'd recommend you look at WHIP, a protocol designed exactly to support your use case: media ingestion. See a quick reference implementation here. If you decide to use WHIP then I think Janus would be a solid choice.
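Just to illustrate the GStreamer side, a plain RTP sender can be as simple as the pipeline below. `<INGEST-HOST>` and `<PORT>` are placeholders for whatever sits in front of your media server; a real WHIP/WebRTC ingest would use a WHIP-capable sender instead of a raw `udpsink`, and getting the stream through STUNner itself needs a TURN/ICE-capable client or a relay such as STUNner's turncat utility.

```sh
# Rough sketch: encode a live source to H.264 and push it out as RTP over UDP.
# <INGEST-HOST>/<PORT> are placeholders, not a working STUNner ingest endpoint.
gst-launch-1.0 -v videotestsrc is-live=true \
  ! videoconvert \
  ! x264enc tune=zerolatency bitrate=2000 key-int-max=30 \
  ! rtph264pay config-interval=1 pt=96 \
  ! udpsink host=<INGEST-HOST> port=<PORT>
```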
Or, do I need to use a middleware WebRTC server such as PION, Janus, or MediaSoup plus STUNner
Exactly.
With the stream collected by STUNner (for example), can I just IP-connect a player, like HLS/DASH, to connect to the incoming stream? For some reason I think this is where WHIP/WHEP comes in, but I don't have a good idea. If it isn't something like an IP connect, then I am assuming it is a very front-end, application-based connection through a React or Angular SPA app that will more or less have to connect to the WebRTC server. In the architecture I reviewed here, headless as an example, the APP was in the same Kubernetes namespace. Is that for a reason? Does it have to be there? Is there something I can deliver in an API to make it more like the WHIP flow or HLS flow of URI ingestion? After all, it's just a protocol.
I'm afraid I don't quite understand these questions, can you please rephrase?
@levaitamas You hit every nail on the head. That last paragraph was my brain farting, but I think you eloquently answered it via the WHIP information just above it. Basically, I want to ingest a URL the way the RTMP flow in my service does. For example: hey, I have a URL for you to stream against -> send it to GStreamer -> play back the media from GStreamer. Create an HLS locator stream -> deliver it to the front end.
I appreciate the rabbit hole comments, because I felt like I was about to see what Alice and the Mad Hatter were up to after researching all of this.
My thoughts: I've been looking into Dolby.io (which uses WHIP) and Amazon Kinesis. To be honest, it seems like Amazon has something useful, but I'm not 100% sure. It still seems to be in beta form even though it is in preview. Seems like a POC to me, TBH.
With that said, latency is the concern here. Latency and quality. For VOD, that can still sit in the service I've built. In fact, I can raise the 4K passthrough and not even worry about live viewing of it. That will work well for me. The liveness is super important. We were doing a demo of the stream where you have to interact with the video, imagine getting directions to do something actively, but it was painful with that 5-second delay. It worked amazingly, but the delay made it difficult to give "live" instruction. And that is kind of the point, TBH. There are perhaps other options too, however.
With that said, I feel like it's worth giving it a try. Kinesis tells me that this isn't going anywhere, because you can do all types of things with WebRTC, including data transfer, and that is what Amazon is trying to position it for. Azure too.
So, these systems don't work well in Kubernetes. If I were them (Janus, mediasoup... others) I would go full on Kubernetes, with something like a controller and deployment build. Point and shoot. Adoption would go up 10000%, IMHO. I think I might need professional help with the setup. The point of the demo is to "wow", and this would be that wow, so it might just be worth it.
What do you think I might miss out on by trying to go with a solution that could lead to an unusable system?
The device is controlled, and the interface in the web is all a controlled environment. This isn't a situation where someone needs to set up a meeting room or a chat interface to talk, like Slack or Teams. It's just a 1-to-1, maybe 1-to-3, situation. Would that make the implementation safer and more reliable?
What do you think I might miss out on by trying to go with a solution that could lead to an unusable system?
As far as I understand, your media path would involve a WebRTC segment from your IoT device to upload the video to a central store (probably deployed into Kubernetes) at a given URL, and from there a "traditional" HTTP-based streaming segment that would allow clients to easily obtain the video from said URL. In the reverse direction, the clients would send control commands via a Web-based channel (say, WebSocket).
Now, if you want real interactivity (say, <100 msec E2E delay) then I'm afraid you cannot afford the HTTP streaming segment, due to the excess latency potentially caused by TCP congestion control, and you will have to go down that WebRTC rabbit hole and implement an E2E WebRTC solution. There is a thin line between real-time media (<150 msec delay, as in, say, Zoom or Google Meet) and "traditional" streaming (say, 1-5 sec delay, as in Twitch or YouTube), but that additional couple of seconds spent buffering streaming media is crucial. Traditional wisdom says that if interactivity is the primary goal then you have to design for latency from the outset. I guess you could implement something on top of our one-to-one call tutorial (just substitute Kurento with something that's actually being maintained) and add a WebRTC DataChannel for control; the source is here.
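For the control direction, the DataChannel part is the easy bit: it is just a labeled channel on the same peer connection that carries the media. Here is a minimal browser-side sketch; the `"control"` label, the command payload, and the TURN credentials are made up for illustration, and the TURN URL would point at your STUNner Gateway.

```ts
// Minimal sketch: reuse the call's RTCPeerConnection and open a reliable
// DataChannel for control commands. All names/values here are illustrative.
const pc = new RTCPeerConnection({
  iceServers: [{
    urls: "turn:<STUNNER-PUBLIC-IP>:3478",   // your STUNner Gateway address
    username: "<turn-user>",
    credential: "<turn-pass>",
  }],
});

const control = pc.createDataChannel("control", { ordered: true });

control.onopen = () => control.send(JSON.stringify({ cmd: "pan", deg: 15 }));
control.onmessage = (ev) => console.log("device ack:", ev.data);
```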
I'm closing this issue for now since this discussion is not really related to STUNner. Just drop by our Discord if you have any further questions; we're happy to help. I'd also like to take the opportunity to mention that we are in the WebRTC+Kubernetes consultancy business, so if you want to throw some $$$ at your demo feel free to contact us.
Thanks,
Emailing you now.
Sent an email. Do you have time to work on the project?
Hi @xtianus79,
Sorry for not replying to your email yet. We recently changed the info email address and @rg0now pasted the old one. However, we got your email and discussed it, and we will soon reply with possible time slots to speak with you.
This is dizzying. I am most familiar with Kubernetes, so that is how I came across STUNner. For me, it needs to be manageable and able to scale to be useful. Hence, K8s.
I have an IoT device that uses MQTT as its protocol. It has video, with GStreamer as the video stream provider.
Currently, I have a setup where the device requests a "session" and calls an API to set up an RTMP stream from a service. The ingest URL is sent back to the device so that it can begin streaming to the event. When that is finished, I have another call the device can make to locate the encoded asset via the event and provide a traditional LL-HLS stream endpoint to an end user.
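Roughly, the device side of that flow looks something like the sketch below (the endpoint names, response field, and GStreamer pipeline are made up, just to show the shape of it):

```sh
# 1) Ask the session API for an RTMP ingest URL (hypothetical endpoint/field).
INGEST_URL=$(curl -s -X POST https://api.example.com/sessions | jq -r .ingestUrl)

# 2) Push the camera feed to that URL as RTMP from the device.
gst-launch-1.0 videotestsrc is-live=true \
  ! videoconvert \
  ! x264enc tune=zerolatency \
  ! h264parse \
  ! flvmux streamable=true \
  ! rtmpsink location="$INGEST_URL"

# 3) Later, fetch the LL-HLS playback locator for the end user (hypothetical endpoint).
curl -s https://api.example.com/sessions/<id>/playback
```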
This works, but it is certainly not going to be very cost-effective, or even scalable for that matter.
I want to use something that is more WebRTC-based, but I feel overwhelmed by the options at the moment. I have been looking into MediaSoup, Janus, and others for creating a middleware service that can pull off a feat similar to what I have achieved with RTMP.
However, I cannot understand how the recipe is supposed to go together. The streaming "work" is done on the device, so I need it to pass through to a client. Looking into it, I see that GStreamer is somewhat analogous to OBS but with more capabilities, such as producing an RTP stream. I practice with my webcam, which is why I am thinking OBS is slightly similar, as it will ingest my RTMP stream URL to stream my webcam. I am like 2 weeks into this subject, so please bear with me.
At the moment, OBS doesn't work with WebRTC, and Millicast was bought by Dolby, so that is a CPaaS option; it's hard to see how these things work together, which makes it difficult for learning. I wish there was more with OBS, because it would relate so well to these use cases of having the webcam be one part and the client the other part.
So, my questions are these.
I am completely sorry if my questions are noob 9000 status but I would like to learn more and know what track I should be on and if perhaps this is the right one.