ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License

KERNEL: Use KCP for realtime UDP streaming #770

Closed winlinvip closed 4 years ago

winlinvip commented 7 years ago

KCP (https://github.com/skywind3000/kcp/wiki) is a well-designed library, and it could be used for realtime video and audio to decrease latency.
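As a rough illustration of why KCP's design can lower latency (this is a toy model, not KCP's actual code): KCP's wiki describes growing the retransmission timeout by 1.5x after each loss instead of TCP's classic doubling, so a repeatedly lost packet gets retried sooner. The 200ms initial RTO below is an assumed value for illustration only.

```python
# Toy comparison of RTO backoff strategies (illustrative, not KCP source):
# TCP-style doubles the retransmission timeout on each loss, while KCP's
# fast mode grows it by 1.5x, per the KCP wiki.

def total_wait(initial_rto_ms, factor, losses):
    """Sum of successive RTOs waited before the (losses+1)-th send attempt."""
    wait, rto = 0.0, float(initial_rto_ms)
    for _ in range(losses):
        wait += rto
        rto *= factor
    return wait

for losses in (1, 2, 3):
    tcp = total_wait(200, 2.0, losses)   # TCP-style: RTO doubles each time
    kcp = total_wait(200, 1.5, losses)   # KCP-style: RTO grows only 1.5x
    print(f"{losses} consecutive losses: tcp-style {tcp:.0f} ms, kcp-style {kcp:.0f} ms")
```

With three consecutive losses the gentler backoff already saves several hundred milliseconds of waiting, which is exactly the scale this thread is arguing about.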

notedit commented 7 years ago

cool

winlinvip commented 7 years ago

On streaming-media applications in the 600-1000 millisecond range

According to Agora's PR, CDN latency is currently around 5-20 seconds, and the claim is that lower latency (say, 600-1000 milliseconds) would unlock many new businesses. In reality, though, many customers don't seem to care that much about latency, so it's hard to say; it may just be a viewpoint that we technical people dreamed up ourselves. I think the claim should end with a question mark, so the sentence becomes:

If the end-to-end CDN latency is reduced to 600-1000 milliseconds, will there be many new businesses?

Note: The original Agora article states that "for example, live streaming or video conferencing, these scenarios are precisely constrained by technology and have not yet exploded."

Can today's streaming protocols, such as HTTP-FLV or HTTP-TS, reach that level? Definitely not. The best figure today is ChinaNet's stable 1-3 seconds. Switching to UDP could cut that by one level, to 600-1000 milliseconds; Agora's figure is 40-600 milliseconds. How low does latency need to go to drive new businesses? I think one level lower, i.e. 600-1000 milliseconds, is enough, so that is the latency target I set for Oryx. Of course, it has to be achieved in coordination with the client side.

Note: The original article from Agora states that "content distribution is done through CDN, ... and the delay is generally 2 seconds to several tens of seconds."

Note: The original article from Agora, as well as the SD-RTN documentation, states that the metric is "global end-to-end latency, with an average of 76ms."

Note: As an image in the original Agora article showed, CDN latency is "5-20 seconds" while Agora's latency is "40-600 milliseconds".

Oryx currently has three conditions for starting:

  1. Positioning completely different from SRS: low latency of 600-1000 milliseconds. SRS targets applications that are not latency-sensitive, where the usual 3-10 seconds of delay is acceptable.
  2. A reasonable architecture: Oryx uses a loosely coupled multi-process structure, unlike SRS, which allows more flexible protocol handling and transmission methods.
  3. A suitable UDP transmission framework: discovering KCP solves this; KCP or GO-KCP look like appropriate choices.

On the client side, we can start with Android publishing and playback, with Oryx handling transmission on the server side. SRS and current CDNs have 3-10 seconds of delay; Oryx aims for 600-1000 milliseconds, and an RCDN (Realtime CDN) would aim even lower. If new applications appear at Oryx's target latency, CDNs can certainly optimize further. Of course, they cannot get there over TCP, but they can over UDP. Why couldn't UDP be used? It is entirely possible.

I think it's not a technical issue but a lack of demand from customers (arguably a chicken-and-egg situation). Customers' applications can't outperform CDNs, and deploying their own nodes isn't feasible for most of them. CDNs rarely pursue lower latency because the current TCP solutions are already challenging enough. Wangsu, for example, has been satisfied with its 1-3 second delay for years; its mindset is probably: "My customers haven't asked for lower latency, and the stable 1-3 seconds is already good enough. Why would I switch the whole network to UDP?" Even if someone there had the idea, it would be hard to deliver results, and KPI reviews don't reward such work. Other CDNs would then say, "If Wangsu isn't doing it, it must not be useful." So I conclude that current CDNs won't reach latency under 1000 milliseconds until lower-latency services have been proven elsewhere.

VoIP latency is said to become perceptible at around 400 milliseconds, but I don't think we should take such a big leap; 600 milliseconds is already very good. CDNs haven't even reached that level as a basic service, so there's no need for the internet to suddenly drop to 300 milliseconds; it would scare a lot of people. Besides, once we reach 600-1000 milliseconds we can keep pushing lower, because the nature of the service will already have changed. Open source is enough to test and validate the market, but it cannot provide the service directly.

Note: The original article mentioned that "the telecommunications standard is 400ms. If the latency exceeds 400ms, there will be a noticeable discomfort in the conversation, making it unsuitable for communication."

Oryx can test whether 600-1000ms low latency is just a fantasy or whether there is real, widespread demand behind it. For now it's a chicken-and-egg situation, with both sides hesitating.

TRANS_BY_GPT3

winlinvip commented 4 years ago

KCP support is not under consideration for now. The following have already been supported:


winlinvip commented 2 years ago

This post was written four years ago, when I mainly worked on live streaming and knew nothing about low-latency technologies like WebRTC and RTC. It records my thoughts on sub-second live streaming. The writing style back then was, of course, quite youthful; I have since been trained out of exclamation marks and unnecessary emotional words.

After four years of working on RTC servers at Alibaba Cloud, I can say I have gained some insight. Looking back at this post: if live-streaming latency is reduced to under 1 second, will it disrupt the entire live-streaming industry? Will it give rise to many new scenarios?

My answer today remains: 90% no.

So is the remaining 10% certain? No. There are indeed new scenarios now, but they are definitely not the scenarios we imagined back then.

  1. Some "new" scenarios are really communication scenarios: things previously done over phone calls or dedicated devices are now done over the internet. VoIP and WebRTC serve this, and the well-known 400ms figure is actually a standard from the communication field. This is the internetization of communication, and many new scenarios will certainly appear here; ZOOM, for example, is a typical internet video-conferencing platform, and many vendors are already in this race.

  2. Back to live streaming: sub-second low-latency live streaming is now mainly provided by the big cloud vendors, for example Alibaba Cloud RTS and Tencent Cloud's Fast Live, and Agora has also entered the live-streaming market from the RTC side. From a transmission-technology perspective, using WebRTC for live streaming is not that different, since the internet can carry UDP as well as TCP. These scenarios are mainly interactive live streaming, such as live competitions between hosts, social live streaming, interactive e-commerce live streaming, and interactive educational live streaming.

  3. Then there are genuinely new scenarios such as VR/AR, the recently popular metaverse, 5G, remote control, cloud gaming, cloud desktop, and so on, all of which can be built with audio/video technology. Or, put the other way around, those of us in the audio/video industry hold the RTC hammer and can swing it at all sorts of industries.

The "new live-streaming scenarios" discussed above are really just the second type, and that type is actually small in volume. Ordered by application or usage scale:

Live streaming >> Meetings >> Interactive live streaming

Comparing the market of live streaming, the market of meetings (excluding traditional meeting hardware), and the market of interactive live streaming, we can clearly see the relationship. This relationship can be observed from the users of open-source projects, the size of the community, and the revenue of cloud vendors.

Therefore, strictly sub-second live streaming really cannot revolutionize anything, because consumers simply do not care about the underlying technology, and in terms of experience, ordinary live streaming has no demand for interactivity. Forcing all live streaming to become interactive live streaming is like the current push to move every 4G user onto 5G: I cannot see why I should be the early adopter when it brings me nothing except a bigger bill, and today's 5G experience is not even up to 4G's level.

Back to the technology: can KCP, or SRT, achieve sub-second live streaming? The question is actually simple. If we only count the server-side transmission delay, like the vague figure of a 76ms global RTT, then any UDP protocol will do, and the hard floor on transmission delay is the fiber RTT. But if we mean the end-to-end delay of the whole link, the delay the user actually perceives, then we cannot ignore the client.
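To make that concrete, here is a hypothetical end-to-end latency budget. All numbers are illustrative assumptions, not measurements from SRS or any CDN; the point is only that client-side stages, especially the jitter buffer, can dwarf the network RTT:

```python
# Hypothetical end-to-end latency budget (illustrative numbers only).
# Even with a fast UDP transport, the client-side terms dominate.
budget_ms = {
    "capture + encode (client)": 100,
    "uplink network":             50,
    "server forwarding":          10,
    "downlink network":           50,
    "jitter buffer (client)":    300,
    "decode + render (client)":  100,
}

total = sum(budget_ms.values())
for stage, ms in budget_ms.items():
    print(f"{stage:28s} {ms:4d} ms")
print(f"{'end-to-end total':28s} {total:4d} ms")
```

Under these assumed numbers the network contributes only about 100ms of the roughly 600ms total, which is why optimizing the transport alone cannot get the user-perceived delay below 1 second; the client pipeline has to cooperate.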

So my main issue before was a lack of understanding of the client, as I was only considering the problem from the perspective of the server.

Narrow-minded, so narrow-minded.
