gwuhaolin / livego

live video streaming server in golang
MIT License
9.72k stars 2k forks

Short Reconnect / Discontinuity for HLS #206

Open tiger5226 opened 2 years ago

tiger5226 commented 2 years ago

Hi Livego team,

Excellent work with this product! I have been able to use it heavily and we are planning to launch it at scale (hundreds of concurrent livestreamers!).

One issue we currently have, though, is with short reconnects. When a short reconnect occurs, the VirWriter hits an error on SendPacket when a new TranStart runs for the reconnect. This causes the HLS connection to be closed and removed. Even if the flag to not remove it is true, the HLS Source just remains stagnant and gets no updates. There is no "restart" for the transfers. I am in the process of implementing this (and can share it with the community if you would like the contribution). However, I wanted to check with the team first, to make sure I understand the problem and to ask whether you are already aware of it.

In addition to this topic, there is the DISCONTINUITY topic for m3u8 files. Whenever a stream is reconnected, it is standard for there to be an #EXT-X-DISCONTINUITY tag inserted between the ts segments so the player knows how to handle the gap in timestamps. I am planning to add that once the short reconnect issue is resolved. Is there any direction you could provide on this that you may have already looked into? Again, I will surely share the changes, and it would be ideal if I could align them with your preferred design.

Lastly, when a reconnect happens, the seq for the HLS source gets reset. So if there is a reconnect on a 6 hour stream, the seq will get reset to 1. This confuses players into thinking that they already have all the TS files they need, and they will not try to fetch more until the seq passes the last value they saw. I think this is a minor fix once the short reconnect issue is resolved, though (VideoJS Example).

tzarebczan commented 2 years ago

We got past the issue with sendPacket, but are still running into issues with the reconnect scenario due to the timestamps in the ts files once a stream is reconnected. We tried adding both gap and discontinuity, and videojs didn't like it.

nginx-RTMP seems to handle the connection close + reconnect by adding a partial entry to the playlist and then not resetting the ts timestamp to 0 on reconnect - so the playback continues nicely.

In the end, this is the scenario we want to achieve: A stream is streaming for 1 hour, and then has a network hiccup. OBS reconnects after 10 seconds (default). The playback should continue smoothly, with the missing data not being available.

tiger5226 commented 2 years ago

Some more information: I traced it back to the ChunkStream. When it reads a chunk of format type 0, the absolute timestamp should be in the header. It appears that when you click stop/start in OBS, this value comes in as 0. Whenever OBS first connects, this value is always 0. As a result, it resets the chunk stream timestamp to 0. This then feeds into the Packet, which feeds into the HLS Server as the DTS, which goes to VHS. VHS sees that the last segment ended at, say, 19992300, then the next segment from the reconnect shows a DTS of 250, and VHS errors out (silently, you can see it in the debug logs) saying the next segment cannot be in the past. This cascades from there. The sad part here is that livego is following the RTMP specification perfectly. OBS is ultimately resetting the timestamp when it sends the format type 0 chunk with a 0 timestamp. (RTMP Spec)
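For reference, here is a small sketch of how the absolute timestamp is read from a format type 0 message header as laid out in the RTMP specification: 3 bytes of timestamp, 3 bytes of message length, 1 byte of message type, 4 bytes of little-endian message stream ID, plus a 4-byte extended timestamp when the 3-byte field is 0xFFFFFF. The names `MessageHeader` and `ParseType0` are made up for illustration and are not livego's ChunkStream API. The input is assumed to start right after the chunk basic header.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// MessageHeader holds the fields of an RTMP type 0 chunk message header.
type MessageHeader struct {
	Timestamp uint32 // absolute timestamp in milliseconds
	Length    uint32 // message length in bytes
	TypeID    uint8  // e.g. 8 = audio, 9 = video
	StreamID  uint32 // message stream ID
}

// ParseType0 decodes a type 0 message header from b and returns the header
// and the number of bytes consumed (11, or 15 with an extended timestamp).
func ParseType0(b []byte) (MessageHeader, int, error) {
	if len(b) < 11 {
		return MessageHeader{}, 0, errors.New("short type 0 header")
	}
	h := MessageHeader{
		Timestamp: uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Length:    uint32(b[3])<<16 | uint32(b[4])<<8 | uint32(b[5]),
		TypeID:    b[6],
		StreamID:  binary.LittleEndian.Uint32(b[7:11]),
	}
	n := 11
	if h.Timestamp == 0xFFFFFF {
		// The real value is carried in the extended timestamp field.
		if len(b) < 15 {
			return MessageHeader{}, 0, errors.New("short extended timestamp")
		}
		h.Timestamp = binary.BigEndian.Uint32(b[11:15])
		n = 15
	}
	return h, n, nil
}
```

Because a type 0 header carries an absolute value, a publisher that restarts its clock simply announces a small timestamp again, and a server that copies it through will hand a backwards-jumping DTS to everything downstream.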

A possible solution: If there is a hard disconnect, the socket is closed, and I can prevent a reconnect for more than 10 seconds so that all the Readers/Writers are closed before the reconnect. This allows it to act like a new stream, and VHS handles this much better (although we lose all stored TS segments). I have implemented this and it works really well. However, when there is a network stoppage but the connection is not closed, just not being written to, it still allows the reconnect, and things get borked again. I can't seem to find a way to catch this scenario smoothly so I can kill the stream. OBS will attempt to reconnect 10 seconds later by default, which gives livego the time it needs to close the readers/writers.

It appears that there was an attempt to handle reconnects, but it is a bit faulty. The DTS/PTS problem for HLS is just one issue. Another is that a subsequent FLV Writer will get created, and you will have 2 FLV files being written to at the same time. This can be mitigated with some post processing to throw away the extra FLV after the stream ends. If we are to handle the reconnect on the livego side, like Nginx-RTMP-Module does, we need to keep track of the DTS and add an offset to any format type 0 chunk that comes in after a reconnect, within a certain time period. Also, when the VirReader loses its connection, it closes all of its writers. This should be avoided to assist with a reconnect. The CheckAlive will automatically remove the HLS Source when there is no data incoming for a period of time.
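The "track the DTS and add an offset" idea can be sketched as a small rebasing helper. This is an illustration under assumptions, not livego's or Nginx-RTMP-Module's actual logic: the `Rebaser` type and its `Gap` and `Threshold` fields are hypothetical, and the threshold exists only so that small audio/video interleaving jitter is not mistaken for a restarted timeline.

```go
package main

// Rebaser maps incoming RTMP timestamps (milliseconds) onto a timeline
// that keeps moving forward across publisher reconnects.
type Rebaser struct {
	Gap       uint64 // ms placed between the last old and first new timestamp
	Threshold uint64 // a backwards jump larger than this counts as a restart

	offset  uint64
	last    uint64 // highest timestamp emitted so far
	started bool
}

// Rebase returns the adjusted timestamp for an incoming one.
func (r *Rebaser) Rebase(ts uint32) uint64 {
	out := uint64(ts) + r.offset
	if r.started && out+r.Threshold < r.last {
		// The publisher restarted its clock: shift the new timeline so it
		// continues Gap ms after the last timestamp we emitted.
		r.offset = r.last + r.Gap - uint64(ts)
		out = uint64(ts) + r.offset
	}
	if !r.started || out > r.last {
		r.last = out
	}
	r.started = true
	return out
}
```

Using the numbers from the comment above with `Gap: 40` and `Threshold: 1000`, an incoming 19992300 passes through unchanged, and a following 250 is emitted as 19992340 instead of jumping back in time. The rebaser has to outlive the connection for this to work, which is why the writers and the HLS Source need to survive a short reconnect.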

paxelpixel commented 1 month ago

Were you able to find a solution for this? I'm running into the same problem.