Business Scenario Supplement:
This looks like a retry-strategy problem, a boundary condition that needs optimizing. You can shorten the retry interval a bit.
However, no matter what, when an edge pulls from the origin it has to consider tearing down that back-to-origin stream, to avoid a large number of streams being pulled from the origin. So there will always be a time window in which playback is refused, and it cannot be eliminated no matter how much we optimize.
A better way to work around the issue is to have the client support multiple edges and retry several times on failure. That approach solves the problem effectively.
Of course, the retry interval on the edge is still worth optimizing.
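For illustration only, here is a minimal standalone C++ sketch of the client-side idea suggested above: keep a list of edge addresses and retry across them with a short backoff. The `connect_and_play` function and the edge URLs are hypothetical placeholders, not part of SRS or any client SDK.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical placeholder: a real client would open the RTMP/FLV stream
// here and return false if the edge rejects or drops the connection.
static bool connect_and_play(const std::string& edge_url) {
    std::cout << "trying " << edge_url << std::endl;
    return false; // pretend the edge is still in its "playback refused" window
}

// Try every edge, several rounds, with a short pause between attempts,
// so a single edge's refusal window does not block the player.
static bool play_with_failover(const std::vector<std::string>& edges,
                               int rounds, std::chrono::milliseconds backoff) {
    for (int i = 0; i < rounds; i++) {
        for (const std::string& edge : edges) {
            if (connect_and_play(edge)) return true;
            std::this_thread::sleep_for(backoff);
        }
    }
    return false;
}

int main() {
    // Hypothetical edge addresses, for illustration only.
    std::vector<std::string> edges = {
        "rtmp://edge-a.example.com/live/stream",
        "rtmp://edge-b.example.com/live/stream",
    };
    bool ok = play_with_failover(edges, 3, std::chrono::milliseconds(500));
    std::cout << (ok ? "playing" : "all edges failed") << std::endl;
    return 0;
}
```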
First of all, thank you for patiently answering. Seeing this issue tagged with the "5.0 milestone" instantly made me feel that a fix is still far away.
Today I followed your suggestion and tried flv.js. Since it only lets you listen for events and then trigger a reconnect, that is what I did. There were no problems during testing; I will keep observing once it is deployed to production.
I have also encountered this. It seems to be a 3-second delay added for safety. You can modify the code; ideally the 3-second delay would be made configurable.
https://github.com/ossrs/srs/issues/2215#issuecomment-901682601
By the way, this issue is becoming a frequently asked question.
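As a rough illustration of what "make the delay configurable" could look like, here is a tiny standalone C++ sketch that reads the interval from an environment variable instead of hard-coding it. The variable name EDGE_RETRY_MS is invented for this example; SRS would normally expose such a value through its own config file rather than an environment variable.

```cpp
#include <chrono>
#include <cstdlib>
#include <iostream>

// Read the retry interval from an environment variable so it can be tuned
// without recompiling. EDGE_RETRY_MS is purely illustrative.
static std::chrono::milliseconds edge_retry_interval() {
    if (const char* v = std::getenv("EDGE_RETRY_MS")) {
        return std::chrono::milliseconds(std::atol(v));
    }
    return std::chrono::milliseconds(3000); // default: the current 3s behavior
}

int main() {
    std::cout << "retry interval: " << edge_retry_interval().count() << "ms" << std::endl;
    return 0;
}
```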
Thank you. I have read #2215 and also #2707. Indeed, judging from the discussion, this issue comes up every few months. Modifying the code, recompiling, and repackaging is fairly costly for those of us who work in other languages.
For now we can only work around it with the "listen and reconnect" approach mentioned above; we will switch when there is a better solution. https://github.com/ossrs/srs/issues/2901#issuecomment-1031491071
Well, client-side reconnection is definitely necessary; don't rely solely on server-side improvements. Stability is a systems-engineering problem, and of course the client has to be part of it.
In general there is always a way to work around a common problem; there are always more solutions than problems.
I added an event for notification and waiting. I am also considering a configuration option to enable or disable this mode. We have already updated our own server with this change, and so far there have been no problems. @winlinvip
srs_error_t SrsPlayEdge::on_client_play()
{
    srs_error_t err = srs_success;

    // Start ingesting when in the init state.
    if (state == SrsEdgeStateInit) {
        state = SrsEdgeStatePlay;
        err = ingester->start();
    } else if (state == SrsEdgeStateIngestStopping) {
        // A previous ingest is still stopping: wait until on_all_client_stop() signals us,
        // then re-check the state instead of failing the new player immediately.
        srs_cond_wait(ingester_wait);
        srs_trace("on_client_play srs_cond_wait.");
        if (state == SrsEdgeStateInit) {
            state = SrsEdgeStatePlay;
            err = ingester->start();
        } else if (state == SrsEdgeStateIngestStopping) {
            return srs_error_new(ERROR_RTMP_EDGE_PLAY_STATE, "state is stopping 1.");
        } else {
            return srs_error_new(ERROR_RTMP_EDGE_PLAY_STATE, "state is stopping 2.");
        }
    }

    return err;
}

void SrsPlayEdge::on_all_client_stop()
{
    // When all clients have disconnected and the edge is ingesting
    // the origin stream, abort the ingest.
    if (state == SrsEdgeStatePlay || state == SrsEdgeStateIngestConnected) {
        SrsEdgeState pstate = state;
        state = SrsEdgeStateIngestStopping;
        srs_trace("on_all_client_stop begin.");

        ingester->stop();

        state = SrsEdgeStateInit;
        srs_trace("edge change from %d to %d then %d (init).", pstate, SrsEdgeStateIngestStopping, state);

        // Wake up any player blocked in on_client_play() waiting for the stop to finish.
        srs_cond_signal(ingester_wait);
        srs_trace("on_all_client_stop srs_cond_signal");
        return;
    }
}
Sorry for keeping everyone waiting so long; I have fixed this in 4.0.267.
The wait-for-notification solution is good (although there is an even simpler one).
Hahaha, everyone actually contributed quite a few solutions, but the best place to fix it is still SRS itself, so nobody needs to keep carrying the same private patch around anymore.
I checked, and the retry time is about 3 seconds. The reason is that when the last player stops, the edge stops pulling the stream, and stopping the ingest coroutine takes a noticeable amount of time; the other operations all complete in milliseconds.
[2022-10-10 08:01:45.207] Debug: Start to stop edge coroutine
[2022-10-10 08:01:48.207] Debug: Start to stop edge upstream
[2022-10-10 08:01:48.208] Debug: Start to unpublish source
[2022-10-10 08:01:48.209] Debug: Edge stopped
Looking at the code, it is waiting for a fixed 3 seconds, so the mechanism here is not reasonable.
void SrsPlayEdge::on_all_client_stop() {
    if (state == SrsEdgeStatePlay || state == SrsEdgeStateIngestConnected) {
        state = SrsEdgeStateIngestStopping;
        ingester->stop();
    }
}

void SrsEdgeIngester::stop() {
    trd->stop();
    upstream->close();
    source->on_unpublish();
}

srs_error_t SrsEdgeIngester::cycle() {
    while (true) {
        do_cycle(); // Interrupted by stop().
        srs_usleep(SRS_EDGE_INGESTER_CIMS); // Fixed 3s sleep, not interrupted by stop().
    }
}
A more reasonable approach is to improve the coroutine stopping mechanism. The problem is that stopping the coroutine interrupts do_cycle(), but the outer sleep is not interrupted (after the interruption it simply goes back to sleep). We just need to solve that in a simple way, e.g. by not going back to sleep once a stop has been requested.
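To illustrate the general pattern only (this is not the actual SRS patch), below is a self-contained C++ sketch in which the retry loop waits on a condition variable with a timeout instead of an uninterruptible sleep, so stop() wakes it immediately and the loop exits without burning the full interval. All class and function names here are invented for the example.

```cpp
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

class Ingester {
public:
    // Retry loop: do one cycle of work, then wait up to 3s before retrying.
    // The wait is interruptible, so stop() does not cost a full interval.
    void cycle() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopped_) {
            lock.unlock();
            do_cycle();
            lock.lock();
            // Wake up early if stop() is called; otherwise retry after 3s.
            cond_.wait_for(lock, std::chrono::seconds(3), [this] { return stopped_; });
        }
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cond_.notify_all(); // Interrupts the wait in cycle() immediately.
    }

private:
    void do_cycle() { std::cout << "pull from origin (one attempt)" << std::endl; }

    std::mutex mutex_;
    std::condition_variable cond_;
    bool stopped_ = false;
};

int main() {
    Ingester ingester;
    std::thread worker([&] { ingester.cycle(); });

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    auto t0 = std::chrono::steady_clock::now();
    ingester.stop();
    worker.join();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - t0).count();
    std::cout << "stopped in " << ms << "ms" << std::endl; // milliseconds, not 3s
    return 0;
}
```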
After the improvement, the stop completes within about 2ms, a window so short that normal user operations will generally not run into it.
[2022-10-10 08:07:07.455] Debug: Start to stop edge coroutine
[2022-10-10 08:07:07.457] Debug: Edge stopped
Although a condition variable or delayed release would also solve the problem, releasing and recreating the object (somewhat like a restart) is the simplest solution, and it keeps the object's state very clear and simple, which minimizes the problem.
Note: some friends hoped we would implement delayed release of the stream, but in testing there is no perceptible difference compared with simply re-pulling it. Delayed release also requires a timer, which adds complexity, and it keeps pulling the stream even when nobody is playing it, which can itself cause problems in some scenarios.
@winlinvip This change is relatively simple, but if the edge nodes are attacked, it could bring down the origin server. Doesn't that defeat the purpose of putting edges in front of the origin?
Preventing attacks is a separate solution, and this retry interval is not designed for preventing attacks.
Description
origin.cluster.serverA.conf
origin.cluster.serverB.conf
origin.cluster.serverC.conf
origin.cluster.edge.conf
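For context, a minimal edge config for this kind of origin-cluster setup looks roughly like the sketch below; the listen port and origin addresses are illustrative placeholders, not the contents of the files listed above (the real examples live in the conf/ directory of the SRS repo).

```
# Illustrative edge config: on a cache miss, pull the stream from the origin servers.
listen              1935;
vhost __defaultVhost__ {
    cluster {
        mode        remote;
        origin      127.0.0.1:19350 127.0.0.1:19351;
    }
}
```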
Expected Behavior
In the origin-cluster environment, I expect the edge nodes to pull streams successfully, just as the nodes in the previous standalone deployment did, no matter when the pull happens (assuming the stream is being published).