lwbaptx opened this issue 7 years ago
Hi @winlinvip @lwbaptx, I'd like to continue the discussion in this issue because it is related to whether the SRS player connection has proactive disconnection behavior, so I think this is the appropriate place.
Recently, I discovered a connectivity attack targeting the SRS origin server.
We know that the SRS edge never actively disconnects a player. Even if the requested stream does not exist, the connection between the player and the edge server persists as long as the player itself has no timeout/disconnect mechanism (which is true of many players). Meanwhile, the edge's ingester for that stream, unable to receive any data from upstream, keeps disconnecting and reconnecting in an attempt to pull it: the per-attempt timeout is 3 seconds, and the retry loop never ends.
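To make the failure mode concrete, here is a minimal standalone C++ sketch modeling the ingester's retry behavior described above. It is not SRS code; connect_to_origin and recv_with_timeout are hypothetical stubs, and the loop is bounded to three attempts only so the demo terminates.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Stubs standing in for the real TCP/RTMP plumbing.
bool connect_to_origin() { return true; }

bool recv_with_timeout(std::chrono::seconds timeout) {
    // For a non-existent stream no data ever arrives; simulate the wait.
    std::this_thread::sleep_for(timeout);
    return false;
}

int main() {
    // Bounded to 3 attempts so this demo terminates; the real loop is
    // unbounded for as long as any player holds the edge connection open.
    for (int attempt = 1; attempt <= 3; ++attempt) {
        if (!connect_to_origin()) continue;
        if (!recv_with_timeout(std::chrono::seconds(3))) {
            std::printf("attempt %d: recv timeout, disconnect and retry "
                        "immediately (no backoff)\n", attempt);
        }
    }
    return 0;
}
```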
The above is the background. Now let's look at the specifics of the attack. For illustration, simplify the network: suppose there is one origin server with 4 edge servers behind it. The attacker uses an RTMP client that never actively disconnects and simultaneously pulls a large number of non-existent streams from the edges. Say there are 1000 non-existent streams, and each of the 4 edges holds connections for all 1000 of them: this is the first amplification. As described above, since the edge servers cannot receive data from upstream, their ingesters keep disconnecting and reconnecting without any limit: the second amplification. With 4000 ingesters connecting and disconnecting in rapid succession, the Recv-Q of the origin server's listening port quickly fills up and the service becomes unavailable.
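To put rough numbers on the amplification (a back-of-the-envelope estimate, not a measurement): 1000 non-existent streams × 4 edges yields 4000 ingesters, and with each one cycling on a 3-second timeout, the origin sees on the order of 4000 / 3 ≈ 1300 fresh TCP connections per second, easily enough to keep its accept queue (the Recv-Q) saturated.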
After discovering this issue, I removed the timeout handling in the srs_rtmp_dump tool in SRS librtmp and ran a stress test, which reproduced the problem easily. I also tried removing the timeout retry when the edge ingester pulls streams (only for testing; it should not be changed in practice), and the service then ran perfectly fine. Evidently, mass reconnection by edge ingesters has a severe impact on the origin server's ability to accept connections.
Of course, we can avoid this issue at the business level, for example by requiring authentication for all streams, but that is not the fundamental solution. Fundamentally, we should consider how to terminate the edge ingester gracefully. Perhaps adding a timeout-based disconnect for playback clients in SRS would solve this problem.
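As a rough illustration of that idea, the following standalone C++ sketch bounds the play session's wait for data instead of blocking forever. The condition-variable model and the 3-second threshold are assumptions for the demo; the real SRS play loop runs on ST coroutines, and a production timeout would presumably be configurable and larger.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>

int main() {
    std::mutex m;
    std::condition_variable data_arrived;
    bool has_data = false; // would be set by the ingester on upstream data

    std::unique_lock<std::mutex> lock(m);
    // Bound the wait instead of blocking forever; 3s here just keeps the
    // demo short -- a production threshold would likely be larger.
    if (!data_arrived.wait_for(lock, std::chrono::seconds(3),
                               [&] { return has_data; })) {
        std::puts("play session: no data within timeout, closing client");
        // With the last consumer gone, the edge can also stop its ingester,
        // ending the reconnect storm against the origin.
    }
    return 0;
}
```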
Makes sense.
Our previous logic was that if the publishing stream was interrupted, the playing sessions would also be interrupted.
srs2.0 release. Reproduction steps:
1. Client A publishes a stream to the origin, and Client B watches Client A's stream from an edge.
2. Client A stops publishing, but the edge does not know this and keeps trying to pull the stream.
3. If Client B's machine crashes, hangs, or loses its network connection, the connection between the edge and Client B is never closed.
Upon analysis, the likely cause is as follows: in trunk/src/app/srs_app_rtmp_conn.cpp, inside the do_playing playback loop, there is a wait call:
consumer->wait — the coroutine must be awakened at this point, and there are only two wake conditions:
1. Data arrives from the origin stream, but it never does.
2. Data arrives from Client B, but it never does.
As a result, the thread stays suspended indefinitely.
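A hedged sketch of one fix direction implied by this analysis: make the wait timed, so the play loop regains control periodically and can notice a dead client. This models the logic with standard C++ threading primitives; the names (Consumer, wait_msgs, client_alive) are illustrative and do not match SrsConsumer's actual API.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>

struct Consumer {
    std::mutex m;
    std::condition_variable cv;
    int queued_msgs = 0;

    // Timed variant of the wait: returns false on timeout even if neither
    // wake condition (origin data, client message) ever fires.
    bool wait_msgs(int min_msgs, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m);
        return cv.wait_for(lock, timeout,
                           [&] { return queued_msgs >= min_msgs; });
    }
};

// Stub: in reality a failed send or a kickoff flag would report this.
bool client_alive() { return false; }

int main() {
    Consumer consumer;
    while (true) {
        if (!consumer.wait_msgs(1, std::chrono::milliseconds(500))) {
            // Woken by the timeout rather than by data: a chance to notice
            // that Client B is gone instead of staying suspended forever.
            if (!client_alive()) {
                std::puts("do_playing: client gone, exiting play loop");
                break;
            }
        }
        // ... otherwise dump queued messages and send them to the player ...
    }
    return 0;
}
```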
curl -v -X DELETE http://192.168.1.170:1985/api/v1/clients/426 && echo ""
Even using the kickoff API cannot force Client B offline, which confirms this analysis: the connection stays open and will not be closed unless Client A starts publishing again, which finally awakens the playback thread.
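An alternative fix direction, sketched under the same caveats as above (hypothetical names, standard C++ rather than SRS's ST coroutines): have the kickoff path wake the blocked consumer directly, so DELETE /api/v1/clients/:id takes effect immediately instead of waiting for Client A to publish again.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

struct KickableConsumer {
    std::mutex m;
    std::condition_variable cv;
    bool kicked = false;

    // Called from the API handler path to interrupt a blocked play loop.
    void kick() {
        { std::lock_guard<std::mutex> g(m); kicked = true; }
        cv.notify_all(); // wake the waiter even though no data arrived
    }

    void wait_until_kicked() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return kicked; }); // demo: only kick() wakes us
    }
};

int main() {
    KickableConsumer c;
    // Simulates the HTTP API handling DELETE /api/v1/clients/:id.
    std::thread api([&] { c.kick(); });
    c.wait_until_kicked();
    std::puts("play loop woken by kickoff, closing connection");
    api.join();
    return 0;
}
```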