ossrs / srs

SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License

API: When FFmpeg is blocked, the HTTP callback also does not work. #3575

Closed linkewei0580 closed 1 year ago

linkewei0580 commented 1 year ago

Note: Please read the FAQ before filing an issue; see #2716

Description

When pulling an RTSP stream through ingest with http_hooks callbacks enabled, unplugging and re-plugging the camera's network cable causes the callback to fail, and streaming does not recover. Setting http_hooks enabled to off and reloading restores streaming.

  1. SRS Version: XCORE-SRS/5.0.152(Bee)

  2. SRS Log:

[2023-06-08 11:32:14.891][ERROR][27701][qh1k4460][11] serve error code=4005(HttpStatus)(Invalid HTTP status code) : service cycle : rtmp: stream service : rtmp: callback on publish : rtmp on_publish http://192.168.1.121:8082/vod/live-room-webrtc/push-stream-start : http: on_publish failed, client_id=qh1k4460, url=http://192.168.1.121:8082/vod/live-room-webrtc/push-stream-start, request={"server_id":"vid-10545y5","service_id":"0j3592l1","action":"on_publish","client_id":"qh1k4460","ip":"192.168.1.123","vhost":"__defaultVhost__","app":"live","tcUrl":"rtmp://192.168.1.123:1935/live","stream":"rtsp-1","param":"","stream_url":"/live/rtsp-1","stream_id":"vid-9g30x7b"}, response={"requestId":"1ca7308c37704a839fa9847317f1d99c","status":999,"message":"Incorrect result size: expected 1, actual 3","path":"/vod/live-room-webrtc/push-stream-start","data":null,"timestamp":"2023-06-08 11:32:14"}, code=500 : http: status 500

ffmpeg log

Input #0, rtsp, from 'rtsp://admin:1370176lkW@192.168.1.109:554/cam/realmonitor?channel=1&subtype=0':
  Metadata:
    title           : Media Server
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: h264 (Main), yuvj420p(pc, bt709, progressive), 1920x1080, 90k tbr, 90k tbn
Output #0, flv, to 'rtmp://192.168.1.123:1935/live/rtsp-1':
  Metadata:
    title           : Media Server
    encoder         : Lavf59.27.100
  Stream #0:0: Video: h264 (Main) ([7][0][0][0] / 0x0007), yuvj420p(pc, bt709, progressive), 1920x1080, q=2-31, 90k tbr, 1k tbn
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[flv @ 0x56202617ea00] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
av_interleaved_write_frame(): Broken pipe
[flv @ 0x56202617ea00] Failed to update header with correct duration.65360.0kbits/s speed=N/A    
[flv @ 0x56202617ea00] Failed to update header with correct filesize.
Error writing trailer of rtmp://192.168.1.123:1935/live/rtsp-1: Broken pipe
frame=    1 fps=0.0 q=-1.0 Lsize=      45kB time=00:00:00.00 bitrate=365520.0kbits/s speed=10.9x    
video:44kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.676465%
Error closing file rtmp://192.168.1.123:1935/live/rtsp-1: Broken pipe
Conversion failed!
  3. SRS Config:

    ingest rtsp-1 {
        enabled     on;
        input {
            type    stream;
            url     rtsp://admin:1370176lkW@192.168.1.109:554/cam/realmonitor?channel=1&subtype=0;
        }
        ffmpeg      /usr/bin/ffmpeg;
        engine {
            enabled         on;
            perfile {
                rtsp_transport  tcp;
                timeout         30000;
            }
            vcodec          copy;
            acodec          copy;
            output          rtmp://192.168.1.123:1935/live/rtsp-1;
        }
    }

    http_hooks {
        # default off.
        enabled         on;
        #on_publish     http://192.168.1.121:7999/publish.json;
        #on_unpublish   http://192.168.1.121:7999/unpublish.json;
        on_publish      http://192.168.1.121:8082/vod/live-room-webrtc/push-stream-start;
        on_unpublish    http://192.168.1.121:8082/vod/live-room-webrtc/push-stream-end;
        on_play         http://192.168.1.121:7999/play.json;
        on_stop         http://192.168.1.121:7999/stop.json;
        on_dvr          http://192.168.1.121:7999/recorddone;
    }

Replay

Please describe how to replay the bug?

  1. Pull an RTSP stream through ingest with http_hooks callbacks enabled; streaming works normally.
  2. Unplug the camera's network cable and plug it back in; the callback fails and streaming does not recover.
  3. Set http_hooks enabled to off and reload; streaming is restored.

Expect

After the camera's network cable is unplugged and plugged back in, the callback should succeed and streaming should resume.
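A note for anyone trying to narrow this down: the SRS log above shows the on_publish hook itself replying with HTTP 500 ("Incorrect result size: expected 1, actual 3"), which would explain SRS dropping the publish and FFmpeg reporting a broken pipe. To check whether the hook backend is the culprit rather than FFmpeg, the real backend can be swapped for a stub that accepts everything. Below is a minimal sketch of such a stub, assuming (per my reading of the SRS HTTP callback docs) that SRS treats HTTP 200 with a body of `{"code": 0}` as success. The class name `AcceptAllHook`, the helper `make_server`, and port 8085 are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class AcceptAllHook(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real hook backend: accepts every event."""

    def do_POST(self):
        # Read the JSON event that SRS posts (action, stream, ip, ...).
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            event = json.loads(raw or b"{}")
        except ValueError:
            event = {}
        if not isinstance(event, dict):
            event = {}
        print("hook event:", event.get("action"), event.get("stream"))

        # Assumption: HTTP 200 plus {"code": 0} means "allow" to SRS.
        body = json.dumps({"code": 0}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence the default per-request access log.
        pass


def make_server(host="127.0.0.1", port=8085):
    """Build (but do not start) the stub server; the port is an arbitrary choice."""
    return HTTPServer((host, port), AcceptAllHook)
```

To try it, call `make_server().serve_forever()` and point `on_publish` and `on_unpublish` at the stub's address. If streaming then recovers after the cable is re-plugged, the failure is most likely in the hook backend and not in SRS or FFmpeg.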

winlinvip commented 1 year ago

FFmpeg processes an RTSP stream from a camera. However, when a user disconnects the camera's network cable, FFmpeg may become unresponsive or freeze, potentially causing the HTTP callback to malfunction.

However, it appears that this should not have happened, as FFmpeg is an isolated process that should not obstruct or influence the HTTP callback of SRS.

It is necessary to conduct some research and replicate this issue.

winlinvip commented 1 year ago

FFmpeg pulls the RTSP stream from the camera and then pushes it to SRS. However, when the user disconnects the camera's network cable, FFmpeg may become unresponsive, which may cause the HTTP callback to fail.

However, this should not happen because FFmpeg is an independent process and should not affect SRS's HTTP callback.

It will take time to reproduce this issue.

pythys commented 1 year ago

I can confirm that I'm seeing this bug as well. Here are my logs:

[2023-07-08 13:37:10.024][Warn][212][6692248x][104] client disconnect peer. ret=1008
[2023-07-08 13:37:10.024][Trace][212][65891n6r] TCP: clear zombies=1 resources, conns=4, removing=0, unsubs=0
[2023-07-08 13:37:10.024][Trace][212][6692248x] TCP: disposing #0 resource(RtmpConn)(0x555e1df0ad80), conns=4, disposing=1, zombies=0
[2023-07-08 13:37:11.081][Trace][212][66y15z81] TCP: before dispose resource(RtmpConn)(0x555e1decde50), conns=3, zombies=0, ign=0, inz=0, ind=0
[2023-07-08 13:37:11.081][Error][212][66y15z81][62] serve error code=1011 : service cycle : rtmp: stream service : rtmp: callback on publish : rtmp on_publish http://127.0.0.1:8085/api/v1/streams/publish : http: on_publish failed, client_id=66y15z81, url=http://127.0.0.1:8085/api/v1/streams/publish, request={"server_id":"vid-gk40023","action":"on_publish","client_id":"66y15z81","ip":"188.70.45.172","vhost":"__defaultVhost__","app":"live","tcUrl":"rtmp://live.eyon.tv/live","stream":"oyo1QizEf-gGc_Yj4IYz","param":""}, response=, code=0 : http: client post : http: parse response : parse message : grow buffer : read bytes : timeout 30000 ms
thread [212][66y15z81]: do_cycle() [src/app/srs_app_rtmp_conn.cpp:217][errno=62]
thread [212][66y15z81]: service_cycle() [src/app/srs_app_rtmp_conn.cpp:414][errno=62]
thread [212][66y15z81]: publishing() [src/app/srs_app_rtmp_conn.cpp:830][errno=62]
thread [212][66y15z81]: http_hooks_on_publish() [src/app/srs_app_rtmp_conn.cpp:1338][errno=62]
thread [212][66y15z81]: on_publish() [src/app/srs_app_http_hooks.cpp:147][errno=62]
thread [212][66y15z81]: do_post() [src/app/srs_app_http_hooks.cpp:505][errno=62]
thread [212][66y15z81]: post() [src/protocol/srs_service_http_client.cpp:349][errno=62]
thread [212][66y15z81]: parse_message() [src/protocol/srs_service_http_conn.cpp:100][errno=62]
thread [212][66y15z81]: parse_message_imp() [src/protocol/srs_service_http_conn.cpp:163][errno=62]
thread [212][66y15z81]: grow() [src/protocol/srs_protocol_stream.cpp:162][errno=62]
thread [212][66y15z81]: read() [src/protocol/srs_service_st.cpp:507][errno=62](Timer expired)

I'm going to investigate on my end, but it looks like a race condition or something similar in the hooks.
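The trace above ends in `timeout 30000 ms` while SRS waits for the hook response, so one thing worth ruling out is a hook handler that simply takes too long to answer. Here is a rough sketch of bounding the time the hook spends on its own work, so it always replies well before SRS gives up. It assumes a non-zero `code` in the reply makes SRS reject the publish. The names `decide_publish`, `authenticate`, and `budget_s` are hypothetical, not part of SRS.

```python
import concurrent.futures

# Shared worker pool so a slow backend call cannot block the hook's reply.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def decide_publish(event, authenticate, budget_s=5.0):
    """Return the hook reply code: 0 to accept, 1 to reject.

    `authenticate` is a hypothetical callable (event -> bool) wrapping
    whatever slow check the hook performs (database, remote auth server).
    """
    future = _POOL.submit(authenticate, event)
    try:
        allowed = future.result(timeout=budget_s)
    except concurrent.futures.TimeoutError:
        # Backend too slow: reject now instead of letting SRS time out.
        future.cancel()
        return 1
    except Exception:
        # Backend failed: reject explicitly rather than crash the handler.
        return 1
    return 0 if allowed else 1
```

Rejecting on timeout is a design choice for an authentication hook. A hook that only records events could just as reasonably return 0 on timeout so that a slow backend never blocks publishing.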

pythys commented 1 year ago

OK, upon further investigation: the on_publish hook on my end contains authentication logic that calls a server to authenticate, and if authentication fails, I reject the stream.

It turns out the problem is that some streaming software retries continually, and those continuous retries cause a memory leak and an eventual blowup. We can solve this at the firewall level, but it would be great to identify and fix the problem at the server level in SRS.
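As a stopgap between those two options, the hook service itself can remember recent rejections and answer repeat offenders immediately, without calling the authentication server again. This is only a sketch: it does not stop the retries from reaching SRS, so it does not fix any leak inside SRS. It just keeps the hook fast and spares the auth backend. `RejectCache`, `on_publish_code`, and `authenticate` are hypothetical names. The `ip` and `stream` keys are the fields SRS sends in the on_publish request shown in the logs above.

```python
import time


class RejectCache:
    """Remembers rejected (ip, stream) pairs for a short time."""

    def __init__(self, ttl_s=30.0, clock=time.monotonic):
        self._ttl = ttl_s
        self._clock = clock
        self._until = {}  # key -> time at which the rejection expires

    def is_blocked(self, key):
        expiry = self._until.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._until[key]
            return False
        return True

    def remember_reject(self, key):
        now = self._clock()
        # Drop expired entries first so the cache itself cannot grow forever.
        for stale in [k for k, t in self._until.items() if t <= now]:
            del self._until[stale]
        self._until[key] = now + self._ttl


def on_publish_code(event, cache, authenticate):
    """Return 0 to accept or 1 to reject, consulting the cache first."""
    key = (event.get("ip"), event.get("stream"))
    if cache.is_blocked(key):
        return 1  # recently rejected: skip the auth backend entirely
    if authenticate(event):
        return 0
    cache.remember_reject(key)
    return 1
```

With a 30-second TTL, a client retrying every second triggers one backend call per 30 seconds instead of 30. The TTL is an arbitrary starting point and should be tuned to how quickly a legitimate publisher needs to be able to retry.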