ossrs / srs

SRS is a simple, high-efficiency, real-time video server supporting RTMP, WebRTC, HLS, HTTP-FLV, SRT, MPEG-DASH, and GB28181.
https://ossrs.io
MIT License
24.74k stars 5.28k forks

GB28181: When camera restart, can not connect to SRS. #3944

Open daveyang05 opened 5 months ago

daveyang05 commented 5 months ago

We integrate Hikvision cameras over the GB28181 protocol. After a camera restarts, the video stream takes more than two hours to recover. Until it recovers, the session status in SRS remains 'established'. About two hours later, the camera sends a remote 'reset' command, after which SRS disconnects the media stream and normal operation resumes.

yushimeng commented 5 months ago

Does "'reset' command" mean a TCP reset (RST) packet on the SIP connection?

daveyang05 commented 5 months ago

Can the GB28181 implementation actively detect that a media stream has been disconnected and then return the camera session to its initial state (or make any other appropriate state change)?

daveyang05 commented 5 months ago

[2024-01-30 10:23:44.903][ERROR][1][47l067d9][104] SIP: Receive err code=1007(SocketRead)(Socket read data failed) : parse message : parse message : grow buffer : read bytes : read
thread [1][47l067d9]: do_cycle() [./src/app/srs_app_gb28181.cpp:1077][errno=104]
thread [1][47l067d9]: parse_message() [./src/protocol/srs_protocol_http_conn.cpp:103][errno=104]
thread [1][47l067d9]: parse_message_imp() [./src/protocol/srs_protocol_http_conn.cpp:153][errno=104]
thread [1][47l067d9]: grow() [./src/protocol/srs_protocol_stream.cpp:162][errno=104]
thread [1][47l067d9]: read() [./src/protocol/srs_protocol_st.cpp:566][errno=104](Connection reset by peer)

Hello! The text above is a log printed by SRS. According to the log, SRS disconnects the media stream connection after a read fails with "Connection reset by peer" (errno 104). The status then transitions from established to init, at which point SRS can accept registration messages from the camera again. This is only a preliminary analysis.

The initial suspicion is that the GB28181 support in SRS did not detect the media stream failure.

yushimeng commented 5 months ago

It looks like the SIP receive thread exited without notifying the SIP connection thread.
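
A minimal sketch of that suspicion, assuming a receiver that reports its own exit to the connection that owns it. Every name here (`Receiver`, `Connection`, `SessionState`, `fail`) is hypothetical and not taken from the SRS source; SRS itself runs these loops as coroutines, which this single-threaded sketch does not model.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical states, mirroring the 'init' / 'established' wording in this thread.
enum class SessionState { Init, Established };

// Hypothetical receiver: it tells its owner when the receive loop ends.
class Receiver {
public:
    explicit Receiver(std::function<void(int)> on_exit)
        : on_exit_(std::move(on_exit)) {}

    // Simulates the receive loop ending with a socket error,
    // for example errno 104 (connection reset by peer).
    void fail(int err) {
        if (on_exit_) on_exit_(err);
    }

private:
    std::function<void(int)> on_exit_;
};

// Hypothetical connection: it resets its state as soon as the receiver exits,
// instead of staying 'established' until something else times out.
class Connection {
public:
    Connection()
        : state_(SessionState::Established),
          last_errno_(0),
          receiver_([this](int err) { on_receiver_exit(err); }) {}

    // The receiver callback captures 'this', so copying is disabled.
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

    Receiver& receiver() { return receiver_; }
    SessionState state() const { return state_; }
    int last_errno() const { return last_errno_; }

private:
    void on_receiver_exit(int err) {
        last_errno_ = err;
        state_ = SessionState::Init;  // ready to accept a new REGISTER
    }

    SessionState state_;
    int last_errno_;
    Receiver receiver_;
};
```

Calling `receiver().fail(104)` on a `Connection` moves it from `Established` back to `Init`. Without the callback, the connection would have no way to learn that its receiver had stopped.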

winlinvip commented 5 months ago

@yushimeng This seems quite reasonable and commendable. If a thread exits abnormally, there might indeed be an issue with how this logic is handled.

yushimeng commented 5 months ago

@daveyang05 Please try this pull request: https://github.com/ossrs/srs/pull/3947. Let me know if it doesn't work.

winlinvip commented 5 months ago

@yushimeng Nice work!

daveyang05 commented 4 months ago

Hello! The developer responsible for our GB28181 integration is on leave for personal reasons. They will start verifying this as soon as they return after the New Year, and we will report the results to you promptly. Also, could you please confirm that the branch is #3947?

yushimeng commented 4 months ago

When the media connection is disconnected, the session is destroyed directly, but when the SIP connection is disconnected, the session is not destroyed immediately. With the current approach, SIP/session state recovery and authentication need special handling when a new SIP connection arrives. My idea is to also destroy the session directly when the SIP connection is disconnected. Although my current submission handles session recovery when SIP reconnects immediately, I may delete this recovery logic in the future. Additionally, I have added SrsResourceManager::erase to avoid binding a session before the old session resource is destroyed. I am not sure whether this disrupts the original design of the lazy sweep and the resource manager.

daveyang05 commented 4 months ago

Hello! Thank you for the very fast code change. Is there an anomaly in SRS resource management, and does the original design have relevant test cases? If so, the impact on the original design could be evaluated by running those test cases as a regression.

daveyang05 commented 4 months ago

@yushimeng, have your code modifications been officially committed to the SRS development branch yet?

yushimeng commented 4 months ago

The previous submission was based on my incorrect understanding of the code. Could you provide the logs and configuration so that I can further pinpoint the issue with greater accuracy?

daveyang05 commented 4 months ago

Okay, attached is the log from that time: "Camera Restart Recovery Time Exceeding 2 Hours Log.zip" (the upload appears to be incomplete or pending).

daveyang05 commented 4 months ago

Okay, the attachment contains the log print information from that time.

daveyang05 commented 4 months ago

Attached is the camera configuration: IP settings, GB28181 integration, TCP protocol. (Attachment: Configuration information for GB28181 camera is being uploaded...)

daveyang05 commented 4 months ago

@yushimeng, may I inquire about the progress of the issue resolution?

daveyang05 commented 3 months ago

For SIP terminal registration messages, the CSeq field can be used to determine whether a message is an initial registration or a subsequent periodic registration. This field increments with each registration the terminal sends. For an initial registration, the previous session data should be reset and the process should start anew. For a periodic registration, the current session information should be retained. To tell the two apart, SRS should keep track of the CSeq value of the last SIP message it received, which normally increases continuously. If the value decreases (and the last value had not reached or approached 0xffffffff), the message can be inferred to be an initial registration.
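
A minimal sketch of that check, assuming the last accepted CSeq is stored per device. The helper name `cseq_restarted` and the tolerance `kWrapMargin` are hypothetical; neither comes from SRS or from the SIP specification.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tolerance for "approaching 0xffffffff"; the value is illustrative only.
constexpr uint32_t kWrapMargin = 1024;

// Hypothetical helper: returns true when the new CSeq suggests that the
// device restarted and is sending an initial registration.
bool cseq_restarted(uint32_t last_cseq, uint32_t new_cseq) {
    // Still increasing, or a retransmission of the same request.
    if (new_cseq >= last_cseq) return false;

    // A drop right after a value near the maximum is treated as the
    // counter wrapping around, not as a restart.
    bool near_max = last_cseq >= UINT32_MAX - kWrapMargin;
    return !near_max;
}
```

For example, `cseq_restarted(120, 1)` reports a restart, while `cseq_restarted(120, 121)` and a drop from `0xfffffff0` to `2` do not.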

winlinvip commented 3 months ago

@daveyang05 Nice work, welcome to file a patch to fix this issue. :)

yushimeng commented 3 months ago

10.2 Constructing the REGISTER Request (RFC 3261)

  Call-ID: All registrations from a UAC SHOULD use the same Call-ID
       header field value for registrations sent to a particular
       registrar.

       If the same client were to use different Call-ID values, a
       registrar could not detect whether a delayed REGISTER request
       might have arrived out of order.

  CSeq: The CSeq value guarantees proper ordering of REGISTER
       requests.  A UA MUST increment the CSeq value by one for each
       REGISTER request with the same Call-ID.

daveyang05 commented 3 months ago

SIP registration messages are typically sent at intervals of minutes. Our observations show that Hikvision cameras send a registration message at least every 10 minutes, so messages are unlikely to arrive out of order within a few minutes. To determine whether a message from a camera is an initial registration or a subsequent one, check whether the CSeq number has decreased and whether the Call-ID matches that of the previous registration message. Within the same session, the initial registration and all subsequent messages should carry the same Call-ID, as required by the SIP protocol and confirmed by packet captures from Hikvision cameras.

daveyang05 commented 3 months ago

@Yu Gong, you can enhance your original modifications by adding appropriate logic to compare incoming SIP registration messages with previously stored session registration information. If there is a change in the Call-ID or if the CSeq number is lower than before, then clear the existing session and initiate a new SIP session. Otherwise, maintain the existing session.
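
The comparison suggested here can be sketched as follows, assuming the Call-ID and CSeq of the last accepted REGISTER are stored with the session. `RegisterInfo`, `RegisterAction` and `classify_register` are hypothetical names for illustration, not SRS types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record of a REGISTER request for one device.
struct RegisterInfo {
    std::string call_id;
    uint32_t cseq;
};

enum class RegisterAction { KeepSession, ResetSession };

// Decide what to do with the existing session when a new REGISTER arrives.
RegisterAction classify_register(const RegisterInfo& last, const RegisterInfo& incoming) {
    // A different Call-ID means the device started a new registration dialog.
    if (incoming.call_id != last.call_id) return RegisterAction::ResetSession;

    // Same Call-ID but a lower CSeq: the counter restarted, so the device did too.
    if (incoming.cseq < last.cseq) return RegisterAction::ResetSession;

    // Otherwise this is a periodic refresh of the same registration.
    return RegisterAction::KeepSession;
}
```

This sketch does not handle CSeq wraparound, which the earlier comment mentions as a special case.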

daveyang05 commented 3 months ago

General Yang and Engineer Yu, after modifying the GB28181 code, we have tested and verified that Hikvision cameras quickly recover the video stream after a restart. The code changes were made on the version 5.0 branch. Attachment: srs_app_gb28181.zip

codeex commented 3 months ago

@daveyang05 Can you publish a Docker image with the patch? I can't find a release package that contains the fix.

daveyang05 commented 3 months ago

@Yu Gong, we are currently engaged in development and validation for version 5.0. The attached Docker container has been modified and released based on that version branch.

daveyang05 commented 3 months ago

As mentioned above.

codeex commented 3 months ago

@daveyang05, I can't find a v5.0 branch to compile. Where can I find the Docker image or the source code?

daveyang05 commented 3 months ago

Attachment: srs_app_gb28181 camera restart 2 hours recovery code modification.zip

daveyang05 commented 3 months ago

Code

codeex commented 3 months ago

I downloaded the version 5.0 release branch, replaced the GB28181 file with the modified one, and recompiled to build the image. After redeployment, the change does not seem to take effect. Restarting the Hikvision camera did not resolve the issue: it still fails to reconnect the video stream, although the camera status shows it as online. The cause of the problem is unclear. Attachment: srs5-disconnect.log

daveyang05 commented 3 months ago

srs_error_t SrsLazyGbSipTcpConn::bind_session(SrsSipMessage* msg, SrsLazyObjectWrapper<SrsLazyGbSession>** psession)
{
    srs_error_t err = srs_success;

    string device = msg->device_id();
    if (device.empty()) return err;

    // Only create session for REGISTER request.
    if (msg->type_ != HTTP_REQUEST || msg->method_ != HTTP_REGISTER) return err;

    // The lazy-sweep wrapper for this resource.
    SrsLazyObjectWrapper<SrsLazyGbSipTcpConn>* wrapper = wrapper_root_;
    srs_assert(wrapper); // It MUST never be NULL, because this method is in the cycle of coroutine of receiver.

    // Find exists session for register, might be created by another object and still alive.
    SrsLazyObjectWrapper<SrsLazyGbSession>* session = dynamic_cast<SrsLazyObjectWrapper<SrsLazyGbSession>*>(_srs_gb_manager->find_by_id(device));

    // If a session is found by device ID and the current message is a registration message
    if (session && msg->is_register()) {
        // If the cseq number decreased or the call id changed
        if (msg->cseq_number_ < register_->cseq_number_ || msg->call_id_ != register_->call_id_) {
            // Remove resource from GB manager
            _srs_gb_manager->remove(session);

            // Set session to NULL
            session = NULL;
        }
    }

    if (!session) {
        // Create new GB session.
        session = new SrsLazyObjectWrapper<SrsLazyGbSession>();

        if ((err = session->resource()->initialize(conf_)) != srs_success) {
            srs_freep(session);
            return srs_error_wrap(err, "initialize");
        }

Please verify that the bind_session function in the downloaded code contains the following code:

if (session && msg->is_register()) {
    // If the cseq number decreased or the call id changed
    if (msg->cseq_number_ < register_->cseq_number_ || msg->call_id_ != register_->call_id_) {
        // Remove resource from GB manager
        _srs_gb_manager->remove(session);

        // Set session to NULL
        session = NULL;
    }
}

codeex commented 3 months ago

@daveyang05 Yes, I edited this file, and the content is below.

if (session && msg->is_register()) {
    srs_trace("SIP: receive register message %s", device.c_str());
    // If the cseq number decreased or the call id changed
    if (msg->cseq_number_ < register_->cseq_number_ || msg->call_id_ != register_->call_id_) {
        // Remove resource from GB manager
        srs_trace("SIP: remove session");
        _srs_gb_manager->remove(session);

        // Set session to NULL
        session = NULL;
    }
}

But I could not find these messages in the log.

winlinvip commented 3 months ago

First, thanks to @daveyang05 @codeex @yushimeng for describing the issue and its background, which is very important for future bug fixing.

Please do not discuss code in the issue; instead, please file a pull request and discuss it there.

If we discuss code changes in issues, there will be incorrect and temporary code changes that confuse other developers.

So I will lock this issue as too heated. Please file a pull request and discuss there.