libp2p / rust-libp2p

The Rust Implementation of the libp2p networking stack.
https://libp2p.io
MIT License
4.31k stars 896 forks source link

"netlink socket stream shut down" infinite loop #4287

Open nazar-pc opened 11 months ago

nazar-pc commented 11 months ago

Summary

I have not yet identify how exactly, but when doing early shutdown of an application I see this printed forever with Ctrl+C not working, so looks like in the main thread somehow:

``` 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: reading incoming messages 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::codecs: NetlinkCodec: decoding next message 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker netlink_proto::framed: failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: netlink socket stream shut down 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwarding unsolicited messages to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwaring responses to previous requests to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: handling requests 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: sending messages 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker libp2p_mdns::behaviour: if watch returned an error: rtnetlink socket closed 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: reading incoming messages 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::codecs: NetlinkCodec: decoding next message 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker netlink_proto::framed: failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: netlink socket stream shut down 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwarding unsolicited messages to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwaring responses to previous requests to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: handling requests 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: sending messages 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker libp2p_mdns::behaviour: if watch returned an error: rtnetlink socket closed 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: reading incoming messages 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::codecs: NetlinkCodec: decoding next message 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker netlink_proto::framed: failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: netlink socket stream shut down 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwarding unsolicited messages to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwaring responses to previous requests to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: handling requests 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: sending messages 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker libp2p_mdns::behaviour: if watch returned an error: rtnetlink socket closed 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: reading incoming messages 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::codecs: NetlinkCodec: decoding next message 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker netlink_proto::framed: failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: netlink socket stream shut down 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwarding unsolicited messages to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwaring responses to previous requests to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: handling requests 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: sending messages 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: error flushing netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker libp2p_mdns::behaviour: if watch returned an error: rtnetlink socket closed 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: reading incoming messages 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::codecs: NetlinkCodec: decoding next message 2023-08-02 21:07:17.697 ERROR tokio-runtime-worker netlink_proto::framed: failed to read from netlink socket: Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." } 2023-08-02 21:07:17.697 WARN tokio-runtime-worker netlink_proto::connection: netlink socket stream shut down 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwarding unsolicited messages to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: forwaring responses to previous requests to the connection handle 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: handling requests 2023-08-02 21:07:17.697 DEBUG tokio-runtime-worker netlink_proto::connection: sending messages ```

I think the solution here might be to be prepared for runtime shutdown at any time.

Version

0.52.1, but I observed this on previous versions too

Would you like to work on fixing this bug?

Maybe

thomaseizinger commented 11 months ago

Related: https://github.com/libp2p/rust-libp2p/issues/3390.

thomaseizinger commented 9 months ago

Related: https://github.com/libp2p/rust-libp2p/issues/2591.

nazar-pc commented 9 months ago

This happens when during Substrate node initialization I press Ctrl+C, so it has to shut down before it even started properly, in this case above errors are printed until process is killed extrenally.

thomaseizinger commented 9 months ago

I am almost certain that this would be fixed by https://github.com/libp2p/rust-libp2p/issues/2591 because that would move the IO off the main task which is typically the one running in main.rs.

dkuhnert commented 2 weeks ago

I just ran into the same issue. I think i tracked it down to the mdns polling if_watch in a loop. When rtnetlink socket is closed if_watch stream will return an error every time poll is called. So either the if_watch stream can be changed to return None (and thus end the stream) if the socket is closed or add an break statement to be mdns behaviour when if_watch returns an error.

nazar-pc commented 2 weeks ago

Actually I don't see it in the latest version of libp2p anymore. Are you on older version by any chance?

dkuhnert commented 2 weeks ago

I'm on latest:

% cargo tree -i libp2p
libp2p v0.53.2
...

This happend to me in an unittest which runs very quickly. When i run this test on linux it blocks forever but runs fine on macOS.

The if_watch poll never returns Poll::Pending so the while loop in mdns can never complete when the netlink socket is closed.