websockets-rs / rust-websocket

A WebSocket (RFC6455) library written in Rust
http://websockets-rs.github.io/rust-websocket/
MIT License

TcpStream sockets leaked by websocket::client::async::Framed::close() #155

Open YetAnotherMinion opened 6 years ago

YetAnotherMinion commented 6 years ago

When you call close on a Framed stream, it does not actually close or shut down the underlying socket fd (see https://github.com/tokio-rs/tokio-io/issues/80). I tested the parallel-server from the examples/ folder by setting the per-process file descriptor limit to 1024 and using https://github.com/observing/thor to connect to the server 2000 times, sending 1 message per connection with 100 concurrently active connections.

./bin/thor --amount 2000 --concurrent 100 --masked --messages 1 ws://127.0.0.1:8081

The example server will crash in the accept call because it hits the file descriptor limit. View the open file descriptors with ls /proc/$PID/fd on the example parallel-server process, and you will see that none of the socket connections were closed (there will be 1024 sockets open even though the client program only has 100 open).
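
If you prefer to watch the leak from inside the server process, a small Linux-only helper along these lines reports the same information as ls /proc/$PID/fd. This is just an illustration, not part of the example; the function name is made up.

use std::fs;

/// Count the entries in /proc/self/fd. Note the count includes the fd that
/// read_dir itself opens to walk the directory.
fn open_fd_count() -> usize {
    fs::read_dir("/proc/self/fd").map(|dir| dir.count()).unwrap_or(0)
}

fn main() {
    println!("open file descriptors: {}", open_fd_count());
}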

This is the shutdown implementation that gets invoked when you call close on the Framed. Note that, unlike the inherent TcpStream::shutdown method, it never touches the socket:

impl<'a> AsyncWrite for &'a TcpStream {
    fn shutdown(&mut self) -> Poll<(), io::Error> {
        Ok(().into())
    }
    // ...
}

This no-op is called by:

impl AsyncWrite for TcpStream {
    fn shutdown(&mut self) -> Poll<(), io::Error> {
        <&TcpStream>::shutdown(&mut &*self)
    }
    // ...
}

The Framed type in tokio-io ends up using this AsyncWrite version of shutdown through its generic implementation:

#1  0x00005555555d6d6c in tokio_io::framed::{{impl}}::shutdown<tokio_core::net::tcp::TcpStream,websocket::codec::ws::MessageCodec<websocket::message::OwnedMessage>> (self=0x7ffff509a0f8)
    at /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-io-0.1.4/src/framed.rs:190
190         self.0.shutdown()
(gdb) l
185     }
186 }
187 
188 impl<T: AsyncWrite, U> AsyncWrite for Fuse<T, U> {
189     fn shutdown(&mut self) -> Poll<(), io::Error> {
190         self.0.shutdown()
191     }
192 }
193 
194 impl<T, U: Decoder> Decoder for Fuse<T, U> {

See https://github.com/tokio-rs/tokio-io/issues/80#issuecomment-352207760

@P-E-Meunier Because "AsyncWrite::shutdown" and TCP shutdown are two completely separate functions with different behaviors.

This is why AsyncWrite::shutdown needs to be renamed.
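
To make the distinction concrete, here is a minimal std-only sketch (nothing to do with tokio or rust-websocket) of what the TCP-level shutdown does: it issues the shutdown(2) syscall, a FIN goes out, and the peer immediately observes EOF, which is exactly what the no-op AsyncWrite::shutdown above never triggers.

use std::io::Read;
use std::net::{Shutdown, TcpListener, TcpStream};

fn main() -> std::io::Result<()> {
    // Loopback listener on an ephemeral port, plus one connected client.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let client = TcpStream::connect(addr)?;
    let (mut server_side, _) = listener.accept()?;

    // TCP shutdown: the shutdown(2) syscall is issued and a FIN is sent.
    client.shutdown(Shutdown::Both)?;

    // The peer sees the close right away: read returns 0 bytes (EOF).
    let mut buf = [0u8; 16];
    let n = server_side.read(&mut buf)?;
    assert_eq!(n, 0, "peer observed the TCP shutdown as EOF");
    Ok(())
}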

I worked around the TcpStream never being closed by writing a custom split function that returns a custom SplitSink struct exposing mutable access to the contained stream:

/// A `Sink` part of the split pair                                                                 
#[derive(Debug)]                                                                                    
pub struct MySplitSink<S>(BiLock<S>);                                                               

impl<S> MySplitSink<S> {                                                                            
    /// Attempts to put the two "halves" of a split `Stream + Sink` back                            
    /// together. Succeeds only if the `MySplitStream<S>` and `MySplitSink<S>` are                  
    /// a matching pair originating from the same call to `Stream::split`.                          
    pub fn reunite(self, other: MySplitStream<S>) -> Result<S, ReuniteError<S>> {                   
        self.0.reunite(other.0).map_err(|err| {                                                     
            ReuniteError(MySplitSink(err.0), MySplitStream(err.1))                                  
        })                                                                                          
    }                                                                                               

    pub fn inner(&mut self) -> &mut BiLock<S> {                                                     
        &mut self.0                                                                                 
    }                                                                                               

}

This can then be used by a custom future that calls the shutdown method on the TcpStream instance directly, not through the AsyncWrite trait:

pub struct MyShutdown {                                                                               
    pub stream: MySplitSink<Framed<TcpStream, MessageCodec<OwnedMessage>>>,                         
}                                                                                                   

impl Future for MyShutdown {                                                                          
    type Item = ();                                                                                 
    type Error = WebSocketError;                                                                    

    fn poll(&mut self) -> Poll<Self::Item, Self::Error> {                                           
        match self.stream.inner().poll_lock() {                                                     
            Async::Ready(mut framed) => {                                                           
                let tcp_stream: &mut TcpStream = framed.get_mut();
                // Call the inherent TCP shutdown on the socket, not AsyncWrite::shutdown.
                match tcp_stream.shutdown(::std::net::Shutdown::Both) {
                    Ok(()) => Ok(Async::Ready(())),
                    Err(error) => Err(WebSocketError::IoError(error)),
                }
            },                                                                                      
            Async::NotReady => Ok(Async::NotReady),                                                 
        }                                                                                           
    }                                                                                               
}

I propose that the websocket library include this custom version of SplitSink, split, and the shutdown future, because the generic futures implementation does not handle cleaning up whatever resources the underlying stream may hold. Dropping both the sink and stream halves of a split stream does not close the socket either.

Note: I also copied the implementation of SplitStream into MySplitStream so that reunite works with the new MySplitSink.
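
For context, here is a rough sketch of how these pieces would fit together at a call site. It is not a complete program; my_split is a hypothetical name for the custom split function described above, and the Framed value and Handle would come from the surrounding handshake/server code.

// Hypothetical helper wiring together the pieces above.
use futures::Future;
use tokio_core::reactor::Handle;
use websocket::async::{MessageCodec, TcpStream};
use websocket::client::async::Framed;
use websocket::OwnedMessage;

fn shutdown_connection(
    sink: MySplitSink<Framed<TcpStream, MessageCodec<OwnedMessage>>>,
    handle: &Handle,
) {
    // Run the shutdown future so the file descriptor is actually released
    // instead of relying on Framed::close().
    let shutdown = MyShutdown { stream: sink };
    handle.spawn(shutdown.map_err(|e| println!("shutdown error: {:?}", e)));
}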

illegalprime commented 6 years ago

Good catch, sounds like a big issue. Where in the example do you imagine the cleanup happening? (I haven't written this example, so I'm not that familiar with it.)

YetAnotherMinion commented 6 years ago

Here is a patch for examples/parallel-server.rs that notices when the socket has been closed by the other side and calls shutdown on its TcpStream instance to release the underlying file descriptor.

There are other problems with examples/parallel-server.rs that prevent us from running the thor benchmark I described, because the example server is very slow to notice that connections are dead.

./bin/thor --amount 2000 --concurrent 100 --masked --messages 1 ws://127.0.0.1:8081

However, to verify that resources are cleaned up correctly, you can now run the benchmark with a number of connections just below your file descriptor limit:

./bin/thor --amount 1000 --concurrent 100 --masked --messages 1 ws://127.0.0.1:8081

The thor process will exit, and after a while the example server will get around to sending to the dead sockets (the example server is really slow) and notice they are dead. You will then see output like:

SHUTDOWN
send error: IoError(Error { repr: Os { code: 32, message: "Broken pipe" } })
SHUTDOWN
send error: IoError(Error { repr: Os { code: 32, message: "Broken pipe" } })
SHUTDOWN
send error: IoError(Error { repr: Os { code: 32, message: "Broken pipe" } })
SHUTDOWN
send error: IoError(Error { repr: Os { code: 32, message: "Broken pipe" } })
SHUTDOWN

After that output stops, you can rerun the thor benchmark so that the total number of connections made to the running example server exceeds the per-process ulimit -n:

./bin/thor --amount 1000 --concurrent 100 --masked --messages 1 ws://127.0.0.1:8081

For example, 1000 + 1000 > 1024.

The patched example server will not crash. You can rinse and repeat as many times as you like; all the old connections will be cleaned up after a few seconds as the server notices they are dead. Making the example notice dead connections more quickly is a separate issue I can address in another PR.

diff --git a/examples/parallel-server.rs b/../src/bin/scratch.rs
index 43e1e80..1a49749 100644
--- a/examples/parallel-server.rs
+++ b/../src/bin/scratch.rs
@@ -1,4 +1,5 @@
 extern crate websocket;
+#[macro_use]
 extern crate futures;
 extern crate futures_cpupool;
 extern crate tokio_core;
@@ -28,7 +29,7 @@ fn main() {
    let mut core = Core::new().expect("Failed to create Tokio event loop");
    let handle = core.handle();
    let remote = core.remote();
-   let server = Server::bind("localhost:8081", &handle).expect("Failed to create server");
+   let server = Server::bind("127.0.0.1:8081", &handle).expect("Failed to create server");
    let pool = Rc::new(CpuPool::new_num_cpus());
    let connections = Arc::new(RwLock::new(HashMap::new()));
    let (receive_channel_out, receive_channel_in) = mpsc::unbounded();
@@ -53,7 +54,7 @@ fn main() {
                         .borrow_mut()
                         .next()
                         .expect("maximum amount of ids reached");
-                    let (sink, stream) = framed.split();
+                    let (sink, stream) = hack::split::split(framed);
                     let f = channel.send((id, stream));
                     spawn_future(f, "Senk stream to connection pool", &handle_inner);
                     connections_inner.write().unwrap().insert(id, sink);
@@ -92,12 +93,23 @@ fn main() {
            let connections = connections.clone();
            let sink = connections.write()
                                  .unwrap()
-                                 .remove(&id)
-                                 .expect("Tried to send to invalid client id",);
+                                 .remove(&id);
+                                 //.expect("Tried to send to invalid client id",);
+            let sink = match sink {
+                Some(sink) => sink,
+                None => {
+                    println!("{} has gone away, dropping msg", id);
+                    return Ok(());
+                }
+            };

-           println!("Sending message '{}' to id {}", msg, id);
-           let f = sink.send(OwnedMessage::Text(msg))
-                       .and_then(move |sink| {
+           //println!("Sending message '{}' to id {}", msg, id);
+           let f = experiment::ShutdownOnSendError {
+                in_shutdown: false,
+                inner: hack::send::new(sink, OwnedMessage::Text(msg)),
+            };
+
+            let f = f.and_then(move |sink| {
                                      connections.write().unwrap().insert(id, sink);
                                      Ok(())
                                     });
@@ -131,19 +143,19 @@ fn spawn_future<F, I, E>(f: F, desc: &'static str, handle: &Handle)
          E: Debug
 {
    handle.spawn(f.map_err(move |e| println!("Error in {}: '{:?}'", desc, e))
-                 .map(move |_| println!("{}: Finished.", desc)));
+                 .map(move |_| /*println!("{}: Finished.", desc)*/ () ));
 }

 fn process_message(id: u32, msg: &OwnedMessage) {
    if let OwnedMessage::Text(ref txt) = *msg {
-       println!("Received message '{}' from id {}", txt, id);
+       //println!("Received message '{}' from id {}", txt, id);
    }
 }

 type SinkContent = websocket::client::async::Framed<tokio_core::net::TcpStream,
                                                     websocket::async::MessageCodec<OwnedMessage>>;
-type SplitSink = futures::stream::SplitSink<SinkContent>;
+type SplitSink = hack::split::SplitSink<SinkContent>;
 // Represents one tick in the main loop
 fn update(
    connections: Arc<RwLock<HashMap<Id, SplitSink>>>,
@@ -182,3 +194,253 @@ impl Iterator for Counter {
        }
    }
 }
+
+/// The experiment module is an example of code the end user would need to write to respond to send
+/// errors on the TcpStream. In this example, all send errors are dropped and the socket
+/// immediately closed.
+pub mod experiment {
+    // This custom sink is only needed until futures crate merges PR
+    use super::hack;
+
+    use futures::{Future, Async};
+    use futures::Poll;
+
+    use websocket::client::async::Framed;
+    use websocket::async::TcpStream;
+    use websocket::async::MessageCodec;
+    use websocket::OwnedMessage;
+    use websocket::result::WebSocketError;
+
+    pub struct ShutdownOnSendError {
+        pub in_shutdown: bool,
+        pub inner: hack::send::Send<hack::split::SplitSink<Framed<TcpStream, MessageCodec<OwnedMessage>>>>,
+    }
+
+    impl Future for ShutdownOnSendError {
+        type Item=hack::split::SplitSink<Framed<TcpStream, MessageCodec<OwnedMessage>>>;
+        type Error=WebSocketError;
+
+        fn poll(&mut self) -> Poll<Self::Item, Self::Error> {
+
+           if !self.in_shutdown {
+                match self.inner.poll() {
+                    Ok(Async::Ready(stream)) => return Ok(Async::Ready(stream)),
+                    Ok(Async::NotReady) => return Ok(Async::NotReady),
+                    // WARNING the original error that caused the problem is silently swallowed
+                    Err(e) => {
+                        println!("send error: {:?}", e);
+                        self.in_shutdown = true;
+                    }
+                }
+            }
+
+            if self.in_shutdown {
+                println!("SHUTDOWN");
+                match self.inner.get_mut().inner().poll_lock() {
+                    Async::Ready(mut framed) => {
+                        let tcp_stream: &mut TcpStream = framed.get_mut();
+                        //println!("calling shutdown on socket");
+                        match tcp_stream.shutdown(::std::net::Shutdown::Both) {
+                            Ok(()) => (),
+                            Err(error) => return Err(WebSocketError::IoError(error))
+                        }
+                    },
+                    Async::NotReady => return Ok(Async::NotReady),
+                }
+                println!("Shutdown successfully");
+                return Ok(Async::Ready(self.inner.into_inner()));
+            }
+            Ok(Async::NotReady)
+        }
+    }
+
+}
+
+/// This is all of the futures crate code that needs to be modified to allow the end user to write
+/// shutdown code at all. The patch is only a handful of changed lines, but I reproduced the
+/// affected modules below so that I could have a single file example that worked on the existing
+/// version of websocket. The point I want to illustrate is that the problem is inside the futures
+/// crate, not the websockets crate. They have said that my proposed changes are acceptable, so it
+/// is just a matter of upstreaming my patches and then bumping the dependency version of futures
+/// crate in the websockets crate.
+mod hack {
+    pub mod split {
+        use std::any::Any;
+        use std::error::Error;
+        use std::fmt;
+
+        use futures::{Future, StartSend, Sink, Stream, Poll, Async, AsyncSink};
+        use futures::sync::BiLock;
+
+
+
+        /// A `Stream` part of the split pair
+        #[derive(Debug)]
+        pub struct SplitStream<S>(BiLock<S>);
+
+        impl<S> SplitStream<S> {
+            /// Attempts to put the two "halves" of a split `Stream + Sink` back
+            /// together. Succeeds only if the `SplitStream<S>` and `SplitSink<S>` are
+            /// a matching pair originating from the same call to `Stream::split`.
+            pub fn reunite(self, other: SplitSink<S>) -> Result<S, ReuniteError<S>> {
+                other.reunite(self)
+            }
+        }
+
+        impl<S: Stream> Stream for SplitStream<S> {
+            type Item = S::Item;
+            type Error = S::Error;
+
+            fn poll(&mut self) -> Poll<Option<S::Item>, S::Error> {
+                match self.0.poll_lock() {
+                    Async::Ready(mut inner) => inner.poll(),
+                    Async::NotReady => Ok(Async::NotReady),
+                }
+            }
+        }
+
+        /// A `Sink` part of the split pair
+        #[derive(Debug)]
+        pub struct SplitSink<S>(BiLock<S>);
+
+        impl<S> SplitSink<S> {
+            /// Attempts to put the two "halves" of a split `Stream + Sink` back
+            /// together. Succeeds only if the `SplitStream<S>` and `SplitSink<S>` are
+            /// a matching pair originating from the same call to `Stream::split`.
+            pub fn reunite(self, other: SplitStream<S>) -> Result<S, ReuniteError<S>> {
+                self.0.reunite(other.0).map_err(|err| {
+                    ReuniteError(SplitSink(err.0), SplitStream(err.1))
+                })
+            }
+
+            pub fn inner(&mut self) -> &mut BiLock<S> {
+                &mut self.0
+            }
+
+        }
+
+        impl<S: Sink> Sink for SplitSink<S> {
+            type SinkItem = S::SinkItem;
+            type SinkError = S::SinkError;
+
+            fn start_send(&mut self, item: S::SinkItem)
+                -> StartSend<S::SinkItem, S::SinkError>
+            {
+                match self.0.poll_lock() {
+                    Async::Ready(mut inner) => inner.start_send(item),
+                    Async::NotReady => Ok(AsyncSink::NotReady(item)),
+                }
+            }
+
+            fn poll_complete(&mut self) -> Poll<(), S::SinkError> {
+                match self.0.poll_lock() {
+                    Async::Ready(mut inner) => inner.poll_complete(),
+                    Async::NotReady => Ok(Async::NotReady),
+                }
+            }
+
+            fn close(&mut self) -> Poll<(), S::SinkError> {
+                println!("tried to get poll lock");
+                match self.0.poll_lock() {
+                    Async::Ready(mut inner) => inner.close(),
+                    Async::NotReady => {
+                        Ok(Async::NotReady)
+                    },
+                }
+            }
+        }
+
+        pub fn split<S: Stream + Sink>(s: S) -> (SplitSink<S>, SplitStream<S>) {
+            let (a, b) = BiLock::new(s);
+            let read = SplitStream(a);
+            let write = SplitSink(b);
+            (write, read)
+        }
+
+        /// Error indicating a `SplitSink<S>` and `SplitStream<S>` were not two halves
+        /// of a `Stream + Split`, and thus could not be `reunite`d.
+        pub struct ReuniteError<T>(pub SplitSink<T>, pub SplitStream<T>);
+
+        impl<T> fmt::Debug for ReuniteError<T> {
+            fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+                fmt.debug_tuple("ReuniteError")
+                    .field(&"...")
+                    .finish()
+            }
+        }
+
+        impl<T> fmt::Display for ReuniteError<T> {
+            fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+                write!(fmt, "tried to reunite a SplitStream and SplitSink that don't form a pair")
+            }
+        }
+
+        impl<T: Any> Error for ReuniteError<T> {
+            fn description(&self) -> &str {
+                "tried to reunite a SplitStream and SplitSink that don't form a pair"
+            }
+        }
+    }
+
+    pub mod send {
+        use futures::{Poll, Async, Future, AsyncSink};
+        use futures::sink::Sink;
+
+        /// Future for the `Sink::send` combinator, which sends a value to a sink and
+        /// then waits until the sink has fully flushed.
+        #[derive(Debug)]
+        #[must_use = "futures do nothing unless polled"]
+        pub struct Send<S: Sink> {
+            sink: Option<S>,
+            item: Option<S::SinkItem>,
+        }
+
+        pub fn new<S: Sink>(sink: S, item: S::SinkItem) -> Send<S> {
+            Send {
+                sink: Some(sink),
+                item: Some(item),
+            }
+        }
+
+        impl<S: Sink> Send<S> {
+            /// Get a shared reference to the inner sink.
+            pub fn get_ref(&self) -> &S {
+                self.sink.as_ref().take().expect("Attempted Send::get_ref after completion")
+            }
+
+            /// Get a mutable reference to the inner sink.
+            pub fn get_mut(&mut self) -> &mut S {
+                self.sink.as_mut().take().expect("Attempted Send::get_mut after completion")
+            }
+
+            fn sink_mut(&mut self) -> &mut S {
+                self.sink.as_mut().take().expect("Attempted to poll Send after completion")
+            }
+
+            pub fn into_inner(&mut self) -> S {
+                self.sink.take().expect("Attempted to poll Send after completion")
+            }
+        }
+
+        impl<S: Sink> Future for Send<S> {
+            type Item = S;
+            type Error = S::SinkError;
+
+            fn poll(&mut self) -> Poll<S, S::SinkError> {
+                if let Some(item) = self.item.take() {
+                    if let AsyncSink::NotReady(item) = try!(self.sink_mut().start_send(item)) {
+                        self.item = Some(item);
+                        return Ok(Async::NotReady)
+                    }
+                }
+
+                // we're done sending the item, but want to block on flushing the
+                // sink
+                try_ready!(self.sink_mut().poll_complete());
+
+                // now everything's emptied, so return the sink for further use
+                return Ok(Async::Ready(self.into_inner()))
+            }
+        }
+    }
+}
YetAnotherMinion commented 6 years ago

Note that my patch is huge because I include all of the futures crate code that needs to be modified to allow the end user to write shutdown code at all. The futures changes are only a handful of lines, but I had to reproduce all of the affected modules so that I could have a single-file example that works with the current version of rust-websocket. The point I want to illustrate is that the problem is inside the futures crate, not the websocket crate. They have said that my proposed changes are acceptable, so it is just a matter of upstreaming my patches and then bumping the dependency version of the futures crate in the websocket crate.

ParadoxSpiral commented 6 years ago

Not sure if this is related, but if the TcpStream gets closed by the other party, it never gets closed on rust-websocket's side.

YetAnotherMinion commented 6 years ago

@ParadoxSpiral that is exactly the situation this issue describes. I attempted to upstream the SplitSink implementation into futures, but they rejected it because it exposed details that they wanted to keep out of the public API (see the comments on pull request https://github.com/rust-lang-nursery/futures-rs/pull/682). The end result is that every user of the rust-websocket crate will need to carry around their own custom SplitSink impl to handle shutdown of the TcpStream. The patch in my comments shows how to do this.