YetAnotherMinion opened this issue 6 years ago (status: Open)
Good catch, this sounds like a big issue. Where in the example do you imagine the cleanup happening? (I haven't written this example, so I'm not that familiar with it.)
Here is a patch for examples/parallel-server.rs that notices when the socket has been closed by the other side and calls shutdown on its TcpStream instance to release the underlying file descriptor.
There are other problems with examples/parallel-server.rs that prevent us from running the thor benchmark I described, because the example server is really slow at noticing that connections are dead:
./bin/thor --amount 2000 --concurrent 100 --masked --messages 1 ws://127.0.0.1:8081
However, to verify that we now clean up resources correctly, you can run the benchmark with a number of connections just below your file descriptor limit:
./bin/thor --amount 1000 --concurrent 100 --masked --messages 1 ws://127.0.0.1:8081
The thor process will exit; after a while the example server will get around to sending to the dead sockets (the example server is really slow) and notice they are dead. Then you will see a bunch of output like:
SHUTDOWN
send error: IoError(Error { repr: Os { code: 32, message: "Broken pipe" } })
SHUTDOWN
send error: IoError(Error { repr: Os { code: 32, message: "Broken pipe" } })
SHUTDOWN
send error: IoError(Error { repr: Os { code: 32, message: "Broken pipe" } })
SHUTDOWN
send error: IoError(Error { repr: Os { code: 32, message: "Broken pipe" } })
SHUTDOWN
After that output stops, you can rerun the thor benchmark so that the total number of connections you have made to the running example server is greater than the per-process ulimit -n:
./bin/thor --amount 1000 --concurrent 100 --masked --messages 1 ws://127.0.0.1:8081
1000 + 1000 > 1024 for example
The patched example server will not crash. You can rinse and repeat as many times as you like; all the old connections will be cleaned up after a few seconds as the server notices they are dead. Fixing the example to notice dead connections more quickly is another issue I can address in a separate PR; a rough sketch of one possible approach follows, before the patch itself.
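Separate from the patch below, one possible approach to noticing dead connections sooner (not necessarily what that PR would do) is a periodic heartbeat that pings every connected sink, so broken pipes surface within one interval instead of waiting for the next application message. This is only a rough sketch: it assumes the example's existing Arc/RwLock/HashMap imports and its Id/SplitSink aliases, the use lines may duplicate ones already at the top of the example, spawn_heartbeat is just an illustrative name, the interval is arbitrary, and the sketch does not drive a NotReady flush to completion.

use std::time::Duration;

use futures::{Future, Sink, Stream};
use tokio_core::reactor::{Handle, Interval};
use websocket::OwnedMessage;

// `Id` and `SplitSink` are the aliases from the example; `spawn_heartbeat`
// is just an illustrative name.
fn spawn_heartbeat(handle: &Handle, connections: Arc<RwLock<HashMap<Id, SplitSink>>>) {
    let heartbeat = Interval::new(Duration::from_secs(5), handle)
        .expect("failed to create heartbeat interval")
        .for_each(move |_| {
            let mut conns = connections.write().unwrap();
            for (id, sink) in conns.iter_mut() {
                // start_send + poll_complete rather than Sink::send(), because
                // send() consumes the sink and it lives inside the map.
                let mut outcome = sink
                    .start_send(OwnedMessage::Ping(Vec::new()))
                    .map(|_| ());
                if outcome.is_ok() {
                    outcome = sink.poll_complete().map(|_| ());
                }
                if let Err(e) = outcome {
                    // A dead peer now shows up within one interval instead of
                    // waiting for the next application message. The sink still
                    // needs the shutdown treatment from the patch below;
                    // merely removing it from the map would leak the fd.
                    println!("heartbeat to {} failed: {:?}", id, e);
                }
            }
            Ok(())
        })
        .map_err(|e| println!("heartbeat timer error: {:?}", e));
    handle.spawn(heartbeat);
}

Calling spawn_heartbeat(&handle, connections.clone()) from main would then surface dead peers within a few seconds; the failed sinks still need to go through the shutdown future from the patch, since dropping them does not close the socket.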
diff --git a/examples/parallel-server.rs b/../src/bin/scratch.rs
index 43e1e80..1a49749 100644
--- a/examples/parallel-server.rs
+++ b/../src/bin/scratch.rs
@@ -1,4 +1,5 @@
extern crate websocket;
+#[macro_use]
extern crate futures;
extern crate futures_cpupool;
extern crate tokio_core;
@@ -28,7 +29,7 @@ fn main() {
let mut core = Core::new().expect("Failed to create Tokio event loop");
let handle = core.handle();
let remote = core.remote();
- let server = Server::bind("localhost:8081", &handle).expect("Failed to create server");
+ let server = Server::bind("127.0.0.1:8081", &handle).expect("Failed to create server");
let pool = Rc::new(CpuPool::new_num_cpus());
let connections = Arc::new(RwLock::new(HashMap::new()));
let (receive_channel_out, receive_channel_in) = mpsc::unbounded();
@@ -53,7 +54,7 @@ fn main() {
.borrow_mut()
.next()
.expect("maximum amount of ids reached");
- let (sink, stream) = framed.split();
+ let (sink, stream) = hack::split::split(framed);
let f = channel.send((id, stream));
spawn_future(f, "Senk stream to connection pool", &handle_inner);
connections_inner.write().unwrap().insert(id, sink);
@@ -92,12 +93,23 @@ fn main() {
let connections = connections.clone();
let sink = connections.write()
.unwrap()
- .remove(&id)
- .expect("Tried to send to invalid client id",);
+ .remove(&id);
+ //.expect("Tried to send to invalid client id",);
+ let sink = match sink {
+ Some(sink) => sink,
+ None => {
+ println!("{} has gone away, dropping msg", id);
+ return Ok(());
+ }
+ };
- println!("Sending message '{}' to id {}", msg, id);
- let f = sink.send(OwnedMessage::Text(msg))
- .and_then(move |sink| {
+ //println!("Sending message '{}' to id {}", msg, id);
+ let f = experiment::ShutdownOnSendError {
+ in_shutdown: false,
+ inner: hack::send::new(sink, OwnedMessage::Text(msg)),
+ };
+
+ let f = f.and_then(move |sink| {
connections.write().unwrap().insert(id, sink);
Ok(())
});
@@ -131,19 +143,19 @@ fn spawn_future<F, I, E>(f: F, desc: &'static str, handle: &Handle)
E: Debug
{
handle.spawn(f.map_err(move |e| println!("Error in {}: '{:?}'", desc, e))
- .map(move |_| println!("{}: Finished.", desc)));
+ .map(move |_| /*println!("{}: Finished.", desc)*/ () ));
}
fn process_message(id: u32, msg: &OwnedMessage) {
if let OwnedMessage::Text(ref txt) = *msg {
- println!("Received message '{}' from id {}", txt, id);
+ //println!("Received message '{}' from id {}", txt, id);
}
}
type SinkContent = websocket::client::async::Framed<tokio_core::net::TcpStream,
websocket::async::MessageCodec<OwnedMessage>>;
-type SplitSink = futures::stream::SplitSink<SinkContent>;
+type SplitSink = hack::split::SplitSink<SinkContent>;
// Represents one tick in the main loop
fn update(
connections: Arc<RwLock<HashMap<Id, SplitSink>>>,
@@ -182,3 +194,253 @@ impl Iterator for Counter {
}
}
}
+
+/// The experiment module is an example of code the end user would need to write to respond to send
+/// errors on the TcpStream. In this example, all send errors are dropped and the socket
+/// immediately closed.
+pub mod experiment {
+ // This custom sink is only needed until futures crate merges PR
+ use super::hack;
+
+ use futures::{Future, Async};
+ use futures::Poll;
+
+ use websocket::client::async::Framed;
+ use websocket::async::TcpStream;
+ use websocket::async::MessageCodec;
+ use websocket::OwnedMessage;
+ use websocket::result::WebSocketError;
+
+ pub struct ShutdownOnSendError {
+ pub in_shutdown: bool,
+ pub inner: hack::send::Send<hack::split::SplitSink<Framed<TcpStream, MessageCodec<OwnedMessage>>>>,
+ }
+
+ impl Future for ShutdownOnSendError {
+ type Item=hack::split::SplitSink<Framed<TcpStream, MessageCodec<OwnedMessage>>>;
+ type Error=WebSocketError;
+
+ fn poll(&mut self) -> Poll<Self::Item, Self::Error> {
+
+ if !self.in_shutdown {
+ match self.inner.poll() {
+ Ok(Async::Ready(stream)) => return Ok(Async::Ready(stream)),
+ Ok(Async::NotReady) => return Ok(Async::NotReady),
+ // WARNING the original error that caused the problem is silently swallowed
+ Err(e) => {
+ println!("send error: {:?}", e);
+ self.in_shutdown = true;
+ }
+ }
+ }
+
+ if self.in_shutdown {
+ println!("SHUTDOWN");
+ match self.inner.get_mut().inner().poll_lock() {
+ Async::Ready(mut framed) => {
+ let tcp_stream: &mut TcpStream = framed.get_mut();
+ //println!("calling shutdown on socket");
+ match tcp_stream.shutdown(::std::net::Shutdown::Both) {
+ Ok(()) => (),
+ Err(error) => return Err(WebSocketError::IoError(error))
+ }
+ },
+ Async::NotReady => return Ok(Async::NotReady),
+ }
+ println!("Shutdown successfully");
+ return Ok(Async::Ready(self.inner.into_inner()));
+ }
+ Ok(Async::NotReady)
+ }
+ }
+
+}
+
+/// This is all of the futures crate code that needs to be modified to allow the end user to write
+/// shutdown code at all. The patch is only a handful of changed lines, but I reproduced the
+/// affected modules below so that I could have a single file example that worked on the existing
+/// version of websocket. The point I want to illustrate is that the problem is inside the futures
+/// crate, not the websockets crate. They have said that my proposed changes are acceptable, so it
+/// is just a matter of upstreaming my patches and then bumping the dependency version of futures
+/// crate in the websockets crate.
+mod hack {
+ pub mod split {
+ use std::any::Any;
+ use std::error::Error;
+ use std::fmt;
+
+ use futures::{Future, StartSend, Sink, Stream, Poll, Async, AsyncSink};
+ use futures::sync::BiLock;
+
+
+
+ /// A `Stream` part of the split pair
+ #[derive(Debug)]
+ pub struct SplitStream<S>(BiLock<S>);
+
+ impl<S> SplitStream<S> {
+ /// Attempts to put the two "halves" of a split `Stream + Sink` back
+ /// together. Succeeds only if the `SplitStream<S>` and `SplitSink<S>` are
+ /// a matching pair originating from the same call to `Stream::split`.
+ pub fn reunite(self, other: SplitSink<S>) -> Result<S, ReuniteError<S>> {
+ other.reunite(self)
+ }
+ }
+
+ impl<S: Stream> Stream for SplitStream<S> {
+ type Item = S::Item;
+ type Error = S::Error;
+
+ fn poll(&mut self) -> Poll<Option<S::Item>, S::Error> {
+ match self.0.poll_lock() {
+ Async::Ready(mut inner) => inner.poll(),
+ Async::NotReady => Ok(Async::NotReady),
+ }
+ }
+ }
+
+ /// A `Sink` part of the split pair
+ #[derive(Debug)]
+ pub struct SplitSink<S>(BiLock<S>);
+
+ impl<S> SplitSink<S> {
+ /// Attempts to put the two "halves" of a split `Stream + Sink` back
+ /// together. Succeeds only if the `SplitStream<S>` and `SplitSink<S>` are
+ /// a matching pair originating from the same call to `Stream::split`.
+ pub fn reunite(self, other: SplitStream<S>) -> Result<S, ReuniteError<S>> {
+ self.0.reunite(other.0).map_err(|err| {
+ ReuniteError(SplitSink(err.0), SplitStream(err.1))
+ })
+ }
+
+ pub fn inner(&mut self) -> &mut BiLock<S> {
+ &mut self.0
+ }
+
+ }
+
+ impl<S: Sink> Sink for SplitSink<S> {
+ type SinkItem = S::SinkItem;
+ type SinkError = S::SinkError;
+
+ fn start_send(&mut self, item: S::SinkItem)
+ -> StartSend<S::SinkItem, S::SinkError>
+ {
+ match self.0.poll_lock() {
+ Async::Ready(mut inner) => inner.start_send(item),
+ Async::NotReady => Ok(AsyncSink::NotReady(item)),
+ }
+ }
+
+ fn poll_complete(&mut self) -> Poll<(), S::SinkError> {
+ match self.0.poll_lock() {
+ Async::Ready(mut inner) => inner.poll_complete(),
+ Async::NotReady => Ok(Async::NotReady),
+ }
+ }
+
+ fn close(&mut self) -> Poll<(), S::SinkError> {
+ println!("tried to get poll lock");
+ match self.0.poll_lock() {
+ Async::Ready(mut inner) => inner.close(),
+ Async::NotReady => {
+ Ok(Async::NotReady)
+ },
+ }
+ }
+ }
+
+ pub fn split<S: Stream + Sink>(s: S) -> (SplitSink<S>, SplitStream<S>) {
+ let (a, b) = BiLock::new(s);
+ let read = SplitStream(a);
+ let write = SplitSink(b);
+ (write, read)
+ }
+
+ /// Error indicating a `SplitSink<S>` and `SplitStream<S>` were not two halves
+ /// of a `Stream + Split`, and thus could not be `reunite`d.
+ pub struct ReuniteError<T>(pub SplitSink<T>, pub SplitStream<T>);
+
+ impl<T> fmt::Debug for ReuniteError<T> {
+ fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+ fmt.debug_tuple("ReuniteError")
+ .field(&"...")
+ .finish()
+ }
+ }
+
+ impl<T> fmt::Display for ReuniteError<T> {
+ fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+ write!(fmt, "tried to reunite a SplitStream and SplitSink that don't form a pair")
+ }
+ }
+
+ impl<T: Any> Error for ReuniteError<T> {
+ fn description(&self) -> &str {
+ "tried to reunite a SplitStream and SplitSink that don't form a pair"
+ }
+ }
+ }
+
+ pub mod send {
+ use futures::{Poll, Async, Future, AsyncSink};
+ use futures::sink::Sink;
+
+ /// Future for the `Sink::send` combinator, which sends a value to a sink and
+ /// then waits until the sink has fully flushed.
+ #[derive(Debug)]
+ #[must_use = "futures do nothing unless polled"]
+ pub struct Send<S: Sink> {
+ sink: Option<S>,
+ item: Option<S::SinkItem>,
+ }
+
+ pub fn new<S: Sink>(sink: S, item: S::SinkItem) -> Send<S> {
+ Send {
+ sink: Some(sink),
+ item: Some(item),
+ }
+ }
+
+ impl<S: Sink> Send<S> {
+ /// Get a shared reference to the inner sink.
+ pub fn get_ref(&self) -> &S {
+ self.sink.as_ref().take().expect("Attempted Send::get_ref after completion")
+ }
+
+ /// Get a mutable reference to the inner sink.
+ pub fn get_mut(&mut self) -> &mut S {
+ self.sink.as_mut().take().expect("Attempted Send::get_mut after completion")
+ }
+
+ fn sink_mut(&mut self) -> &mut S {
+ self.sink.as_mut().take().expect("Attempted to poll Send after completion")
+ }
+
+ pub fn into_inner(&mut self) -> S {
+ self.sink.take().expect("Attempted to poll Send after completion")
+ }
+ }
+
+ impl<S: Sink> Future for Send<S> {
+ type Item = S;
+ type Error = S::SinkError;
+
+ fn poll(&mut self) -> Poll<S, S::SinkError> {
+ if let Some(item) = self.item.take() {
+ if let AsyncSink::NotReady(item) = try!(self.sink_mut().start_send(item)) {
+ self.item = Some(item);
+ return Ok(Async::NotReady)
+ }
+ }
+
+ // we're done sending the item, but want to block on flushing the
+ // sink
+ try_ready!(self.sink_mut().poll_complete());
+
+ // now everything's emptied, so return the sink for further use
+ return Ok(Async::Ready(self.into_inner()))
+ }
+ }
+ }
+}
Note that my patch is huge because I include all of the futures crate code that needs to be modified to allow the end user to write shutdown code at all. The futures changes are only a handful of changed lines, but I had to reproduce all of the affected modules so that I could have a single-file example that works with the current version of rust-websocket. The point I want to illustrate is that the problem is inside the futures crate, not the websocket crate. They have said that my proposed changes are acceptable, so it is just a matter of upstreaming my patches and then bumping the futures dependency version in the websocket crate.
Not sure if this is related, but if the TcpStream gets closed by the other party, it is never closed on rust-websocket's side.
@ParadoxSpiral that is exactly the situation this issue is describing. I attempted to merge the SplitSink implementation into futures, but they rejected it because it exposed details that they wanted to keep out of the public API (see the comments on pull request https://github.com/rust-lang-nursery/futures-rs/pull/682). The end result is that every user of the rust-websocket crate will need to carry around their own custom SplitSink impl to handle shutdown of the TcpStream. The patch in my comments shows you how to do this.
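To make that concrete, here is a condensed sketch of how the pieces fit together, written as a helper you could drop into the patched parallel-server.rs. It is not taken verbatim from the patch: hack, experiment, and SplitSink are the modules and type alias the patch adds, while send_or_shutdown is just an illustrative name.

// Assumes the patched parallel-server.rs, with its hack and experiment
// modules and the `SplitSink` type alias in scope.
fn send_or_shutdown(handle: &Handle, sink: SplitSink, msg: String) {
    // hack::send::new builds the send future on the custom SplitSink;
    // ShutdownOnSendError shuts the TcpStream down (releasing its fd) if the
    // send fails, instead of leaving a dead socket open.
    let f = experiment::ShutdownOnSendError {
        in_shutdown: false,
        inner: hack::send::new(sink, OwnedMessage::Text(msg)),
    };
    handle.spawn(
        // On completion the sink is handed back; the real example re-inserts
        // it into its connections map instead of dropping it like this sketch.
        f.map(|_recovered_sink| ())
            .map_err(|e| println!("shutdown after send error also failed: {:?}", e)),
    );
}

The key point is that the error path keeps hold of the sink long enough to reach the TcpStream and call shutdown, which is exactly what the stock futures SplitSink does not allow.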
When you call close on a Framed stream, it does not actually close or shut down the underlying socket fd (see https://github.com/tokio-rs/tokio-io/issues/80). I tested the parallel-server from the examples/ folder by setting the per-process file descriptor limit to 1024 and using https://github.com/observing/thor to connect to the server 2000 times, with 100 concurrently active connections, sending 1 message each.
The example server will crash in the accept call because it hits the file descriptor limit. View the open file descriptors of the example parallel-server process with
ls /proc/$PID/fd
and you will see that none of the socket connections were closed (there will be 1024 sockets open even though the client program only has 100 open).

This happens because the shutdown that gets called when you close the Framed does not call shutdown on the socket. The close goes through the Framed impl from tokio-io, which uses the AsyncWrite version of shutdown because of its generic implementation, unlike TcpStream::shutdown, which actually shuts the socket down. See https://github.com/tokio-rs/tokio-io/issues/80#issuecomment-352207760
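To illustrate that distinction in code, here is a small sketch that is not part of the example or the patch; it assumes tokio-io and tokio-core as direct dependencies, and both function names are mine.

extern crate futures;
extern crate tokio_core;
extern crate tokio_io;

use std::io;
use std::net::Shutdown;

use futures::Poll;
use tokio_core::net::TcpStream;
use tokio_io::AsyncWrite;

// Roughly what closing the Framed reaches for a TcpStream: the generic
// AsyncWrite flavour of shutdown, which does not shut the underlying socket
// down (see the linked tokio-io issue).
fn close_via_async_write(stream: &mut TcpStream) -> Poll<(), io::Error> {
    AsyncWrite::shutdown(stream)
}

// The call that actually issues shutdown(2), so the peer sees the close and
// the fd can be released once the stream is dropped.
fn close_for_real(stream: &TcpStream) -> io::Result<()> {
    stream.shutdown(Shutdown::Both)
}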
I solved the TcpStream not being closed by creating a custom split function that returns a custom SplitSink struct, which gives mutable access to the contained stream. That SplitSink can then be used with a custom future that calls the shutdown method on the TcpStream instance directly, not through the AsyncWrite trait; a distilled sketch follows.
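Here is the heart of that mechanism as a standalone helper. It assumes the patched parallel-server.rs (the hack module and the SplitSink alias), and the use lines may duplicate imports already in that file; try_shutdown is just an illustrative name, and it must be called from inside a future's poll so that poll_lock can register the current task, the same way ShutdownOnSendError does.

use futures::{Async, Poll};
use websocket::async::TcpStream;
use websocket::result::WebSocketError;

fn try_shutdown(sink: &mut SplitSink) -> Poll<(), WebSocketError> {
    // inner() is the accessor added to the custom SplitSink; it exposes the
    // BiLock so we can reach the Framed transport it protects.
    match sink.inner().poll_lock() {
        Async::Ready(mut framed) => {
            let tcp_stream: &mut TcpStream = framed.get_mut();
            // TcpStream::shutdown, not AsyncWrite::shutdown: this actually
            // tears the connection down so the fd can be released.
            match tcp_stream.shutdown(::std::net::Shutdown::Both) {
                Ok(()) => Ok(Async::Ready(())),
                Err(e) => Err(WebSocketError::IoError(e)),
            }
        }
        Async::NotReady => Ok(Async::NotReady),
    }
}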
I propose that the websocket library include this custom version of SplitSink, split, and the Shutdown future, because the generic futures implementation for a stream does not handle cleaning up any resources the generic stream may have. Dropping both the sink and stream of a split stream does not close the socket either.

Errata: I also copied the implementation of SplitStream into MySplitStream so I could make reunite work with the new MySplitSink.