Closed: svanharmelen closed this issue 11 months ago
Do you use 0.12.14? Could you create a reproducible example? I cannot reproduce this with 0.12.14.
Yes, I'm using 0.12.14. If I switch back to 0.12.13 it works as expected, so I'm currently looking at the changes between those versions. Sharing a reproducible example will take some effort, as I would need to extract a lot of bits and pieces from our code base to compile a standalone example.
So I'm trying to see if I can understand/follow the flag logic you tweaked and maybe spot something in there. Which I think I did... Let me open a PR so I can show you my suggested changes and, if needed, discuss them.
@fafhrd91 please see #162 for a possible fix (or at least a starting point towards a fix 😉)
Do you use only the client, or both client and server?
I only use the server and am testing with the Paho MQTT client (in this case the one written in Go: github.com/eclipse/paho.golang)
Could you post the configuration for your MQTT server?
This is the main setup:
let server = Server::build()
    .bind("mqtt", "127.0.0.1:8883", move |_| {
        // Terminate TLS first, then hand the stream to the MQTT server.
        chain_factory(ntex_tls::rustls::Acceptor::new(tls_acceptor.clone()))
            .map_err(|_| MqttError::Service(ServerError {}))
            .and_then(
                MqttServer::new()
                    // MQTT 3.1.1 clients
                    .v3(v3::MqttServer::new({
                        let session = session.clone();
                        move |handshake: v3::Handshake| {
                            handshake_v3(handshake, session.clone())
                        }
                    })
                    .control(control_service_factory_v3())
                    .inflight(MQTT_MAX_IN_FLIGHT)
                    .inflight_size(MQTT_MAX_IN_FLIGHT_SIZE)
                    .publish(fn_factory_with_config(
                        |session: v3::Session<ClientSession>| {
                            Ready::Ok::<_, ServerError>(fn_service(move |req| {
                                publish_v3(session.clone(), req)
                            }))
                        },
                    )))
                    // MQTT 5 clients
                    .v5(v5::MqttServer::new({
                        let session = session.clone();
                        move |handshake: v5::Handshake| {
                            handshake_v5(handshake, session.clone())
                        }
                    })
                    .control(control_service_factory_v5())
                    .max_inflight_size(MQTT_MAX_IN_FLIGHT_SIZE)
                    .receive_max(MQTT_MAX_IN_FLIGHT)
                    .publish(fn_factory_with_config(
                        |session: v5::Session<ClientSession>| {
                            Ready::Ok::<_, ServerError>(fn_service(move |req| {
                                publish_v5(session.clone(), req)
                            }))
                        },
                    ))),
            )
    })?
And these are the control factories:
pub fn control_service_factory_v3() -> impl ServiceFactory<
    v3::ControlMessage<ServerError>,
    v3::Session<ClientSession>,
    Response = v3::ControlResult,
    Error = ServerError,
    InitError = ServerError,
> {
    fn_factory_with_config(|_: v3::Session<ClientSession>| {
        Ready::Ok(fn_service(move |control| match control {
            v3::ControlMessage::Error(e) => Ready::Ok(e.ack()),
            v3::ControlMessage::ProtocolError(e) => Ready::Ok(e.ack()),
            v3::ControlMessage::Ping(p) => Ready::Ok(p.ack()),
            v3::ControlMessage::Disconnect(d) => Ready::Ok(d.ack()),
            v3::ControlMessage::Subscribe(mut s) => {
                s.iter_mut().for_each(|mut s| s.confirm(s.qos()));
                Ready::Ok(s.ack())
            }
            v3::ControlMessage::Unsubscribe(s) => Ready::Ok(s.ack()),
            v3::ControlMessage::Closed(c) => Ready::Ok(c.ack()),
            v3::ControlMessage::PeerGone(c) => Ready::Ok(c.ack()),
        }))
    })
}
pub fn control_service_factory_v5() -> impl ServiceFactory<
    v5::ControlMessage<ServerError>,
    v5::Session<ClientSession>,
    Response = v5::ControlResult,
    Error = ServerError,
    InitError = ServerError,
> {
    fn_factory_with_config(|_: v5::Session<ClientSession>| {
        Ready::Ok(fn_service(move |control| match control {
            v5::ControlMessage::Auth(a) => Ready::Ok(a.ack(v5::codec::Auth::default())),
            v5::ControlMessage::Error(e) => {
                Ready::Ok(e.ack(v5::codec::DisconnectReasonCode::UnspecifiedError))
            }
            v5::ControlMessage::ProtocolError(e) => Ready::Ok(e.ack()),
            v5::ControlMessage::Ping(p) => Ready::Ok(p.ack()),
            v5::ControlMessage::Disconnect(d) => Ready::Ok(d.ack()),
            v5::ControlMessage::Subscribe(mut s) => {
                s.iter_mut().for_each(|mut s| s.confirm(s.options().qos));
                Ready::Ok(s.ack())
            }
            v5::ControlMessage::Unsubscribe(s) => Ready::Ok(s.ack()),
            v5::ControlMessage::Closed(c) => Ready::Ok(c.ack()),
            v5::ControlMessage::PeerGone(c) => Ready::Ok(c.ack()),
        }))
    })
}
I cannot share the handles and publish functions without first having to refactor them... Does this help already?
I will need some time to investigate. Do you set keep-alive in the handshake handler?
"do you set keep-alive in handshake handler?"
Not sure what that means. Can you show me what that would look like?
Our handler hasn't changed between versions, so did some options change that I have to set?
I mean the field ConnectAck::server_keepalive_sec, but I see you don't set it. This could be the problem; I will try to reproduce it.
I just tested with this in our handler, but it gives the same results:
let keep_alive = handshake.packet().keep_alive;
Ok(handshake.ack(session).keep_alive(keep_alive))
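For context: in MQTT 5 a Server Keep Alive value sent in CONNACK, when present, replaces the keep-alive the client sent in CONNECT. A minimal sketch of that rule follows. The helper name `effective_keep_alive` is hypothetical and is not part of ntex-mqtt; how the crate maps `keep_alive(..)` onto CONNACK is not shown in this thread.

```rust
/// Hypothetical helper (not part of ntex-mqtt): returns the keep-alive
/// interval, in seconds, that applies to the connection under the MQTT 5 rule
/// that a Server Keep Alive in CONNACK overrides the value from CONNECT.
fn effective_keep_alive(client_secs: u16, server_override_secs: Option<u16>) -> u16 {
    // Use the server's value when it sent one, else fall back to the client's.
    server_override_secs.unwrap_or(client_secs)
}
```

Under this rule, echoing the client's own value back, as in `effective_keep_alive(10, Some(10))`, leaves the interval unchanged, which would be consistent with the test above giving the same results.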
Could you also post the initial Connect packet?
Yes:
[2023-12-08T16:49:05Z INFO ] handshake packet: Connect {
    clean_start: true,
    keep_alive: 10,
    session_expiry_interval_secs: 0,
    auth_method: None,
    auth_data: None,
    request_problem_info: true,
    request_response_info: false,
    receive_max: None,
    topic_alias_max: 0,
    user_properties: [],
    max_packet_size: None,
    last_will: None,
    client_id: "be6b355e-2723-4dc5-814a-ab76e8bbd503",
    username: Some(
        "username",
    ),
    password: Some(
        b"password",
    ),
}
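One observation, offered as a hint and not as the confirmed root cause: the Connect packet above carries keep_alive: 10, and the MQTT specification has the server close a connection when no control packet arrives within one and a half times the keep-alive interval. For 10 seconds that window is 15 seconds, the same interval as the reconnects reported in this issue. A small sketch of the arithmetic, with a hypothetical helper name:

```rust
use std::time::Duration;

/// Hypothetical helper (not part of ntex-mqtt): the idle window after which a
/// server may drop the connection, i.e. 1.5 times the keep-alive interval.
/// A keep-alive of 0 turns the mechanism off, so there is no window.
fn keep_alive_timeout(keep_alive_secs: u16) -> Option<Duration> {
    if keep_alive_secs == 0 {
        return None;
    }
    // 1.5 x keep-alive, computed in milliseconds to stay in integer arithmetic.
    Some(Duration::from_millis(u64::from(keep_alive_secs) * 1500))
}
```

With the value from the log, `keep_alive_timeout(10)` yields a 15-second window.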
I found the bug and will prepare a fix by tomorrow.
Ah, cool! Curious to see (and understand) the root cause and the fix 🙂
Thanks for your help so far!
0.12.15 is released; it should fix this issue.
Thanks @fafhrd91! Just tested v0.12.15 and things work as expected again 🎉
Hello 👋🏻
We've been using this crate for a little while now and thought it was time to update from 0.9 to 0.12. But ever since we did, we noticed our clients are reconnecting every 15 seconds. To make sure the "problem" still exists in the latest version we pulled this repository and used the crate from a local path, but this still gives the same disconnects.
We did some tests and enabled TRACE logging, which shows us the following interesting bits:
Now after this snippet you see a lot of publish packets coming in (we are testing this with a test client which continuously sends data) and then after 15 seconds we see this part:
And in between the publish messages we saw 2 of these:
The client we are using to test this with didn't change after we updated this crate from 0.9 to 0.12, so it's either caused by some (default) config that was changed, updated or introduced in 0.12, or it's some kind of bug.
I will continue to debug this one myself as well, but I'm hoping this might sound familiar or sound like something you may have seen before? Any suggestions that might push us in the right direction in order to resolve the disconnects are very much appreciated.
Thanks! Sander
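To illustrate why disconnects in the middle of a continuous stream of publishes look suspicious: under the MQTT keep-alive rule, any control packet from the client, not only PINGREQ, should restart the idle window. The sketch below models that with a hypothetical `KeepAlive` tracker. It is not taken from ntex-mqtt and makes no claim about what the actual bug was.

```rust
use std::time::{Duration, Instant};

/// Hypothetical keep-alive tracker (not ntex-mqtt code).
struct KeepAlive {
    /// Idle window after which the connection counts as dead.
    timeout: Duration,
    /// Arrival time of the most recent control packet.
    last_seen: Instant,
}

impl KeepAlive {
    fn new(timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_seen: now }
    }

    /// Call for every inbound control packet (PUBLISH, PINGREQ, ...).
    fn on_packet(&mut self, now: Instant) {
        self.last_seen = now;
    }

    /// True once more than `timeout` has passed since the last packet.
    fn is_expired(&self, now: Instant) -> bool {
        now.duration_since(self.last_seen) > self.timeout
    }
}
```

With a 15-second window, a tracker that sees a packet at second 14 is still alive at second 20, while one that never sees a packet has expired by second 16. Timestamps are passed in explicitly so the behavior can be checked without sleeping.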