xHasKx / luamqtt

luamqtt - Pure-lua MQTT v3.1.1 and v5.0 client
https://xhaskx.github.io/luamqtt/
MIT License

Connection breaks after publishing in batch #41

Closed: cat-anna closed this issue 2 years ago

cat-anna commented 2 years ago

Hi again ;) Thanks for the quick fix for #40, it makes working with docker much easier.

I bumped into another problem while using the copas connector, but I'm not sure if it's strictly related to it. The connection to the broker gets closed after publishing ~150 messages in a batch. Adding "copas.sleep(0)" after each publish solved the issue completely. That workaround is good enough for me, but I wanted to let you know about the problem.

xHasKx commented 2 years ago

Hi @pgrabas, what are your lua and copas versions? Which broker do you use? What QoS do you publish with? Maybe you can check the MQTT protocol traffic with tcpdump/wireshark?

cat-anna commented 2 years ago

I'm using luajit 5.1, copas 3.0.0-3, luasocket 3.0.0-1. MQTT protocol 3.3. All messages are qos=0 and retain=false. The broker is mosquitto 2.0.13-1 running on OpenWrt 21.02.1. The issue does not seem to reproduce with interpreted lua 5.1.

pcap: mqtt.zip script: mqtt_batch.zip

Tieske commented 2 years ago

I cannot reproduce this. With or without the sleep, or with a bigger loop (up to 1000).

That said, I have seen this problem myself, also with copas 3, luasocket 3-beta, and my own branch of luamqtt.

xHasKx commented 2 years ago

@pgrabas, I parsed the TCP stream of your pcap file with a Lua script and it is correctly encoded, so there is no MQTT protocol violation in the TCP traffic. Wireshark shows the same.
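To illustrate what such a framing check involves, here is a minimal Python sketch (not the Lua script mentioned above). It walks a raw byte stream, decodes each MQTT fixed header and its variable-length "remaining length" field, and fails if a packet is malformed or truncated. The function names `decode_remaining_length` and `split_packets` are made up for this example.

```python
# Hypothetical helper names; this is a sketch of MQTT fixed-header framing only.
PACKET_TYPES = {
    1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 4: "PUBACK", 5: "PUBREC",
    6: "PUBREL", 7: "PUBCOMP", 8: "SUBSCRIBE", 9: "SUBACK",
    10: "UNSUBSCRIBE", 11: "UNSUBACK", 12: "PINGREQ", 13: "PINGRESP",
    14: "DISCONNECT", 15: "AUTH",  # AUTH exists in MQTT v5.0 only
}


def decode_remaining_length(buf, pos):
    """Decode the variable-length 'remaining length' field at buf[pos].

    Returns (value, position of the first byte after the field).
    """
    multiplier = 1
    value = 0
    for i in range(4):  # the field is at most 4 bytes long
        if pos + i >= len(buf):
            raise ValueError("truncated remaining length")
        byte = buf[pos + i]
        value += (byte & 0x7F) * multiplier
        if byte & 0x80 == 0:  # continuation bit clear: last byte of the field
            return value, pos + i + 1
        multiplier *= 128
    raise ValueError("remaining length longer than 4 bytes")


def split_packets(stream):
    """Split a byte stream into (type name, flags, body) tuples."""
    packets = []
    pos = 0
    while pos < len(stream):
        first = stream[pos]
        ptype, flags = first >> 4, first & 0x0F
        if ptype not in PACKET_TYPES:
            raise ValueError("invalid packet type %d at offset %d" % (ptype, pos))
        length, body_start = decode_remaining_length(stream, pos + 1)
        body_end = body_start + length
        if body_end > len(stream):
            raise ValueError("truncated packet at offset %d" % pos)
        packets.append((PACKET_TYPES[ptype], flags, stream[body_start:body_end]))
        pos = body_end
    return packets
```

If every packet in a captured stream splits cleanly like this, the framing is intact, which points the investigation away from the MQTT encoder and toward the transport layers.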

But when I open your pcap file in the Wireshark GUI, the last publish frame (no. 68) is shown in grey because it has the TCP FIN flag set. It looks like a disconnection that was initiated by the client side, not by the server.
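For readers unfamiliar with the flag, the following Python sketch shows where FIN lives in a TCP header and how to spot a segment that carries both payload and FIN, as frame no. 68 does. It assumes you already have the raw TCP segment bytes (header plus payload, with the IP header stripped); the names `parse_tcp_segment` and `is_data_with_fin` are invented for this example.

```python
import struct

# Bit masks of the flag byte at offset 13 of the TCP header.
TCP_FLAG_BITS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
    "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80,
}


def parse_tcp_segment(segment):
    """Extract ports, flag names and payload from a raw TCP segment."""
    if len(segment) < 20:
        raise ValueError("TCP header needs at least 20 bytes")
    # Source port, destination port, sequence number, ack number,
    # data-offset byte, flag byte (all in network byte order).
    src_port, dst_port, _seq, _ack, offset_byte, flag_byte = struct.unpack(
        "!HHIIBB", segment[:14]
    )
    header_len = (offset_byte >> 4) * 4  # data offset is counted in 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError("invalid TCP data offset")
    flags = {name for name, bit in TCP_FLAG_BITS.items() if flag_byte & bit}
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "flags": flags,
        "payload": segment[header_len:],
    }


def is_data_with_fin(segment):
    """True if the segment both carries payload and closes the sender's side."""
    info = parse_tcp_segment(segment)
    return "FIN" in info["flags"] and len(info["payload"]) > 0
```

A segment for which `is_data_with_fin` returns true means the sending side's socket was closed right after (or while) the last write was queued, so the kernel attached FIN to the final data segment.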

And when luamqtt operates on a TCP socket through the luasocket module, it does not mix sending with disconnection (I'm not even sure there is a way to send data with the FIN flag set using luasocket methods). So the layers that luamqtt runs on, copas and luasocket, should be checked for the issue.

Maybe you can try to reproduce such a bug in your environment with luasocket-only sync mode like in the example here - https://github.com/xHasKx/luamqtt/blob/master/examples/sync.lua ?

I also suggest you check the detailed logs of your broker, maybe some limits are hit on its side, just in case...

Tieske commented 2 years ago

I noticed the flag as well, and indeed, LuaSocket doesn't allow setting such flags afaik. Copas 3 is definitely a suspect, since it had a lot of changes.

I retried reproducing it, including with my own code, but failed again. I'm currently on a VPN to the broker though, so maybe it is related to latency, since in your case adding a sleep made it go away as well. I'll try to reproduce it when I'm on the local network with my broker again.

Tieske commented 2 years ago

@pgrabas with Copas 3.0.0, can you try again, but with this line disabled:

https://github.com/lunarmodules/copas/blob/3.0.0/src/copas.lua#L909

To disable the "autoclose" feature.

Tieske commented 2 years ago

A fix to Copas was merged: https://github.com/lunarmodules/copas/pull/125 so you can try the master branch. This definitely fixed my issue (I checked with wireshark and had the exact same issue in my capture).

Hopefully I'll be releasing a new Copas version later today.

Tieske commented 2 years ago

FYI; Copas 4 has been released.

cat-anna commented 2 years ago

@Tieske I'm amazed that you were able to find & fix it. With copas 4.0.0 the problem no longer reproduces for me.

xHasKx commented 2 years ago

Thanks, @Tieske , thanks @pgrabas

Tieske commented 2 years ago

Lol, the fix was simple. It’s just that it took 2 days and 400 lines of debug code to find it.