rabbitmq / rabbitmq-mqtt

RabbitMQ MQTT plugin
https://www.rabbitmq.com/mqtt.html
Other
174 stars 67 forks source link

Last Will with QoS 2 fails with an exception instead of being downgraded to QoS 1 #214

Closed pbilstein closed 4 years ago

pbilstein commented 4 years ago

Hi,

whenever we are setting a MQTT Last Will with QoS 2, the plugin crashes when it tries to fulfill it:

2019-11-04 23:09:43 =CRASH REPORT====
  crasher:
    initial call: rabbit_channel:init/1
    pid: <0.11035.1>
    registered_name: []
    exception exit: {{function_clause,[{rabbit_mqtt_processor,delivery_mode,[2],[{file,"src/rabbit_mqtt_processor.erl"},{line,718}]},{rabbit_mqtt_processor,amqp_pub,2,[{file,"src/rabbit_mqtt_processor.erl"},{line,836}]},{rabbit_mqtt_processor,send_will,1,[{file,"src/rabbit_mqtt_processor.erl"},{line,784}]},{rabbit_web_mqtt_handler,terminate,3,[{file,"src/rabbit_web_mqtt_handler.erl"},{line,163}]},{cowboy_websocket,terminate,3,[{file,"src/cowboy_websocket.erl"},{line,617}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]},[{gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1183}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}
    ancestors: [<0.11033.1>,rabbit_direct_client_sup,rabbit_sup,<0.265.0>]
    message_queue_len: 0
    messages: []
    links: [<0.11033.1>]
    dictionary: [{process_name,{rabbit_channel,{<<"172.24.0.15:37672 -> 172.24.0.3:9001">>,1}}},{guid_secure,{{31,'rabbit@rabbitmq-1',#Ref<0.1828259457.2187591682.167217>},0}},{channel_operation_timeout,15000},{rand_seed,{#{jump => #Fun<rand.3.8986388>,max => 288230376151711743,next => #Fun<rand.2.8986388>,type => exsplus},[80411076728963622|136058777223923014]}}]
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 10759
  neighbours:
2019-11-04 23:09:43 =SUPERVISOR REPORT====
     Supervisor: {<0.11033.1>,rabbit_channel_sup}
     Context:    child_terminated
     Reason:     {function_clause,[{rabbit_mqtt_processor,delivery_mode,[2],[{file,"src/rabbit_mqtt_processor.erl"},{line,718}]},{rabbit_mqtt_processor,amqp_pub,2,[{file,"src/rabbit_mqtt_processor.erl"},{line,836}]},{rabbit_mqtt_processor,send_will,1,[{file,"src/rabbit_mqtt_processor.erl"},{line,784}]},{rabbit_web_mqtt_handler,terminate,3,[{file,"src/rabbit_web_mqtt_handler.erl"},{line,163}]},{cowboy_websocket,terminate,3,[{file,"src/cowboy_websocket.erl"},{line,617}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}
     Offender:   [{pid,<0.11035.1>},{name,channel},{mfargs,{rabbit_channel,start_link,[1,<0.11032.1>,<0.11032.1>,<0.11026.1>,<<"172.24.0.15:37672 -> 172.24.0.3:9001">>,rabbit_framing_amqp_0_9_1,{user,<<"mqtt_default">>,[mqtt_default],[{rabbit_auth_backend_internal,none}]},<<"mqtt">>,[{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,bool,true},{<<"consumer_cancel_notify">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true}],<0.11029.1>,<0.11034.1>]}},{restart_type,intrinsic},{shutdown,70000},{child_type,worker}]

RabbitMq version: Docker rabbitmq:3.7.20-management-alpine Client Lib: mqttjs 3.0.0

When the Last Will has set qos 0 or 1, it works.

michaelklishin commented 4 years ago

QoS 2 is not supported (it was explained many times why but long story short: it promises something it cannot deliver without cluster-wide coordination of clients and nodes, which is not practical and no MQTT implementation does) and is downgraded to QoS 1 in RabbitMQ. In most places but not for Last Will messages, apparently.

pbilstein commented 4 years ago

Ok, yes I understand the fact that qos 2 is not fully supported and that it should only be downgraded to 1. Thanks for clarifying the issue.

In our case we have a couple of clients out there setting last will with qos 2, which we do not have under control at the moment. Therefore, it would be very nice to have a fix here.

michaelklishin commented 4 years ago

We understand that clients will [try to] use QoS 2. It should be downgraded everywhere gracefully.

pbilstein commented 4 years ago

Ok great. I had a short look at the code and changed the following part (in branch 3.7.20), which seems to fix the issue for us and our tests are still all green:

# rabbit_mqtt_processor.erl
718     delivery_mode(?QOS_0) -> 1;
719     delivery_mode(?QOS_1) -> 2.

->

# rabbit_mqtt_processor.erl
718     delivery_mode(?QOS_0) -> 1;
719     delivery_mode(?QOS_1) -> 2;
720     delivery_mode(?QOS_2) -> 2.

Maybe it helps. I guess a PR is too heavy for it, right? Or would it be appreciated?

lukebakken commented 4 years ago

I guess a PR is too heavy for it, right? Or would it be appreciated?

Yes, it would be appreciated, thank you

michaelklishin commented 4 years ago

I've started looking at an integration test. We'd consider a PR (thank you!) but would only consider this issue resolve after we add a test.

pbilstein commented 4 years ago

Wow, that was fast. I just wanted to prepare a PR with a test and saw you already did it. Next time, I will try to be first ;-)

michaelklishin commented 4 years ago

No worries :) You have contributed by identifying the problem and even narrowing it down to one function. That's very valuable 👍

michaelklishin commented 4 years ago

We'll post an update to this issue when Concourse produces 3.8 and 3.7 alphas that you can use to verify.