redboltz / mqtt_cpp

Boost Software License 1.0
438 stars 107 forks source link

Using a finite state machine framework? #464

Closed jonesmz closed 3 years ago

jonesmz commented 5 years ago

I've been researching finite state machine frameworks, and am quite interested in https://boost-experimental.github.io/sml/

@redboltz I've noticed that you've commented on stackoverflow about this framework.

I think it would be good to use in mqtt_cpp for incoming packets (Not for sending, just receiving).

A brief (and very incomplete) example

using namespace sml;
struct 
make_transition_table(
*"start"_s  / &endpoint::async_read_control_packet_type = "wait_for_incoming_packet"_s,
"wait_for_incoming_packet"_s + event<packet_type_read>[check_error_and_transferred_length_guard] / &this_type::handle_control_packet_type = "handle_remaining_length"_s,
"handle_remaining_length"_s + event<packet_length_read_finished>[check_error_and_transferred_length_guard] / &this_type::process_payload = "process_payload"_s,
...
...
"process_connect_header"_s + event<connect_header_processed>[use_version_3.1.1_guard] / &this_type::process_connect_impl<process_connect_phase::client_id> = "process_connect_client_id"_s,
"process_connect_header"_s + event<connect_header_processed>[use_version_5_guard] / &this_type::process_connect_impl<process_connect_phase::properties> = "process_connect_properties"_s,
"process_connect_properties"_s + event<connect_header_processed> / &this_type::process_connect_impl<process_connect_phase::client_id> = "process_connect_client_id"_s,
...
...
*"error_handler"_s + exception<std::exception>[stop_on_error_guard] / &this_type::on_error = X
"error_handler"_s + exception<std::exception>[continue_on_error_guard] / &this_type::on_error = "wait_for_incoming_packet"_s
);

Advantages:

  1. Allows for the flow of packet processing to be done like building blocks all in one place.
  2. Optionally can use sub-states, one for each packet type, to make the state machine easy to understand.
  3. Error conditions (failed async_read, exceptions, invalid packet data) can be handled in the state machine, instead of in-line with the packet processing code.
  4. Exceptions thrown from end-user callbacks can be handled by the state machine -- currently they are caught by boost::asio, which terminates the application
  5. Reduce possibility of mistakes like https://github.com/redboltz/mqtt_cpp/pull/402
  6. No change in end-user API at all. 100% internal detail.
jonesmz commented 5 years ago

Also: See the logging example here: https://boost-experimental.github.io/sml/#real-life-examples

This would be a great way to add logging (https://github.com/redboltz/mqtt_cpp/issues/292) to a large portion of mqtt_cpp without needing to write log statements everywhere.

redboltz commented 5 years ago

I basically agree with you. SML is the next generation state machine library and it pretty nice.

I believe that I'm an expert of state machine. I wrote http://redboltz.wikidot.com/boost-msm-guide

From general state-machine point of view, SML still have unsolved issues I reported:

https://github.com/boost-experimental/sml/issues/88 https://github.com/boost-experimental/sml/issues/171

But the unsolved issues are not used to implement endpoint parsing I guess.

I'm not sure it is needed but we usually use the new event post in other (previous) action handler. And the event should be queued not invoked immediately. It was a serious problem to use SML but now solved.

https://github.com/boost-experimental/sml/issues/144#issuecomment-370596984

            ,"s1"_s + event<e1> / [] (const auto& /*ev*/, sml::back::process<e2> process_event) {
                std::cout << "a1" << std::endl; 
                process_event(e2());
                std::cout << "code after sm.process_event(e1());" << std::endl;
            }

Extracting process_event function object is pretty difficult to understand. Perhaps this feature(API) might be changed in the future.

BTW, wandbox supports SML https://wandbox.org/permlink/VaZ9WydBmZCYVOkJ

SML is not a part of Boost, so far. We need to add SML as git sub-module.

Could you check and update #463 (see #460) before replacing switch-case with SML? I'd like to release v7.0.0 first. And then could you implement the SML approach?

jonesmz commented 5 years ago

I actually tried to use the Boost.Experimental.SML library with some of my code, and had an unpleasant experience with it.

Perhaps you can offer some advice on these? https://github.com/boost-experimental/sml/issues/298

Currently I do not have time to change mqtt_cpp to state machine. My work has some upcoming projects that will keep me busy for a few months. Maybe after that.

redboltz commented 5 years ago

I wrote a PoC code that uses Boost.Log. The logger object and transition table (ep_table) is a member of endpoint. endpoint's member functions can be called from ep_table's guard and action.

See https://wandbox.org/permlink/9zNCAmrvrCg4EHGC

I used function object generation macro. It not so elegant.

I noticed that the following wrapped lambda expression hack only works on g++, not clang++

        return make_transition_table(
            *"s1"_s  + sml::on_entry<_> /
            [](with_prop& p) {
                // -> void is mandatory to avoid function body instantiation
                // to deduce return value type.
                [](auto& p) -> void {
                    std::cout << "on_entry" << std::endl;
                    p.method1();
                }(p);
            }
        );

See https://wandbox.org/permlink/3weaMiWznUc7nIKH (the current setting is g++)

redboltz commented 5 years ago

My approach above might depend on undefined behavior. See https://stackoverflow.com/a/58247080/1922763 I'm still not 100% sure it is really ODR violation.

redboltz commented 5 years ago

I'm still not 100% sure it is really ODR violation.

I am finally convinced.

Here is a safe approach:

https://wandbox.org/permlink/4hPv8FjU6SI0MiGS

We need to keep the following code sequence:

  1. Function object class definition and a member operator() declaration.
    • The operator() declaration only uses Endpoint &.
  2. ep_table that invokes ep_funcname from its guard and/or action.
    • endpoint & is used as the parameter type of guard/action lambda expression and it is passed to the function object operator() that is declared at the step1.
  3. endpoint definition.
  4. operator() definition that is declared at the step 1.
    • endpoint member functions are called from the operator() definition. At this point, endpoint is defined.
jonesmz commented 5 years ago

A related thought, but I was recently investigating Boost.Coroutines, specifically with this library: https://github.com/yandex/ozo/blob/master/examples/request.cpp

They do some extreme "black magic", from what I can tell. But it looks extremely powerful.

I think a finite state machine framework (e.g. Boost.Experimental.SML) would be an improvement over the current design, but I am curious what you think of Boost.Coroutines. Both 1) Use inside mqtt_cpp. 2) Use by end-user code when calling mqtt_cpp.

redboltz commented 5 years ago

I've used Boost.Coroutine2.

Here is the code: https://github.com/msgpack/msgpack-c/blob/master/example/x3/stream_unpack.cpp

This code unpacks MessagePack formatted date using Boost.Spirit.X3. If the message is incomplete (stream from the socket), then asio read is invoked using coroutine. It continues parse and asio read back and forth until all message is parsed.

My impression at that time, it's extremely hard to use. If I forgot to call some function, segfault happens easily. And doesn't work some environment. It works only on Boost.Context supported platform.

I don't have a plan to use Boost.Coroutine2 in mqtt_cpp. If C++ standard officially support coroutine, then I start to think supporting coroutine.

jonesmz commented 5 years ago

Great information!

Coroutine support is officially included in c++20, by the way.

redboltz commented 5 years ago

I don't check coroutine TS continuously. I expect that it is easy to use :)

I think that Standard version is the best choice. I'm not sure the difference between standard version and coroutine TS version. If they are the same, how to provide both coroutine version and non-coroutine (current) version could be the issue. My knowledge is based on https://www.boost.org/doc/libs/1_71_0/doc/html/boost_asio/example/cpp17/coroutines_ts/echo_server.cpp. The standard coroutine could be slightly different.

jonesmz commented 5 years ago

I think we should ignore coroutines (in all the various forms) for now.

Better to focus on state-machine, and whether that is a good idea.

My concern with the SML library is that it has a very ridged way that it must be used. I'm very unhappy with the need for using lambdas. I would much rather see SML be a member variable that takes this as constructor parameter, and then the actions are pointer-to-member-functions. E.g. &mqtt_cpp::endpoint::handle_properties

redboltz commented 5 years ago

I think we should ignore coroutines (in all the various forms) for now.

Ok, I agree with you. I personally think that using coroutine is too early.

Better to focus on state-machine, and whether that is a good idea.

I used to use Boost.MSM but it require long compile times. So Boost.SML is better.

My concern with the SML library is that it has a very ridged way that it must be used. I'm very unhappy with the need for using lambdas. I would much rather see SML be a member variable that takes this as constructor parameter, and then the actions are pointer-to-member-functions. E.g. &mqtt_cpp::endpoint::handle_properties

Unfortunately you don't use the pointer to function as an action directly. Using lambda is SML's design policy I guess. If you want to use funtion pointer, you need to wrap calling the function pointer with the lambda expression. Or you could write a pull request to support function pointer to SML.

redboltz commented 5 years ago

It seems that I found the solution. See https://github.com/boost-experimental/sml/issues/93. It still require the lambda expression (it is SML design policy) but I can eliminate ugly glue codes.

redboltz commented 5 years ago

I found an issue and the solution at https://github.com/boost-experimental/sml/issues/311#issuecomment-548883956

We can apply the same approach as follows: https://wandbox.org/permlink/iSo7FzMhOqosCzMW

The key point is how to make a sandwich structure.

I mean

  1. Minimal structure for the transition table.
  2. Transition table.
  3. Structure that contain the state machine and application data.

This approach uses a virtual function base approach. So the structure is

  1. The base class that has pure virtual functions that are called from the transition table.
  2. Transition table.
  3. The sub class that inherits 1. The class contain the state machine and application data.

I'm not 100% sure but maybe de-virtualization can work (need checking godbolt).

Now, ugly macros are disappeared.

NOTE: The transition table still requires lambda expressions but it is minimal. We can replace them std::bind but it's not essential, I don't want to use std::bind.

redboltz commented 3 years ago

Finally, I decided to use Boost.Asio stackless coroutine instead of state machines. And it has already been merged.