scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.34k stars 1.54k forks source link

Implement batch http routes unsetting #1620

Open xemul opened 1 year ago

xemul commented 1 year ago

Currently there are two methods to register routes for http server -- set() and unset(). When registering several endpoints one should call several set()-s and then several unset()-s. It's good to have a way to unset() all routes in a mux with the single call.

infdahai commented 1 year ago

it use the unset method to remove routes and then remove url in the _map in json_path.

https://github.com/scylladb/seastar/blob/053f94636606b7f0fd4a02f96287840f9d7b25d8/src/http/json_path.cc#L60-L67

so this issue needs a method to remove all the routes from a specified json_path?

tchaikov commented 1 year ago

Currently there are two methods to register routes for http server -- set() and unset(). When registering several endpoints one should call several set()-s and then several unset()-s. It's good to have a way to unset() all routes in a mux with the single call.

are you referring path_description::unset(), if that's the case why would we need to call unset() for registering new endpoints?

xemul commented 1 year ago

I'm talking about e.g. https://github.com/scylladb/scylladb/blob/48b9f31a089c1383ec59f621bc9cd8d3c76b2e8a/api/storage_service.cc#L1531 we need to .unset() every single endpoint, while it would be nice to just wipe them altogether.

tchaikov commented 1 year ago

in this specific case, unset_snapshot() is eventually called when unregistering snapshot_ctl from the http_server, this happens when scylladb is shutting down. we have a handful similar use cases under api/ directory where different subsystems unregister them from the http_server. this pattern repeats itself like:

api::http_context ctx(...);
// ...
foo_controller foo_ctl;
ctx.http_server.set_routes([&ctx, &foo_ctl] (routes& r) {
    seastar::httpd::the_path_desc_for_some_api.set(r, [&foo_ctl](std::unique_ptr<http::request> req) {
        // handle the req which calls "some api"
    });
    seastar::httpd::the_path_desc_for_another_api.set(r, [&foo_ctl](std::unique_ptr<http::request> req) {
        // handle the req which calls "another api"
    });
    seastar::httpd::the_path_desc_for_yet_another_api.set(r, [&foo_ctl](std::unique_ptr<http::request> req) {
        // handle the req which calls "yet another api"
    });
}).get();

auto stop_foo_controller = defer_verbose_shutdown("foo controller API", [&ctx] {
    ctx.http_server.set_routes([&ctx] (routes& r) {
        seastar::httpd::the_path_desc_for_some_api.unset(r);
        seastar::httpd::the_path_desc_for_another_api.unset(r);
        seastar::httpd::the_path_desc_for_yet_another_api.unset(r);
    }).get();
});

the point is to perform its own cleanup by each subsystem. but the problem is, these path description instances are global variables defined by .cc files generated by seastar-json2code.py for each subsystem. they are not "owned" by its subsystem. on the contrary, they are just the vehicle connecting the controller and the http context and http request.

what we need is a convenient way to unset() all the previously set() ed path descriptions in a predefined group. one solution i can think of, is to improve the seastar-json2code.py script to enable it to enumerate all path descriptions immediately under a certain path, for instance

namespace seastar::httpd::storage_service_json {
//...
void for_each_path_under(sstring prefix, std::function<void(const path_description& path)> f);
}

so we can unset all paths under /storage_service/snapshots, like

void unset_snapshot(http_context& ctx, routes& r) {
    httpd::storage_service_json::for_each_path_under("/storage_service/snapshots", [&r] (const auto& path) {
        path.unset(r);
    }
}

@xemul hi Pavel, does this make sense?

xemul commented 1 year ago

@tchaikov , it makes perfect sense, but probably we can avoid prefix argument. Here's why.

In this particular example, we .set() and .unset() routes for different parts of the http::storage_service_json for historical reasons. I think we can assume that we only depupulate (with .unset()-s) the whole namespace in one go and patch the api/ stuff so that e.g. snapshots endpoints live in another namespace. Or sub-namespace inside storage_service_json one. In other words -- groups of endpoints that are .set() and .unset() are not cut in stone wrt namespace they live in (I hope so :crossed_fingers:)

We had very similar challenge for RPC verbs reg/unreg in messaging service. It was solved by grouping sub-services verbs into structs (e.g. -- look at storage_proxy_rpc_verbs), handlers are now registered one-by-one (storage_proxy::remote::init_messaging_service()) and are unregistered in one call (storage_proxy::remote::uninit_messaging_service()) by calling the aforementioned struct's .unregister() call.

This is indeed very similar to what you suggest.

tchaikov commented 1 year ago

@tchaikov , it makes perfect sense, but probably we can avoid prefix argument. Here's why.

In this particular example, we .set() and .unset() routes for different parts of the http::storage_service_json for historical reasons. I think we can assume that we only depupulate (with .unset()-s) the whole namespace in one go and patch the api/ stuff so that e.g. snapshots endpoints live in another namespace. Or sub-namespace inside storage_service_json one. In other words -- groups of endpoints that are .set() and .unset() are not cut in stone wrt namespace they live in (I hope so crossed_fingers)

that's a relief. if we don't have to stick with the existing arrangement of the namespace enclosing the path descriptions for /storage_service and those under /storage_service/snapshots, and can put them into different namespaces, that'd be a lot easier. namely

  1. extract the snapshot API definitions out into another .json file, so that they have their own namespace
  2. update script/seastar-json2code.py so that it offer a free function to enumerate all the path descriptions in the namespace -- a function to enumerate the path descriptions might be more flexible than one which just unset them from given routes. as the former provides a machinery, and the latter accomplishes a certain task.

@infdahai what do you think? and do you still want to revise #1638 to implement this plan instead if it makes sense to you?

We had very similar challenge for RPC verbs reg/unreg in messaging service. It was solved by grouping sub-services verbs into structs (e.g. -- look at storage_proxy_rpc_verbs), handlers are now registered one-by-one (storage_proxy::remote::init_messaging_service()) and are unregistered in one call (storage_proxy::remote::uninit_messaging_service()) by calling the aforementioned struct's .unregister() call.

i see. without the help of some external tooling, we have to group them programmatically.

This is indeed very similar to what you suggest.

infdahai commented 1 year ago

~OK, I'm going to look into it with your suggesstions in this week. If I think I can achieve this, I will alter this pr.~ Otherwise, I will close the pr.

xemul commented 1 year ago

that's a relief. if we don't have to stick with the existing arrangement of the namespace enclosing the path descriptions for /storage_service and those under /storage_service/snapshots, and can put them into different namespaces, that'd be a lot easier.

But that's only about C++ namespace the code lives in :) The URL via which the handler is accessed is cur in stone and musn't change

OK, I'm going to look into it with your suggesstions in this week

Thanks! That'd be really helpful