lunatic-solutions / submillisecond

A lunatic web framework
MIT License
911 stars 18 forks source link

Where to put Guards? #45

Closed bkolobara closed 2 years ago

bkolobara commented 2 years ago

This is a summary of today's discussion on discord. Feel free to add stuff if I left something out.

As @SquattingSocrates discovered, the main issue is that you usually want to add a custom response to a Guard (e.g. Permission denied) and this is currently not possible. This also sparked a bigger discussion, what is actually a Guard and do we need an explicit if syntax for it?

There are 3 places a guard can live:

1. Extractors as Guards

We are using Rocket and Axum inspired extractors in handlers, this means that we can turn each extractor into a Guard by returning a "rejection" value that implements IntoResponse. So we have already a way of expressing a "Guard" on the handler level.

In my opinion, this is not always that great, especially if you want to guard a whole sub-route:

router! {
    "/admin" => {
         // I want to guard all handlers here
        "a" => a,
        "b" => b,
        // ... rest
    }
})

If someone else on your team adds a handler "c", they need to remember to add an explicit guard to it or this might become a security issue.

2. Middleware as Guards

Another place where we can add guards is middleware. Middleware wrap handlers and they could be used to short-circuit a response:

router! {
    "/admin" => {
         use isAdmin;

        "a" => a,
        "b" => b,
        // ... rest
    }
})

that way we give middleware more power, but simplify the mental model around it. You don't have anymore two different concepts Guards and Middleware. I think this simplifies the internals, because we don't need a dedicated if syntax. And developers are already used to using middleware that way in other web frameworks.

3. Explicit Guard syntax

The last option is an explicit guard, but with the option to return custom responses:

router! {
    "/admin" if isAdmin => {
        "a" => a,
        "b" => b,
        // ... rest
    }
})

The biggest benefit here is that the guarding part is explicit. Opposite of middleware, where some middleware could act as a guard but other could just be doing some logging.

Early returns

The question mostly boils down to, do we want only one way to end a request early (guard the handler)? Or do we want to be able to do it from multiple places. One extreme would be to allow each part to return an early response (middleware, extractor and guard). On the other side we could for example only allow middleware to early return, remove if guards and remove custom responses from extractors. Of course, any in-between combination of these 3 would be an option too.

bkolobara commented 2 years ago

After our short discussion in this issue and seeing another related issue in the Axum repository, I'm more and more becoming a fan of something I would call exact match based routing. I'm not sure if this is the best way to do it or if even it is feasible to implement this for us, but just wanted to share this here with you so we have it somewhere written down.

The main idea is to have a routing model that can be easily reasoned about, without much ambiguity. At the same time provide composability and powerful middleware. This would be the rules:

1. Static before dynamic on the same nesting level

The routing macro on the same level should follow the common pattern (in other frameworks) of sorting routes, so that first we try the "more static" ones. E.g.

router! {
     "/:name/:age" => h1,
     "/test/34" => h2,
     "/:name/34" => h3,
})

should be attempted to match in the order of:

  1. "/test/34"
  2. "/:name/34"
  3. "/:name/:age"

2. No fallback for sub-routes / handlers

Once a sub-route or handler is taken it can't fall through anymore. The implications of this are:

  1. If Path extracting in handlers fail, the request fails with an error and no other route is tried. This is how Axum does it. Currently we do it more like Rocket (Path extractor failing means route not matched), but this approach is a better fit for Rocket than us because they define the route above the function in a macro and do some name matching in the arguments. So that the connection between routing and function arguments is more clear.
  2. Middleware can be eagerly executed and re-write paths. This means that we can give more power to middleware:
    router! {
         use NormalizePath;
         // rest
    })

    In this case we can push the removing of trailing / into the NormalizePath middleware and because it's executed eagerly, it can re-write the path before the next match is found. We also wouldn't need to collect middleware until a match is found.

In this case entering a sub-route would be irreversible (like a match statement) and it would be safe to call the middleware as we can't end up in another. The middleware would still be called, even we end up not matching any sub-routes and return 404.

3. Middleware, one method with a next call

Middleware becomes a guard (can stop propagating requests) and can return a custom error.

4. Remove extractors

This might be a controversial one :). Rocket pioneered extractors, even they call them Request Guards. They are somewhat complicated in Rocket, can return 3 values: success, failure and forward. Where failure is caught by a Catcher route depending on the status code and can be turned into a response. Forward means try next matching route. I feel like this is a bit complicated and hard to understand. The extractor is at the same time a guard, a sub-router and a way of returning errors.

Axum simplifies this, as they only have 2 values: success & failure. They basically remove the sub-router part from the extractor. Failure also implements IntoResponse, meaning that if an extractor fails, the specified error will be returned. This leads to extractors that can't really be customised or easily reused.

It's a neat trick to have extractors (declarative execution), but it removes some of the power from developers and hides control flow. It's a mild inconvenience to do this:

fn handler(req: Request) -> Result<String> {
    let json_body = Json::parse(request)?;
}

instead of

fn handler(req: Request, json_body: Json) -> String {}

But the first example gives us a lot of control on how we want to handle failure of parsing, and being explicit is more in the spirit of Rust. Extractors also don't offer us much on the type safety side.

For us this would mean that extractors can't be guards anymore, because they don't exist.

5. Remove the if Guard syntax

If middleware is the guard, I don't see a need for a custom syntax.

Conclusion

This approach reduces our coolness factor a bit, but increases comprehension?

I feel ok about rules 1, 2 & 3, but am more divided about the rest. Really like the custom if syntax, makes guards explicit and it's a bit different from other approaches to guards. Extractors had a bit of a "wow" effect when I first saw them and could help us attract more users. And almost all popular Rust web frameworks use them (Actix, Rocket, Axum, ...), so developers are familiar with the concept.

P.S. I was also trying to come up with some novel concepts while thinking about this. Like:

router! {
     "/:user" if Admin::from(:user) || 403 => handler
     "/not_user"  => handler
})

fn handler(admin: Admin) {}

then use some Rust type magic to make this not compile, because the route /not_user doesn't define the Admin and the handler is using it. This would make the "extractor as guard" syntax explicit in the router and you wouldn't be able to use extractors that you didn't guard (compile-time error). This would also work for other stuff, not only paths:

router! {
      // Only continue if this is a valid
     "/api" if Json || 400 => {
         /hello => handler
     }
})

fn handler(json: Json) {}

In the end I didn't end up with anything useful though.

tqwewe commented 2 years ago
  1. Static before dynamic on the same nesting level

I agree with this, though it might be a little more complicated than initially expected since subroutes can be kind of mixed up in the trie if I'm not mistaken. The current solution I did with push and pop like approach with the cursor works well, but could probably be improved with sorting the routes as well/instead.

  1. No fallback for sub-routes / handlers

    If Path extracting in handlers fail, the request fails with an error and no other route is tried.

I think it would simplify things if we didnt allow fallback/follow through routing when extractors fail (including Path). If the user wants to have different behaviour for different url param types, they can just take them as string and do the parsing manually I think, since it's not a very common thing to do anyway.

Middleware can be eagerly executed and re-write paths.

This could be useful, though I don't know if rewriting paths could lead to confusing behaviour.

  1. Middleware, one method with a next call

I agree with this one! :100:

  1. Remove extractors

I'm not actually against this idea, but I do think extractors can be very appealing. I read through the issue axum has about https://github.com/tokio-rs/axum/issues/1116, and I think that's indeed a problem we'd likely have to solve too if we stick with extractors. I wonder if we could let middleware handle the errors differently perhaps.

  1. Remove the if Guard syntax

The biggest drawback I see with guards, is that you cannot use any of the data they might compute. For example parsing a json body or session object to check for a role. Whereas with middleware, we could do this and insert it in the request as an extension. This would let us do:

router! {
    "/admin" use IsRoleMiddleware("admin") => {
        // ...
    }
}

With our current approach with extractors, middleware and guards, it seems like they overlap too much and guards arent very useful siunce both extractors and middleware can be used as guards.

SquattingSocrates commented 2 years ago
  1. Static before dynamic on the same nesting level

My position on this hasn't really changed because I think that just as you would add explicit matches in a match expression you should do the same in the router. Sure we can try and sort it but doesn't that make it less of an explicit match and more of "internal magic" match?

So if you have a match expression this works:

match name {
    "bob" => handle_bob
    user => handler_user
}

But this doesn't:

match name {
    user => handler_user
    "bob" => handle_bob
}

and so I think since we're mimicking the match expression syntactically it makes sense to stay consistent with it's matching logic. So a case like this would already work now and stay consistent with the logic of match:

router! {
     "/test/34" => h2,
     "/:name/34" => h3,
     "/:name/:age" => h1,
}
  1. No fallback for sub-routes / handlers

What is the implication of this though? Because List Routers require fall-through, does this mean we get rid of list routers? I've been a fan of strict matching since the beginning but it limits our feature set while making the routing more obvious to the user.

2.1. If Path extracting in handlers fail, the request fails with an error and no other route is tried.

I like this because it again mimics a match more closely and there you can't jump back once you went down one of the arms.

2.2 Middleware can be eagerly executed and re-write paths.

This won't be so simple I'm afraid because if we keep the two-function middleware trait we need to have an explicit call to after which will never happen if a handler returns. If however we decide to go with a single-function middleware with next how do we keep state of the request/response? do we just pass a mut &Request to every middleware and if so how does the signature of middleware look like?

fn mid1(&self, req: Request, next: NextFn) -> Response {
    // ... do stuff before handler
   let res = next(req);
   // ... do stuff after handler
   res
}

let ROUTER = router! {
    use mid1;

   GET "/hello" => hello
}

// what would this expand to?
let ROUTER = |req, reader, params| {
    // execute middleware here?
    mid1(req); // -> returns a response?
    if reader.peek(6) == "/hello" {
        reader.read(6);
        return HandlerFn::handle(hello, ...);
    }
}

I hope this example explains my concerns and I think that we still need to collect middleware one way or another.

I also don't know if the whole manipulating the path is a good idea. Handling dangling slashes isn't that hard, we already do it and we can add a config param to configure this behaviour globally later if users want this. I don't really see why you would want to manipulate the path. The handler should do any "mapping" of the request to the apps logic and the framework should not give mutable access to the path and params at all because that's exactly the "bad" part of middleware imo :)

  1. Middleware, one method with a next call

I like it, but it has to fit into the big picture since there are concerns I mentioned above.

  1. Remove extractors

I don't mind this but I think there's some value in having a "cool" handler definition 🙂 I would suggest that since parsing is sometimes really repetitive and most of the time we just want to parse and fail with a 400 (or other) code if it failed for most of the app we could either keep extractors or add decorators that just the function with a parsing statement. Something I also noticed is that it would be nice to have a per-handler way of defining error codes and also if we want to generate swagger documentation or something like that having access to the expected headers, request body and response body would be nice.

It could look like this:

#[swagger]
#[json]
fn hello_handler(Json(hello): Json<String>) -> Json<HelloYourself> {
     // .. do only handler related logic
}

// =======================
// this expands into

fn hello_handler(req: Request) -> Response {
    match Json::parse(request) {
        Err(e) => {} // handle error in a default way
        Ok(json) => {
              let res_json = |Json(hello): Json<String>) -> Json<HelloYourSelf> {}(json);
              Response::from(res_json)
    }
}

And the swagger would try and add some metadata so that in the router we can fit it all together. Maybe we could also achieve the same by adding some syntax to the router, not sure that the best way to do this is. So a decorator here would just be an explicit way to wrap a handler function so that we don't have to repeat ourselves in every single handler. An app is usually 90%+ of the same data type and sometimes handles others, but I wouldn't want to write out parsing of JSON and error handling of it 200 times in a large app with many handlers.

  1. Remove the if Guard syntax

Again I'm fine with this although it really gives it an even more "match" like feeling. It would have to fit it with all the other parts.

bkolobara commented 2 years ago

But this doesn't:

match name {
    user => handler_user
    "bob" => handle_bob
}

We just need to take into account that Rust does a lot of hand holding here. For example this example will result in a warning of the compiler:

warning: unreachable pattern
  --> src/lib.rs:13:9
   |
12 |         user => handler_user,
   |         ---- matches any value
13 |         "bob" => handle_bob,
   |         ^^^^^ unreachable pattern
   |
   = note: `#[warn(unreachable_patterns)]` on by default

It would be hard for us to do this kind of analysis? Rust is also super strict. If you don't list all possible values, the code will not compile. This is necessary because people keep extending structures and adding new matching arms, so it's easy to accidentally miss something if you have 100+ routes and sub-routes. You add a new route, and now you can't reach another one a bit down, but when you test the new route everything works fine until you deploy to production.

My reasoning here was, minimise the amount of accidental errors. The only reason, why someone would put a more specific route after a general one would be by accident. If we can't warn them during compile time ("look, this is never going to match"), the best next thing would be to reorder automatically the routes so that it matches a more specific one.

However, if we can offer the same DX as Rust here, compile time warnings on impossible routes, this would be great and I don't mind keeping the current behaviour.

I also don't know if the whole manipulating the path is a good idea. Handling dangling slashes isn't that hard, we already do it and we can add a config param to configure this behaviour globally later if users want this.

I think dangling slashes was a bad example. The general idea behind this was to trigger middleware without a match (e.g. requests to http should be redirected to https). We can always work around this by introducing a _ match at the end (that is affected by middleware?), but it becomes a bit inconsistent:

// Calls middleware Redirect
router! {
     use Redirect;
     "/test" => test,
     _ => handler_404
}
// Doesn't call middleware Redirect?
router! {
     use Redirect;
     "/test" => test,
}

Or we could always trigger top level middleware, even on 404? Also, the "first collect all middleware" approach moves us away a bit from the a match arm is irreversibly taken.


Currently, my main goal is just to wrap my head around our rules. I like the current behaviour, but some parts of it are accidental and the result of some implementation details. So, I'm trying to "formalise" some of the rules and am thinking about edge cases and how we could reduce them. For example, it's hard for me to answer the question "When is the if part of the route evaluated?". Is it when a route completely matches, or is a partial match enough? Is it after the middleware runs or before? Did we make this decision because we are limited by the implementation or did we concisely choose to do it?

All this questions have different implications. Just to give an example:

router! {
     "/admin" if Admin => {
         "/super_secret_url"
     }
}

If someone accesses /admin/test and /admin/super_secret_url and is not an admin, should they get a permission denied response? Or should they get two different responses 404 and permission denied, acknowledging that there is a super_secret_url? This question remains even if we drop the if syntax and use a middleware as guard instead.

Talking about it and hearing your preferred behaviour was already super helpful <3.


I think the general consensus is, both the if syntax and extractors stay and we need to figure out how to make them "overlap less".

Would having two kinds of middleware be useful?

  1. if syntax means it can return early (FailableMiddleware)
  2. use syntax means it can't (InfallibleMiddleware)

This would also communicate clearly to users what middleware can abort the request.

tqwewe commented 2 years ago

The only reason, why someone would put a more specific route after a general one would be by accident. If we can't warn them during compile time ("look, this is never going to match"), the best next thing would be to reorder automatically the routes so that it matches a more specific one.

After working with the library a few more days, I think we honestly shouldn't reorder anything, nor should we error if two routes match the same path. The reason for this, is the simple fact that routes can return Err(RouteError::RouteNotMatch) which means it would conintue to try to other handler.

That's why I think the checkpointing implementation I added makes the most sense, since if a route return RouteNotMatch, then it'll undo the cursor position and continue trying next routes.

For example, it's hard for me to answer the question "When is the if part of the route evaluated?". Is it when a route completely matches, or is a partial match enough? Is it after the middleware runs or before? Did we make this decision because we are limited by the implementation or did we concisely choose to do it?

The if part of a route is checked at the same time as checking the path. Eg if reader.read(4) == "/foo" && .. guards here .. {.

Middleware runs when a route is fully matched and just about to be executed. It all gets aggregated to the very last moment. Though, they currently run before extractors, which makes sense I think. Middlewares & handlers have the ability to return RouteNotFound to try the next route. If a middleware has side effects, I think it should be responsible for undoing these side effects if child handlers/middleware return RouteNotFound. For example:

fn update_process_local_state(req: Request, next: impl Next) -> Result<Response, RouteError> {
    let value = PLS.get();
    PLS.set(10);
    let res = next(req);
    if let Err(RouteError::RouteNotMatch(_)) = res {
        // child handler wants to continue to next route,
        // so we undo side effects in this middleware
        PLS.set(value);
    }
    res
}

I think the general consensus is, both the if syntax and extractors stay and we need to figure out how to make them "overlap less".

My opinion at this point would be that we remove if guards completely, and let middleware do that job since it can simply return RouteError::RouteNotMatch.

SquattingSocrates commented 2 years ago

Since I'm fresh back from my vacation I'm going to try and take a step back here :) When I first thought about routing in Rust I thought it would be dreamy to have something like a match expression that can match multiple different template strings and then run certain logic on it. An if on the match arm would be a great addition because now you can write even more modular handlers to handle different types of requests on the same template string. It seemed to me like a kinda ideal fit for a rust/lunatic way of routing.

So that would look something like this:

{
    "/users/:user_id/settings" if UserIsAdmin => admin_settings_handler
    "/users/:user_id/settings" => user_settings_handler
}

which is pretty much what we have already. And to be honest, even without if guards this is awesome and enough to start building cool projects, especially if you have the safety of each request to run in its own memory space and do async I/O out of the box without any async rust.

First, we need to start with what kind of framework we want to build: something larger/bloated like Django, spring boot, .NET, Nest.JS, Laravel, Phoenix or a micro-framework like flask, express.js or sugar(elixir). If we're not looking into creating something too large, we shouldn't really overcomplicate it because there's actually just a few things that we need to do:

We talk a lot about extractors and middleware etc but we're really just trying to give the user a way to

  1. re-use the same logic for multiple requests
  2. hook into request/response lifecycle
  3. use request/response data with user-defined datatypes (e.g. parse body from json to struct and vice versa)
  4. Enable some sort of short-circuiting a request

And since we want routing to a be central feature to our "micro-framework" we have the liberty to ask ourselves some interesting questions like:

  1. What is an idiomatic way to chain/compose/reuse logic in rust/lunatic and how could we apply them for submillisecond? (macros, traits, generics?)
  2. do we even need middleware and if yes, why?
  3. What is the responsibility of our provided router!? Should we just match a path with a handler and not care whether handler is actually a composition of multiple middleware and handlers or just a stand-alone handler? (e.g. what data should belong to a router and which to handlers?)
  4. Who's responsibility is it to check whether some request data matches some form (users or frameworks) and where should the "transformation/validation" happen? (be called explicitly by user via function call or decorator OR be smartly injected into the function parameters)
  5. For what reasons do we want users to short-circuit and can/should the framework do short-circuiting? (e.g. fall-through matching going up and down matching levels OR an explicit match just like rust does it) How do they both fit together?

If, of course, we want to create a larger framework like Phoenix or Spring Boot we will need to treat routing as just one of the features and ask ourselves a different set of questions first. Because these frameworks are usually implemented with a certain way of developing in mind and mostly prioritise ease of development in a multi-developer setup where the business logic is growing so rapidly that devs require multiple "crutches" from the framework that help avoid or delay the implementation and invokation of a footgun 🔫

But they also help larger projects use less boilerplate code and provide good ways to do integration tests and handle multiple communication protocols and ways of interacting with a service, like AMQP, gRPC etc. This also means that there are different types of "actors", some of which are "request handlers" in request-response modes, "subscribers" in an event-based environment (e.g. with queues) and also "tasks", which run like cron-jobs. They also usually provide a fairly opinionated way of binding databases and cache and also end up requiring a ton of configuration. Ways to render some content like html views etc are not uncommon. Neither is built-in means to add authentication. Development and deployment tools are also part of that.

So, in total there's quite a lot of things that a larger framework contains and I think that we should first settle for a kind of framework and move forward from there, because design decisions should depend on what we want to do eventually I believe.