withastro / roadmap

Ideas, suggestions, and formal RFC proposals for the Astro project.
292 stars 29 forks source link

Astro middleware #555

Closed ematipico closed 1 year ago

ematipico commented 1 year ago

This RFC introduce a new middleware API in Astro

Links

matthewp commented 1 year ago

One other comment, if you call next() but don't care about the result (because you don't need to do anything to the response), are you expected to return the return value? Ie do you have to do return next(); or is just next(); ok? I don't have an opinion on this and am kind of assuming that the first way is how it's currently implemented. If so I could see this as being a good thing, that you must return some response.

ematipico commented 1 year ago

One other comment, if you call next() but don't care about the result (because you don't need to do anything to the response), are you expected to return the return value? Ie do you have to do return next(); or is just next(); ok? I don't have an opinion on this and am kind of assuming that the first way is how it's currently implemented. If so I could see this as being a good thing, that you must return some response.

I suppose it really depends on how the user will it. next(); and return next() will yield two different results. Without the return, the next middleware won't access to the Response object. If the user knows that that middleware won't need it, I suppose next(); is a correct pattern.

lilnasy commented 1 year ago

Could a middleware tell if it's running in dev mode?

The use case I'm thinking of is where a middleare caches responses in memory, similar how to how Next's ISR works. The middleware should not act during dev.

cc: #228

lilnasy commented 1 year ago

Could a middleware communicate with an integration?

The use case I'm thinking of where an integration hashes inline resources, and a middleare uses the hashes to create a strict Content-Security-Policy header.

I understand letting middleware be added by integrations was made out of scope for now, but receiving data from an integration alone would be valuable.

cc: #377

ematipico commented 1 year ago

Could a middleware tell if it's running in dev mode?

The use case I'm thinking of is where a middleare caches responses in memory, similar how to how Next's ISR works. The middleware should not act during dev.

cc: #228

Not at this stage. It wasn't part of the original RFC. Currently, the middleware is code that goes in production, and the mode is a piece of information that is lost after the build.

ematipico commented 1 year ago

Could a middleware communicate with an integration?

The use case I'm thinking of where an integration hashes inline resources, and a middleare uses the hashes to create a strict Content-Security-Policy header.

I understand letting middleware be added by integrations was made out of scope for now, but receiving data from an integration alone would be valuable.

cc: #377

We can explore this valid use case after the middleware is released under an experimental flag. During the exploration phase, I evaluated creating a middleware as an integration, but that was too far from the initial RFC.

lilnasy commented 1 year ago

Could a middleware tell if it's running in dev mode? The use case I'm thinking of is where a middleare caches responses in memory, similar how to how Next's ISR works. The middleware should not act during dev. cc: #228

Not at this stage. It wasn't part of the original RFC. Currently, the middleware is code that goes in production, and the mode is a piece of information that is lost after the build.

@ematipico I looked around a little more, and matthewp gave an example for a dev-only route in the issue. If that example works, prod-only ISR could too.

matthewp commented 1 year ago

Yeah, I would expect import.meta.env.MODE to work just like it does in any other file in the project.

ematipico commented 1 year ago

I wasn't aware of that metadata, and I personally haven't tried. But if that metadata is available in endpoints, it will definitely be available in middleware too!

snimavat commented 1 year ago

I come from Java background, and middleware word has always been confusing to me. If it does the job of intercepting requests/response - "Interceptors" seems more meaningful

ematipico commented 1 year ago

I come from Java background, and middleware word has always been confusing to me. If it does the job of intercepting requests/response - "Interceptors" seems more meaningful

Yeah, there are different names to address the same pattern. SvelteKit calls it "hooks", like fastify.

raulfdm commented 1 year ago

I come from Java background, and middleware word has always been confusing to me. If it does the job of intercepting requests/response - "Interceptors" seems more meaningful

Yeah, there are different names to address the same pattern. SvelteKit calls it "hooks", like fastify.

In Next.js world they also call middleware, but their middleware is less capable than the pattern we're seeing here.

https://nextjs.org/docs/pages/building-your-application/routing/middleware

gkatsanos commented 1 year ago

how safe are we to use a middleware in production that intercepts a request and sets a global variable or a value in a nano store based on the request domain name?

ematipico commented 1 year ago

how safe are we to use a middleware in production that intercepts a request and sets a global variable or a value in a nano store based on the request domain name?

The feature is under an experimental flag for different reasons:

Although, the code written inside the middleware is stored in the backend (like the rest of the Astro code). It's advised to ensure you don't write any sensible data inside Astro.locals

pilcrowOnPaper commented 1 year ago

Quick question, why should locals be serializable?

The information must be serializable because storing information that evaluates at runtime is unsafe. If, for example, we were able to store a JavaScript function, an attacker could exploit the victim’s website and execute some unsafe code

How would an attacker inject or execute unsafe code?

pilcrowOnPaper commented 1 year ago

It also looks like API routes run before the middleware?

ematipico commented 1 year ago

Quick question, why should locals be serializable?

The information must be serializable because storing information that evaluates at runtime is unsafe. If, for example, we were able to store a JavaScript function, an attacker could exploit the victim’s website and execute some unsafe code

How would an attacker inject or execute unsafe code?

It's mostly about third-party libraries. Let's suppose that there's an astro-magic-auth library that does something with locals; then, this library gets hacked and decides to push some sketchy code in thelocals:

locals.email = "<script>eval(alert('hacked'))</script>"

Nobody prevents a third-party library from doing so.

ematipico commented 1 year ago

It also looks like API routes run before the middleware?

That was a bug :) I believe it's been fixed in https://github.com/withastro/astro/pull/7106

pilcrowOnPaper commented 1 year ago

@ematipico

I'm not sure serializing data prevents this?

locals.email = returnsUnsafeScript(); // <script>eval(alert('hacked'))</script>

if we were able to store a JavaScript function, an attacker could exploit the victim’s website and execute some unsafe code

But if you're using third party dependencies, they can already run unsafe code:

// node_modules/some-library/index.js

export someFunction = () => {};

console.log("unsafe code");

Not having the ability to store any data type seems to be a massive limitation IMO.

ematipico commented 1 year ago

I'm not sure serializing data prevents this?

It indeed doesn't; probably, we should also escape the strings, just to be sure.

But if you're using third-party dependencies, they can already run unsafe code:

It's more about data that gets rendered on the page, not about the code per se.

Not having the ability to store any data type seems to be a massive limitation IMO.

It is, I know! What would you suggest instead? How can we support more data types without overlooking the security?

tropperstyle commented 1 year ago

I have a request to expose the exports of the matched route module to the middleware.

Since auth guards are a primary example for this proposal, this could help avoid path logic growing in the middleware and allow explicitly flagging pages as needed:

---
export const requireAdmin = true;
---
// Make all non-Astro defined exports available to context
export const onRequest = ({ exports, request, redirect, locals }, next) => {
  if (exports.requireAdmin) {
    locals.admin = await fetchAdmin(request);
    if (!locals.admin) return redirect('/login');
  }
  return next();
}
pilcrowOnPaper commented 1 year ago

@ematipico

Doesn't Astro escape strings inside html templates anyway? If you’re using set:html, it should be your responsibility to make sure your dependencies are safe. I really think this is a non issue that doesn’t need to be addressed. SvelteKit has locals for like 2 years and I’m not aware of any html injection, and Express has had it for even longer.

matthewp commented 1 year ago

I think the primary driver for the serializable restriction is that we plan to make it possible for adapters (like Vercel) to split the middleware into a separate bundle to deploy to an Edge. This way you can have things like auth-checks happen at edge locations and then your app code in serverless.

For this to work we need to be able to serialize locals to send from the edge worker to the origin. We weren't sure that people would even notice this!

But it sounds like a lot of people are wanting to put complex objects onto locals. This would just mean you couldn't use the edge feature. But for a lot of people that's a fine tradeoff. So I think this user feedback is a good reason to lift the restriction.

matthewp commented 1 year ago

cc @ematipico Curious what you think about the above 👆

gkatsanos commented 1 year ago

Sorry to jump into the RFC with a slightly different topic. We have a use-case that combines nanostores with the middleware and we wanted to share it with you to get some feedback ("will it work"). :

import { setBrand } from "@store/brand";

export function onRequest(context, next) {
  if (context.request.includes === "europages") {
    setBrand("ep");
  } else {
    setBrand("wlw");
  }

  return next();
}

store:

// store/brand.js
import { atom, action } from "nanostores";

export const brand = atom("wlw");

export const setBrand = action(brand, "setBrand", (store, brand) => {
  store.set(brand);
});

then use the store value in a root Layout to provide a 'theme' based on the route base url:

<html lang="en" data-brand={brand.value}>

That's in a node/standalone SSR mode. I was told it's not a reliable solution over Discord.

ematipico commented 1 year ago

cc @ematipico Curious what you think about the above 👆

We could lift off the check, I don't mind at all, as long as it's always secure to store "complex" types.

Regarding the serialisation and the edge worker, maybe we should move the check inside the integration itself.

matthewp commented 1 year ago

@ematipico Ah yeah, if that's a restriction of that use-case it does make senes to move the check there. I'm not sure exactly how that will be implemented, but I imagine that the adapter will add it's own middleware at the end of the chain. It can do the check there.

matthewp commented 1 year ago

@gkatsanos I'm not very familiar with nanostores, but looking at the code it appears that brand is a global which doesn't work well in a server environment, you have multiple requests all being processed at the same time.

I would recommend putting branch onto the locals object.

pilcrowOnPaper commented 1 year ago

For the Vercel adapter, if we're going to add it, please make deploying to Vercel's Edge middleware optional. The biggest limitation with Next.js' Middleware is that you can't deploy it to just a regular serverless server(?).

wassfila commented 1 year ago

@gkatsanos I'm not very familiar with nanostores, but looking at the code it appears that brand is a global which doesn't work well in a server environment, you have multiple requests all being processed at the same time.

I would recommend putting branch onto the locals object.

right, that's where the discord support post is going to, thanks for confirmation https://discord.com/channels/830184174198718474/1109101214802653224

matthewp commented 1 year ago

@pilcrowOnPaper Yeah, that will definitely be an opt-in feature.

matthewp commented 1 year ago

The serialization restriction has been lifted from the RFC.

With that being the case, I'd like to do a call for consensus on this RFC. This will be the final comment period, 3 days, if there are still issues that need to be addressed please bring those forward now.