lukeed / sirv

An optimized middleware & CLI application for serving static files~!
MIT License
1.07k stars 58 forks source link

URLs not correctly parsed for ^1.0.0 with express #82

Closed konnextv closed 4 years ago

konnextv commented 4 years ago

Just a short heads-up: Using the current sapper setup URLs containing spaces (%20) stopped working in conjunction with express when I updated to 1.0.0. Sirv gives me a 404 if I try to access any such URL (I don't know if this extends to other special characters, too).

For repro, I am using express@4.17.1.

I am sorry if this is a problem with express but I did some testing and I think chore(sirv): bump “@polka/url” for better decoding introduced the problem. (Might as well be an issue with @polka/url 🤔 )

lukeed commented 4 years ago

Hey, this is intentional – see #11, #20, #21

Your /foo%20bar.txt is looking for a /foo bar.txt file, which of course looks strange with spaces, but applies with any special characters too. For example, #11 presents the /fünke.txt scenario. When that request gets sent over the wire, it's received as /f%C3%BCnke.txt and so has to be decoded.

It actually gets trickier than that too, because the same character can be encoded in different ways depending on the HTTP client (see https://github.com/lukeed/sirv/pull/11#issuecomment-418188876), which means that decoding has to happen at the server in order to normalize the request path.

Hopefully that helps~! Closing, but please feel free to reply if I misread or got something wrong. Thanks!

konnextv commented 4 years ago

Just to clarify: You would expect Express to handle the URL parsing and sirv expects req.path to be /foo bar.txt instead of /foo%20bar.txt, correct?

lukeed commented 4 years ago

If coming from Express, yes, since req.path would be defined. Otherwise sirv takes care of itself via parser(req, true) – that true is a decode flag.

konnextv commented 4 years ago

Understood, thank you. I will open an issue over at sapper though since it took me a long time to figure this out because I didn't expect it to be an error with the current template used with express.

Cheers!

konnextv commented 4 years ago

Or would it be possible to introduce a sirv flag that forces decoding even if req.path is defined already?

Because I just noticed that I cannot manipulate req.path myself (readonly) to circumvent this...

lukeed commented 4 years ago

There won't be a config, sorry. Customization is through the req.path value – you've given it what it's supposed to use, and decoding the same value twice can be problematic. (Decoding is also expensive perf-wise.)

This isn't a Sapper issue either. Express purposefully doesn't decode incoming paths because it's expensive and can lead to mismatch. They'd rather say that '/fu%CC%88nke.txt' should 404 and '/fu%u0308nke.txt' should 200 (or vice versa) even though they both would match /fünke.txt file if decoded. They do, however, decode req.params values (only).

It's only because you've named your file(s) in a non-standard way that you've run into this problem. Definitely recommend replacing spaces with hyphens or underscores in your file names. But if you don't/can't, here's how you can do it:

express()
  .use((req, res, next) => {
    req.path = decodeURIComponent(req.path);
    next();
  })
  // ...
  .use(
    compression({ threshold: 0 }),
    sirv('static', { dev }),
    sapper.middleware()
   )
konnextv commented 4 years ago

Thanks for your advice, I understand.

But what you suggested does not work since req.path is readonly.

I got it to work by wrapping the sirv call in another middleware though:

express()
  .use(
    compression({ threshold: 0 }),
    (req, res, next) => {
      sirv("static", { dev }).call(
        null,
        { ...req, path: decodeURIComponent(req.path) },
        res,
        next
      );
    },
    sapper.middleware()
  )

So just one last question? Do you see any problem with this? To my knowledge the req object is preserved while still being able to change path only for further processing with sirv.

If not, I will suggest this approach for sapper users who want to use express.

lukeed commented 4 years ago

Yes, that's fine. It's not pretty, but IMO it's what you should be forced to do to accommodate your non-standard file names.

My above example can be changed to this (I forgot req.path is a getter)

express()
  .use((req, res, next) => {
    req.url = decodeURIComponent(req.url);
    next();
  })
  // ...
konnextv commented 4 years ago

Info for others dealing with this: The above code does not work either, only my snippet helped in my case so far.

lukeed commented 4 years ago

Ran locally, this works with the static/foo bar.txt example presented in linked issue:

const express = require('express');
const sirv = require('sirv');

express()
    .use((req, res, next) => {
        req.url = decodeURIComponent(req.url);

        if (req._parsedUrl) {
            var nxt = decodeURIComponent(req._parsedUrl.pathname);
            req._parsedUrl.path = req._parsedUrl.pathname = nxt;
            req._parsedUrl._raw = req.url;
        }

        next();
    })
    .use(
        sirv('static', { dev: true })
    )
    .listen(3000)

Have to trick parseurl to not only work with the new req.url (which is fine, acceptable) but you have to modify its internal state/cache to send back the value you want.

FWIW, Express calls parseurl 6 times before the first bit of user code runs.

benmccann commented 3 years ago

I do think this is a bug in sirv. The req.path should never be decoded. The fact that Polka is decoding it is a bug in Polka (https://github.com/lukeed/polka/issues/142). sirv is matching Polka's incorrect behavior rather than Express's correct behavior. Soon sirv still stop working with Polka as well once https://github.com/lukeed/polka/pull/172 is merged

lukeed commented 3 years ago

Solved by the Polka PR and a7bd6722321ba55377f785efdf9b72083cec7968

The req._decoded check I just added should have always been in there, since it was sirv's way of preventing duplicate decodeURIComponent calls... but that was only true when it receives a request from a polka@next app, since Polka was previously writing the decoded value to req.path

Now that the latest polka@next (and Express) doesn't decode automatically anymore, req.path isn't trustworthy on its own. It needs req._decoded to be there too in order to trust it.

This combo-check is backwards compatible for polka@next users who don't upgrade and will unblock Express users for the first time, who have always had a "raw" req.path value set.