Closed konnextv closed 4 years ago
Hey, this is intentional – see #11, #20, #21
Your /foo%20bar.txt
is looking for a /foo bar.txt
file, which of course looks strange with spaces, but applies with any special characters too. For example, #11 presents the /fünke.txt
scenario. When that request gets sent over the wire, it's received as /f%C3%BCnke.txt
and so has to be decoded.
It actually gets trickier than that too, because the same character can be encoded in different ways depending on the HTTP client (see https://github.com/lukeed/sirv/pull/11#issuecomment-418188876), which means that decoding has to happen at the server in order to normalize the request path.
Hopefully that helps~! Closing, but please feel free to reply if I misread or got something wrong. Thanks!
Just to clarify:
You would expect Express to handle the URL parsing and sirv expects req.path to be /foo bar.txt
instead of /foo%20bar.txt
, correct?
If coming from Express, yes, since req.path
would be defined. Otherwise sirv
takes care of itself via parser(req, true)
– that true
is a decode flag.
Understood, thank you. I will open an issue over at sapper though since it took me a long time to figure this out because I didn't expect it to be an error with the current template used with express.
Cheers!
Or would it be possible to introduce a sirv flag that forces decoding even if req.path is defined already?
Because I just noticed that I cannot manipulate req.path myself (readonly) to circumvent this...
There won't be a config, sorry. Customization is through the req.path
value – you've given it what it's supposed to use, and decoding the same value twice can be problematic. (Decoding is also expensive perf-wise.)
This isn't a Sapper issue either.
Express purposefully doesn't decode incoming paths because it's expensive and can lead to mismatch. They'd rather say that '/fu%CC%88nke.txt'
should 404 and '/fu%u0308nke.txt'
should 200 (or vice versa) even though they both would match /fünke.txt
file if decoded. They do, however, decode req.params
values (only).
It's only because you've named your file(s) in a non-standard way that you've run into this problem. Definitely recommend replacing spaces with hyphens or underscores in your file names. But if you don't/can't, here's how you can do it:
express()
.use((req, res, next) => {
req.path = decodeURIComponent(req.path);
next();
})
// ...
.use(
compression({ threshold: 0 }),
sirv('static', { dev }),
sapper.middleware()
)
Thanks for your advice, I understand.
But what you suggested does not work since req.path
is readonly.
I got it to work by wrapping the sirv call in another middleware though:
express()
.use(
compression({ threshold: 0 }),
(req, res, next) => {
sirv("static", { dev }).call(
null,
{ ...req, path: decodeURIComponent(req.path) },
res,
next
);
},
sapper.middleware()
)
So just one last question? Do you see any problem with this? To my knowledge the req
object is preserved while still being able to change path
only for further processing with sirv.
If not, I will suggest this approach for sapper users who want to use express.
Yes, that's fine. It's not pretty, but IMO it's what you should be forced to do to accommodate your non-standard file names.
My above example can be changed to this (I forgot req.path
is a getter)
express()
.use((req, res, next) => {
req.url = decodeURIComponent(req.url);
next();
})
// ...
Info for others dealing with this: The above code does not work either, only my snippet helped in my case so far.
Ran locally, this works with the static/foo bar.txt
example presented in linked issue:
const express = require('express');
const sirv = require('sirv');
express()
.use((req, res, next) => {
req.url = decodeURIComponent(req.url);
if (req._parsedUrl) {
var nxt = decodeURIComponent(req._parsedUrl.pathname);
req._parsedUrl.path = req._parsedUrl.pathname = nxt;
req._parsedUrl._raw = req.url;
}
next();
})
.use(
sirv('static', { dev: true })
)
.listen(3000)
Have to trick parseurl
to not only work with the new req.url
(which is fine, acceptable) but you have to modify its internal state/cache to send back the value you want.
FWIW, Express calls parseurl
6 times before the first bit of user code runs.
I do think this is a bug in sirv
. The req.path
should never be decoded. The fact that Polka is decoding it is a bug in Polka (https://github.com/lukeed/polka/issues/142). sirv
is matching Polka's incorrect behavior rather than Express's correct behavior. Soon sirv
still stop working with Polka as well once https://github.com/lukeed/polka/pull/172 is merged
Solved by the Polka PR and a7bd6722321ba55377f785efdf9b72083cec7968
The req._decoded
check I just added should have always been in there, since it was sirv's way of preventing duplicate decodeURIComponent
calls... but that was only true when it receives a request from a polka@next
app, since Polka was previously writing the decoded value to req.path
Now that the latest polka@next
(and Express) doesn't decode automatically anymore, req.path
isn't trustworthy on its own. It needs req._decoded
to be there too in order to trust it.
This combo-check is backwards compatible for polka@next
users who don't upgrade and will unblock Express users for the first time, who have always had a "raw" req.path
value set.
Just a short heads-up: Using the current sapper setup URLs containing spaces (%20) stopped working in conjunction with express when I updated to 1.0.0. Sirv gives me a 404 if I try to access any such URL (I don't know if this extends to other special characters, too).
For repro, I am using express@4.17.1.
I am sorry if this is a problem with express but I did some testing and I think chore(sirv): bump “@polka/url” for better decoding introduced the problem. (Might as well be an issue with @polka/url 🤔 )