jfhbrook / node-ecstatic

A static file server middleware that works with core http, express or on the CLI!
https://github.com/jfhbrook/node-ecstatic
MIT License
975 stars, 194 forks

Symbolic links can cause incorrect response headers #168

Open battlesnake opened 8 years ago

battlesnake commented 8 years ago

I logged this on http-server, but I'm guessing the actual response generation is done in ecstatic so I'll cross post:

https://github.com/indexzero/http-server/issues/213


To reproduce

echo '<!doctype html><html><head><meta charset="utf-8"><title>Test</title></head><body>Hello world</body></html>' > test.html
ln -s test.html test
http-server

Then go to http://localhost:8080/test

Expected behaviour

Server resolves target (realpath?) then prepares HTTP response based on target file

Content-Type: text/html

Observed behaviour

Server does not resolve target, causing incorrect HTTP response

Content-Type: application/octet-stream; charset=utf-8

jfhbrook commented 8 years ago

It's most likely generating the content-type based on the extension of 'test' and not 'test.html'. Which seems fine to me.

battlesnake commented 8 years ago

Surely it makes sense for the content-type to depend on the file being served, rather than the URL used to access it? (since we do indeed serve the target file, not simply the contents of the symlink which would just be "test.html")

jfhbrook commented 8 years ago

How would you propose that I detect file type, then? The current implementation uses the extension of the thing you're trying to read (in this case the symlink).

battlesnake commented 8 years ago

Resolve the path to the file before analysing the extension, possibly using fs.realpath

https://nodejs.org/docs/latest/api/fs.html#fs_fs_realpath_path_cache_callback
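
A minimal sketch of that suggestion, assuming a Node version that provides `fs.promises`. The helper name `contentTypeOfTarget` and the tiny extension table are hypothetical and exist only for illustration; this is not ecstatic's actual lookup code.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical, deliberately tiny extension table. A real server would
// consult a full MIME database instead.
const TYPES_BY_EXTENSION = {
  '.html': 'text/html',
  '.txt': 'text/plain',
  '.svg': 'image/svg+xml',
};

// Hypothetical helper: resolve any symlinks first, then guess the type from
// the extension of the resolved target rather than the requested path.
async function contentTypeOfTarget(requestedPath) {
  const resolved = await fs.promises.realpath(requestedPath);
  const ext = path.extname(resolved).toLowerCase();
  return TYPES_BY_EXTENSION[ext] || 'application/octet-stream';
}
```

With the `test` symlink from the reproduction steps, `contentTypeOfTarget('test')` would inspect `test.html` and so resolve to `text/html`.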

jfhbrook commented 8 years ago

Why is that better?

jfhbrook commented 8 years ago

To solve your specific use case, try moving the file to test/index.html and turn on autoindexing.

jfhbrook commented 8 years ago

I believe there's also a flag to default an extension if one is missing; this would also meet your use case.

battlesnake commented 8 years ago

> Why is that better?

Because the format specified in the response is based on the file being served, rather than the path used to access it.

jfhbrook commented 8 years ago

I'm not entirely convinced that this is self-evident.

battlesnake commented 8 years ago

UNIX traditionally used the first two bytes of a file's contents to identify it (which filesystems usually still store a copy of in the directory entry), but filename extensions are somewhat more widely used (e.g. in http-server). The file's format isn't any different when accessed via a symbolic link, any more than the first bytes of it are.

mk-pmb commented 8 years ago

> Because the format specified in the response is based on the file being served, rather than the path used to access it.

That "is" is a matter of configuration. I find it rather useful for some courses to have an example.svg.txt symlink to example.svg and have my Apache serve a text or an image depending on the path used to access it. Used in this way, I wouldn't consider my Apache's response headers as "incorrect", and I expect to be able to pull the same trick with ecstatic.

> UNIX traditionally used the first two bytes of a file's contents to identify it

Sounds like a light-weight version of MIME magic. For that, see (and solve?) #66 .

BigBlueHat commented 7 years ago

Just tested the fs.realpath option and it seems to work:

$ ln -s index.html test
$ node -e 'var fs = require("fs"); console.log(require("mime").lookup(fs.realpathSync("test")));'
text/html

Sadly, that's not a fix for Windows folks...but then...they probably know that already. 😏

mk-pmb commented 7 years ago

There cannot be a fix because it's a config issue. If a chain of one or more symlinks is involved and you want content type to be guessed by filename, you'll have to tell your webserver which end of the symlink chain to use for guessing.

I gave the SVG as text example above, and I'll add some more use cases:

If you want any file system lookup, e.g. to resolve the symlink target, it should be async. (Thus fs.realpathSync is a really bad idea; for an explanation of why, please ask in the general node help.) My request for async mime type lookup is in issue #66. If you solve that one, I'm sure we can have some guessMimeFromFinalSymlinkTarget option soon after because it will be trivial then.
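
To illustrate what such an option might look like, here is a rough sketch. `guessMimeFromFinalSymlinkTarget` is not an existing ecstatic option; it is treated here as a hypothetical boolean, and `guessContentType` and the extension table are likewise made up for the example. The lookup is async, and the default keeps guessing from the requested path.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical extension table, kept tiny for the example.
const TYPES_BY_EXTENSION = {
  '.html': 'text/html',
  '.txt': 'text/plain',
  '.svg': 'image/svg+xml',
};

// Hypothetical helper: the option decides which end of the symlink chain
// supplies the file name used for guessing.
async function guessContentType(requestedPath, options = {}) {
  const { guessMimeFromFinalSymlinkTarget = false } = options;
  const nameToInspect = guessMimeFromFinalSymlinkTarget
    ? await fs.promises.realpath(requestedPath) // far end of the chain
    : requestedPath;                            // name the client asked for
  const ext = path.extname(nameToInspect).toLowerCase();
  return TYPES_BY_EXTENSION[ext] || 'application/octet-stream';
}
```

With an `example.svg.txt` symlink pointing at `example.svg`, the default guesses from `.txt`, while enabling the option guesses from `.svg`, so both behaviours discussed in this thread stay available.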

dotnetCarpenter commented 7 years ago

> UNIX traditionally used the first two bytes of a file's contents to identify it (which filesystems usually still store a copy of in the directory entry), but filename extensions are somewhat more widely used (e.g. in http-server). The file's format isn't any different when accessed via a symbolic link, any more than the first bytes of it are.

@battlesnake ecstatic already detects gzip'ed files based on the first two bytes of the file. You could expand that to cover all file types. But then you should ditch the mime package, since it uses the Apache project's file-extension mapping.
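
For reference, a two-byte check of this kind can be sketched as follows. This is an illustration, not ecstatic's actual code; `readMagicBytes` and `looksLikeGzip` are hypothetical names. The gzip format starts with the bytes 0x1f 0x8b.

```javascript
const fs = require('fs');

// Hypothetical helper: read up to `count` bytes from the start of a file.
async function readMagicBytes(filePath, count = 2) {
  const handle = await fs.promises.open(filePath, 'r');
  try {
    const buffer = Buffer.alloc(count);
    const { bytesRead } = await handle.read(buffer, 0, count, 0);
    return buffer.subarray(0, bytesRead); // may be shorter for tiny files
  } finally {
    await handle.close();
  }
}

// gzip streams begin with the two magic bytes 0x1f 0x8b.
function looksLikeGzip(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}
```

Because the check opens the file for reading, it follows symlinks and sees the target's content whichever path was requested.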

But @mk-pmb has a good point about changing mime-type based on file-extensions via symlink.

In the end you have to choose whether you want file content or file path to dictate the mime-type. Of course, ecstatic has you covered with extensible custom mime types based on file path. So you can already create a config that gets you where you want, more or less.