rwf2 / Rocket

A web framework for Rust.
https://rocket.rs
Other
23.59k stars 1.52k forks source link

Allow for optionally serving pre-compressed files. #2716

Open hcldan opened 4 months ago

hcldan commented 4 months ago

There wasn't a good way to propagate the new option into NamedFile so I made a new constructor and added a flag to the struct. In order to make the flag understandable, I converted to a struct with members instead of a simple tuple.

I am not sure how to test the Content-Type returned... It seems that the original tests didn't check those headers and there's not really a clean way to pass that in the assert functions without refactoring a bunch of stuff.

hcldan commented 4 months ago

One approach I tried first was to implement a wrapper around FileServer and implement a Handler that would defer to FileServer and adjust headers on the way out.

I could not figure out how to properly clone the Request in order to get a mutable value I could set_uri() on so I could get it to try to read the alternate file and check to see if it was found. And while something like that would have worked, if I found a way to modify the request, it's still rather kluge-y as the request wasn't for the gz file, it was for the non gz file and should be using the gz file to serve the request for the non-gz file with appropriate headers.

It makes it complicated to tamper with the request like that and have a proper response. I think it's better to make the resource-check by looking at the file system and choosing what file to send back according to the actual request.

hcldan commented 4 months ago

Fixes #2718

ApprenticeofEnder commented 4 months ago

Final question: Could we also perhaps have some way of providing file hashes with the server? That might be a separate PR, though it's a good thing to think about, especially with compressed files like this.

hcldan commented 4 months ago

Hashing the files would be nice for other reasons, like you say, but it would also introduce performance overhead. I think that's best left to another PR to implement features specific to hashing needs.

I also did not implement anything related to cache control headers, as there's a fairly long conversation in other issues and I've not seen much movement on it.

hcldan commented 3 months ago

@ApprenticeofEnder Is this something that the Rocket team would like to merge? How do I proceed here?

hcldan commented 3 months ago

@SergioBenitez Is this something that the Rocket team would like to merge? How do I proceed here?

ApprenticeofEnder commented 3 months ago

@ApprenticeofEnder Is this something that the Rocket team would like to merge? How do I proceed here?

Sorry, unfortunately I'm not on the Rocket team! Just thought I'd start some conversation surrounding your PR.

SergioBenitez commented 3 months ago

How do I proceed here?

Just as you are!

@SergioBenitez Is this something that the Rocket team would like to merge?

Potentially. It's an interesting idea. As I see it, here are the pros and cons:

The pros are:

In general I think I'm in favor of the idea with a different implementation. There are a few key things I would like to see changes.

  1. I don't think we need to make any changes to NamedFile, nor do I support making those changes. If we want a different responder, we define a different one. The impl can be easily derived, for instance, for a structure that includes the file with the appropriate fields or attributes for headers.
  2. As mentioned before, it seems far too simple to serve the incorrect file. I believe we need to add some sort of consistency checking to make this viable. At minimum, we should require that the gz file has a mod time greater than the original file. Ideally however, we would be able to check that the gz file does indeed correspond to the regular file. One approach, as an example, would be to require the gz file name to include a hash of the original file. That is, foo.HASH.gz, where HASH is the hash of foo. We would then compute the hash of the original file and compare the hashes, serving the gz file only if they match. Ideally we'd only need to do this once per file, caching the results of the check for the future.
  3. It feels that there is a more generalizable mechanism at play, one that can not only be used to implement this feature, but other existing features, such as index file serving. What I'm suggesting is effectively mod_rewrite, albeit a very constrained version. If we allow FileServer to be constructed with dynamic "rewrite" functions in which the user (or us internally) is able to tell us which file, if any, should be served for the path identified by the request, then this can be implemented outside of Rocket entirely. This effectively makes FileServer pluggable. We would seek an API by which we can implement most if not all options currently in FileServer. Ideally, the API would also make composition and precedence possible and obvious. For instance, should foo/index.html.gz be served for foo/ if both the Index and Compressed options are enabled? Ideally we'd have an API that would allow the user to create a FileServer that can answer the question in either direction.

If you choose to proceed, I would first like to see a sketch of an API for 3 that would handle most or all of the presently available options. The "sketch" should show that each option for FileServer could be implemented, including the one in question, and how they compose. Ideally the implementation for each option would be rather straightforward and consist only of the bare-minimal logic necessary. For example, the implementation of the Index option should be something as close to as simple as:

path.dir().map(|dir| dir.join("index.html"))

The DotFiles option as simple as:

(allow_dotfiles || !path.starts_with('.')).then(path)

Then if each of these rewrites is nameable in some way, the default FileServer becomes a specific composition of these rewrites:

FileServer::new()
    .rewrite(Jail("/foo")) # rewrite /req/path to /foo/req/path.
    .rewrite(DotFiles(false)) # do not allow DotFiles
    .rewrite(IndexFile) # rewrite /foo or /foo/ to /foo/index.html

And new ones can be added:

FileServer::new()
    .rewrite(Jail("/foo"))
    .rewrite(DotFiles(true))
    .rewrite(IndexFile)
    .rewrite(PreCompressed)

Note how the previous ambiguity regarding ordering is removed and both options (check for compressed before or after index.html rewriting) are easily achievable.

I would be in favor of such a clean-up PR.

hcldan commented 3 months ago

Some comments about your proposals:

I think I can come up with something that doesn't change NamedFile, though I do think there will be some duplication of code, because the content-type for a gzipped file should be of the non-gzipped version. Easily doable, but slightly duplicated.


For 2, serving pre-gzipped files is not something new and the other implementations I've seen don't even require the non-gzipped file to exist. (side note: gzip default invocation leaves only a .gz file and no source file). I think I would prefer to avoid over-complicating this.

As you say, the file server should just be serving files. Adding a check for file modification time would be an interesting feature if the server were to handle gzipping non zipped content automatically and then re-generating it if the underlying file has been modified. That is an interesting capability, but one that is out of scope for my needs, and I would argue the needs of the majority. I would rather see additions like that as future pull requests once a viable base feature is delivered.


For 3, the api you list is interesting. In many ways, I wish this was already the case as it would have made this much easier to do. Alas, I do not have time for a rewrite of a crate I'm only just becoming familiar with. I would like to see this feature make a release soon and this sounds like it might require a major version bump because the api is changing and the options are no longer provided the same way.

In any case, I would like to focus on a minimum viable change here so it can be delivered and leveraged. Later, either I can or someone else with more familiarity with the crate can tackle 3.

Also, 3 seems to dovetail nicely with the other issue opened for lastmodified/etag/arbitrary headers. I am very reluctant to cross streams here, this could drastically expand the scope of what I had hoped to accomplish.

hcldan commented 3 months ago

@SergioBenitez I removed changes to NamedFile which should address (1)

See my response above for (2) and (3)