gotham-rs / gotham

A flexible web framework that promotes stability, safety, security and speed.
https://gotham.rs
Other
2.23k stars 125 forks source link

Async static file serving #55

Closed tforgione closed 6 years ago

tforgione commented 6 years ago

On your blog, you say that some important features are on the roadmap, including async static file serving. Is there any work on that for the moment ?

bradleybeddoes commented 6 years ago

There hasn't been any work put towards this yet. The core team is busily working on 0.2 and external projects. Static files aren't something we had slated for 0.2 at this stage.

That said there wasn't yet a tracking issue for static files so lets use this issue as exactly that.

richardanaya commented 6 years ago

I'd like to take this on over the break. I have a project that needs this functionality and directory support :) minus well contribute.

bradleybeddoes commented 6 years ago

Welcome to the Gotham project @richardanaya, awesome to hear you're keen to take this issue on. In addition to comments on this issue, please feel free to stop by our Gitter room if you need help with the codebase.

One of the aspects of Gotham that we're very much focused on is security, we want to make it hard for web application developers to create something insecure by always defaulting to "the right thing".

A significant trap that I've seen other frameworks (Rust and non Rust) fall into when dealing with static files is exposing path traversal attack vectors. OWASP has some good documentation on what this is if you've not come across it before. Not getting this right scares the crap out of me so it will need some focus.

On the Rust side the module std::path will be useful here but it alone will not protect against path traversal attacks.

Good luck with the code, I look forward to using this!

bradleybeddoes commented 6 years ago

Just thought I'd check in with you @richardanaya, see how you're going on this one.

scottlamb commented 6 years ago

I have a couple crates that may be of interest in this repository:

Now, the caveats:

You'd be more than welcome to use my work if you find it a helpful starting point.

richardanaya commented 6 years ago

I'm afraid I may have to bow out of this one. I may be a little to new to rust to take this on securely.

On Thu, Jan 4, 2018 at 9:20 PM, Scott Lamb notifications@github.com wrote:

I have a couple crates that may be of interest in this repository https://github.com/scottlamb/http-entity:

  • http-entity supports doing conditional GET/HEAD and byte range serving on arbitrary "HTTP entities" (a simple trait you implement that supplies a given byte range, etag, last modified, other headers). It's like golang's http.ServeContent https://golang.org/pkg/net/http/#ServeContent: it takes care of a lot of the HTTP RFC details for you. Your trait can be based on anythingβ€”such as a static file, a byte array baked into the binary, a user-level network filesystem, a memcache layer on top of something else, or (my original use case) a .mp4 file assembled on-the-fly from smaller segments of video. It's quite flexible on the type of body, too; I wanted to ensure it supports zero-copy mmap serving as mentioned in this hyper issue https://github.com/hyperium/hyper/issues/953#issuecomment-278541328.
  • http-file uses that to serve static files, reading from the local filesystem in a separate thread pool to not block the tokio event loop.

Now, the caveats:

  • These are unreleased crates. (No particular reason they're unsuitable for release, other than needing a pass over the doc, a simpler example, maybe a continuous integration setup, and to finalize the crate name. I believe they work correctly and have reasonable test coverage.)
  • They're currently based on the hyper 0.11.x API, not the new http crate or gotham's API. I haven't looked into gotham at all myself yet (other than skimming the issues, thus finding this one).
  • http-file just serves a file that you give it. It doesn't do directory listings, path traversal, any of that.

You'd be more than welcome to use my work if you find it a helpful starting point.

β€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gotham-rs/gotham/issues/55#issuecomment-355475343, or mute the thread https://github.com/notifications/unsubscribe-auth/AAR8mk4ukHNY-tuhg1usFqdyD_fXq8Xmks5tHbEFgaJpZM4Ppdjz .

bradleybeddoes commented 6 years ago

Hey @scottlamb, I haven't had a chance to checkout your crates yet but I will in the next few days, sounds like a good chunk of the work we need though from the way you've described here πŸ‘

vitiral commented 6 years ago

I just want to chime in that supporting static files (or some other way of serving up index.html with css/ etc) is the major blocker for me using this library with artifact.

I'm not sure if I'll be able to take a stab at this, but I would love to use Gotham for my app!

mike-engel commented 6 years ago

Is there an idea of what an ideal api would look like for this? I'm coming from a node/express.js background, and their static serving looks like so. This will provide an api under the path /access-path, which will resolve files in a folder located in path/to/static/root. e.g. localhost:7878/access-path/logo.png would serve ./path/to/static/root/logo.png.

const options = {}

app.use('/access-path', express.static('path/to/static/root', options)

Since I'm coming from this api, this is what I'm used to and it feels pretty good and straightforward. Depending on what the API y'all want, I may also take a stab at this. I may also wait until 0.2 is released if the gotham api is changing a bit (like routing is).

bradleybeddoes commented 6 years ago

@mike-engel I like how simply that reads and simple is good. I believe it would fit quite well with the general feel of the new router builder.

Stealing the router from one of the routing example it could end up looking something like (assuming bike shedding on to_filesystem at some future time):

fn router() -> Router {
    build_simple_router(|route| {
        route.get("/").to(index);
        route.get_or_head("/products").to(products::index);

        route.get("/assets").to_filesystem("/path/to/static/content")

        route.scope("/checkout", |route| {
            route.get("/start").to(checkout::start);

            route.associate("/address", |assoc| {
                assoc.post().to(checkout::address::create);
                assoc.put().to(checkout::address::update);
                assoc.patch().to(checkout::address::update);
                assoc.delete().to(checkout::address::delete);
            });
        });
    })
}

I'm thinking that with the challenges this feature represents perhaps having a number of us all chipping in is the best way to make it happen.

cc @vitiral @scottlamb @smangelsdorf @colinbankier to gauge interest in working on this together.

mike-engel commented 6 years ago

I’m really liking how that looks (and also the v2 router)! Very simple, and I think it will cover most, if not all, cases for static files.

colinbankier commented 6 years ago

This maps nicely to what I know from elixir-land :+1: There are number of additional options worth considering supporting e.g. cache control (etags), gzip - can see https://hexdocs.pm/plug/Plug.Static.html as an example to see the list of options that provides.

mike-engel commented 6 years ago

Definitely! The express one has the same options as well, documented here: https://github.com/expressjs/serve-static/blob/master/README.md#options

smangelsdorf commented 6 years ago
        route.get("/assets").to_filesystem("/path/to/static/content")

Just thinking aloud, this might have to work like a delegated route so that the /assets prefix is correctly stripped off it and the path relative to /path/to/static/content can be properly computed. (Is there a better way to handle this with the internals of the Router?)

It could be handled by wrapping a static file handler in a delegated router with a single glob route. Maybe in that case the route should be spelled route.delegate("/assets").to_filesystem(..) for a bit of transparency about what's going on internally.

Absolutely makes sense to support all the static file bells and whistles we can support, at the last If-Modified-Since, etags, compression, ranges. This might end up being a couple of separate PRs to get everything working (i.e. start with basic file serving and add extra bits after that).

colinbankier commented 6 years ago

Just thought I'd note that I started playing around with an implementation for this. I struggled to get some thing like route.get("/assets").to_filesystem("/path/to/static/content") to work, but I did get something like route.to_filesystem("/assets", "/path/to/static/content") to work much more easily, passing the relevant pieces through to a custom handler. Any thoughts on that as the api?

The difficulty I had with the first version was that the handler needs both these pieces of information on creation - and by adding the to_filesystem method onto a SingleRouteBuilder, there seemed no way to get the "/assets" path that it had been created with. I don't know if #175 would have helped here. Also, the existing get seemed to make some assumptions that didn't quite work, so it was actually like route.assets("/assets").to_filesystem("/path/to/static/content") - or similar.

In the route.to_filesystem("/assets", "/path/to/static/content") version, you just have both pieces of information together, and can just setup the handler directly.

bradleybeddoes commented 6 years ago

One issue I can immediately see with that @colinbankier is that we rule out the ability to use RouterMatchers which are a very nice part of the overall API (I'm currently playing with that in #177 as well).

I wonder if #175 is exactly the solution needed here. Assuming a builder statement of:

route.get("/assets/*").to_filesystem("/path/to/static/content")

For the request path /assets/images/bradley.png the match on glob would be available in state as the vec ['images', 'bradley.png'] which could then be used with PathBuf to construct the full on disk path of /path/to/static/content/images/bradley.png.

You also mentioned that "the existing get seemed to make some assumptions that didn't quite work" so I could easily be overlooking something here. Happy to work that through.

colinbankier commented 6 years ago

Thanks @bradleybeddoes - looking at it closer, I agree #175 looks like it solves the challenges I ran into. Let's mark at that as a dependency. The other "assumptions that didn't quite work" were around the difference between route.get("/assets").to_filesystem... (as the initial suggestion) and route.get("/assets/*").to_filesystem.... Using the Glob Matcher and #175, I think it could work as you suggest.

bradleybeddoes commented 6 years ago

That's great @colinbankier, looking at #175 it seems like something we're going to be able to wrap up reasonably quickly.

There is an example in that issue that shows how to get the glob components already which is likely useful to you if you hadn't seen it already.

colinbankier commented 6 years ago

I was looking into how to deal with detecting the mime-type of static files and came across this: https://crates.io/crates/mime_guess It seems maintained and kept up to date with the mime crate from first glance. Thoughts on including another dependency vs maintaining similar code within gotham?

bradleybeddoes commented 6 years ago

Hey there @colinbankier I'm πŸ‘ to this. Straightforward crate, well tested, long term support. Would be a crime to not make use of it πŸ˜„.

Incidentally we're working with @abonander on #11 as well and looking into https://github.com/abonander/multipart-async.

abonander commented 6 years ago

mime_guess is actually used by Servo, though not in a very significant role. Still, I'm pretty proud of it for how simple it is: https://github.com/servo/servo/search?utf8=%E2%9C%93&q=mime_guess&type=

Looks like they're also sniffing media types using magic bytes, maybe to corroborate the return value from mime_guess: https://github.com/servo/servo/blob/master/components/net/mime_classifier.rs

For static file serving, guessing based on the extension is probably sufficient since they're your files anyway; if the extension is wrong then it's probably your fault πŸ˜† (and the client should be validating it anyway).

illicitonion commented 6 years ago

@colinbankier I picked up your branch, and applied this status quo of #175. https://github.com/illicitonion/gotham/commit/536316182c97229345b0c88bfa19c30703869b8d was about as simple as I could get it (this is just sync, not async, but I'm trying to firm up the API). What do y'all think of the example usage in https://github.com/illicitonion/gotham/commit/536316182c97229345b0c88bfa19c30703869b8d#diff-5418f93122552f9f26c45f250d7719f5 ?

bradleybeddoes commented 6 years ago

@illicitonion thanks for moving this along. I have only had a brief look so far but here is some initial changes I thought might be helpful to consider https://github.com/bradleybeddoes/gotham/commit/45e61f35285e0680f8d1fdd87776a887499a4e23

Essentially this makes use of the NewHandler and Handler types for this as we've done elsewhere. This might have some downsides I've not yet considered that perhaps you did?

colinbankier commented 6 years ago

Great thanks @illicitonion - looks like a good direction to me (with @bradleybeddoes' addition of Handler/NewHandler types). I had also included a normalize_path(path: &Path) -> PathBuf function (shamelessly 'borrowed' from Iron's static file handler) to prevent path traversals. I wasn't sure whether this was sufficient to handle this aspect of security, or whether there are more attack vectors I hadn't considered? Love to coordinate on the next steps here. I was using https://github.com/integer32llc/rust-playground/issues/269 as a rough guide for what to include at least initially.

shepmaster commented 6 years ago

to prevent path traversals

I'd double check what Rocket does, as well as what I did in https://github.com/integer32llc/playground-middleware/blob/master/src/staticfile.rs. I also strongly advocate for adding some for-real filesystem tests for the traversal checking.

colinbankier commented 6 years ago

Great, thanks for the link @shepmaster - and yep, I strongly agree on some tests around traversal checking.

illicitonion commented 6 years ago

Essentially this makes use of the NewHandler and Handler types for this as we've done elsewhere. This might have some downsides I've not yet considered that perhaps you did?

Looks great! No downsides I considered, I just haven't looked at enough Gotham code to know the pattern :)

I had also included a normalize_path(path: &Path) -> PathBuf function (shamelessly 'borrowed' from Iron's static file handler) to prevent path traversals. I wasn't sure whether this was sufficient to handle this aspect of security, or whether there are more attack vectors I hadn't considered?

Oops, I forgot to throw that in (I was copying and pasting into a new file, rather than deleting from the old) - looks great :)

Love to coordinate on the next steps here. I was using integer32llc/rust-playground#269 as a rough guide for what to include at least initially.

I think the next step is probably just to get something which functions in and run from there - after that, there are some extra features to add as per that ticket, but I think those can reasonably be added as separate PRs...

As for coordinating... I guess I'd suggest you get your code PRable (as you wrote the meat of all three of our versions) and send it out for review, and we can divy up some tickets for the follow-up features? If you'd like to go about it another way though (e.g. if you're short on time at the moment) I'm sure we can coordinate something else :)

My initial list of features which we should add in the future (I don't know where to draw the line between initial PR and follow-ups):

  1. Async serving, rather than always buffering everything in memory
  2. In-memory caching - I was thinking of two configuration axes: (Off, ByMtime, AssumeFilesNeverChange) and (Infinite, TimeLimited(Duration), SpaceLimited(usize))

(I'm happy to be as involved or not with this as is useful; I'm just excited to using the feature :))

bradleybeddoes commented 6 years ago

I'd really like to see both the stringent path traversal protection and "async" file serving in the initial feature set (by which I mean handling I/O on a thread pool unless something has changed I am not aware of wrt lack of support for non blocking I/O on most file systems).

Several PR to achieve that would be fine so long as they built on each other, preferably we don't have any period of time where something that is non async is merged into master.

Alternatively this may be one of those times where a larger initial PR makes the situation easier to reason about.

What a helpful comment this ended up not being πŸ€“.

colinbankier commented 6 years ago

Great - thanks @illicitonion - I'll try get a basic branch PRable with the current state of things. If you'd like to explore some of the next steps, e.g. async, then we can bundle these into a bigger PR worthy of merging when it has all the things we want for an initial merge. (I'm sure you'd have seen it - but just in case, @scottlamb had given a helpful starting point for async files above in this thread in his repo https://github.com/scottlamb/http-entity)

shepmaster commented 6 years ago

Since I hadn't seen it mentioned anywhere else in the thread, perhaps it would be worth looking at the futures-fs crate to see if it could be used instead of rewriting.

scottlamb commented 6 years ago

@scottlamb had given a helpful starting point for async files above in this thread in his repo https://github.com/scottlamb/http-entity

I'd even like it to be possible for gotham to use it directly. It unfortunately can't be used with the hyper::Body right now (you need to use something like Box<Stream<Item = hyper::Chunk, Error = hyper::Error> + Send> instead), but that's getting fixed with hyper 0.12.x (see hyperium/hyper#1468). I plan to upgrade http-entity (which I've renamed to http-serve btw) to use all the http crate types soon-ish. scottlamb/http-serve#2 was a step toward that.

What's the story with gotham and the http crate? I believe using it will be required to upgrade to (not-yet-released) hyper 0.12.x.

Since I hadn't seen it mentioned anywhere else in the thread, perhaps it would be worth looking at the futures-fs crate to see if it could be used instead of rewriting.

That crate doesn't support byte ranges ranges, so I didn't find it useful when writing http-serve.

bradleybeddoes commented 6 years ago

We're tracking http crate integration in #26 which has essentially also become the main ticket driving us towards Hyper 0.12.

We've not locked this in yet but there was some thinking that Gotham 0.3 would mostly focus on that change as it is likely to touch quite a few pieces.

colinbankier commented 6 years ago

Just wondering - if the root path for static files doesn't exist - should we panic on startup? Just log it and carry on?

shepmaster commented 6 years ago

if the root path for static files doesn't exist

I'd vote that it should return a HTTP 404 for each request. If the files come into play during the lifetime of the running process, then they start working. If they disappear, they go back to 404. Basically, nothing special should happen at startup, although a log message might be nice to help debug the "I mistyped the path somewhere".

abonander commented 6 years ago

Another question is whether or not to expose the case-sensitivity of the underlying filesystem.

colinbankier commented 6 years ago

One thing I also noted with the current glob routing of "/*" is that it won't handle things like serving an index.html if just the root path is requested. Are we happy with having to specify this explicitly / separately in the router? e.g. route.get("/").to_filesystem("/path/to/static/content/index.html")

bradleybeddoes commented 6 years ago

Another question is whether or not to expose the case-sensitivity of the underlying filesystem.

As a data point I believe Apache exposes case-sensitivity of the underlying filesystem (config directives not with withstanding). Perhaps that is a reasonable lead to follow?

That is we'd expose case-sensitivity of the underlying filesystem.

Does anyone have counter examples we should consider?

bradleybeddoes commented 6 years ago

it won't handle things like serving an index.html if just the root path is requested

I'd vote for the need to be explicit for now. If it becomes something folks have a real need for once static file support ships we could deal with it then.

shepmaster commented 6 years ago

folks have a real need for

I'll say that I would need it ;-)

illicitonion commented 6 years ago

it won't handle things like serving an index.html if just the root path is requested

I'd vote for the need to be explicit for now. If it becomes something folks have a real need for once static file support ships we could deal with it then.

There are at least three "correct" options here; no special handling (404/401), use a special file as the index, serve a directory listing. Much though I'm loathe to add configurability, we should probably make this configurable with some kind of builder.

(Maybe as a follow up, though :))

bradleybeddoes commented 6 years ago

I'll say that I would need it ;-)

Well that didn't take long! :).

Is it just serving index.html at the root path you need @shepmaster or something deeper?

shepmaster commented 6 years ago

Is it just serving index.html at the root path

Yes, that's all the playground needs. For my case, I also serve the file /index.html in response to /help, so it's possible that the same mechanism could be used for both of those.

serve a directory listing

Perhaps this could even use the same thing, completely separate from static files. Making up syntax:

// Establishes handlers for any request under the path `/`
route.get("/*").to_filesystem("/path/to/static/content");

// But these override a specific URL

// By serving a specific file for a URL, rewriting the request
route.get("/").as("/index.html");
route.get("/help").as("/index.html");

// Or you could do this for a listing
route.get("/").directory_index("/path/to/static/content");

To be clear, I'm still not actually using Gotham, so this is more of a "would be nice". I don't know that it needs to block anything, just that it's something I'd need.

illicitonion commented 6 years ago

Subdirectories slightly change the / question though; in general, if there are subdirectories, you probably want to do the same thing as the root dir (either a directory listing, or serve index.html, or whatever), which makes me feel it's probably something we want to configure on the filesystem handler, rather than the router...

colinbankier commented 6 years ago

this could even use the same thing, completely separate from static files. route.get("/").as("/index.html");

@shepmaster something separate and explicit like this should be simple to add (aside from as being a keyword :smile: ) - if everyone is ok with the ergonomics of having to be explicit about this.

As @illicitonion noted though, this brings up some inconsistencies between the root path and subdirectories - because / is not matched by /* in the router anything at / would have to be specified separately, while subdirectories could easily just serve default files etc at directory roots.

One of the earlier suggestions of something like

route.to_filesystem("/", "path/to/filesystem");

would allow the these to be handled consistently in the same handler, as it captures the root path. ("path/to/filesystem" here would likely be a StaticFileHandler rather than just a string, so you could configure it with appropriate options)

I'm fine with the root needing to be specified separately though. Happy to hear what people think gives better ergonomics. @bradleybeddoes ?

shepmaster commented 6 years ago

some inconsistencies between the root path and subdirectories

I'm certainly biased here. My specific use case has a filesystem layout like this:

build/
β”œβ”€β”€ assets
β”‚Β Β  β”œβ”€β”€ advanced-editor-01a0ad81683a5b9b9446.js
β”‚Β Β  β”œβ”€β”€ advanced-editor-0335ee0c22916abf3009.js
β”‚Β Β  β”œβ”€β”€ omg_there_are_so_many_files_here_because_hashes.seriously
β”‚Β Β  β”œβ”€β”€ vendors~app-ea0837ba80e092cc9035.js
β”‚Β Β  β”œβ”€β”€ vendors~app-f60ed1f950512db9891f.js
β”‚Β Β  β”œβ”€β”€ vendors~brace-keybinding-emacs-0a80acad51176252fa48.js
β”‚Β Β  └── vendors~brace-keybinding-vim-6b4f2f5111644e7d447c.js
β”œβ”€β”€ index.html
└── robots.txt

I only care to serve the index.html at the root, and would be fine with /assets/ returning an error. Now, I probably wouldn't put an index.html into assets, so if it looked for one and then returned a 404, it probably wouldn't be a negative for me.

onelson commented 6 years ago

I feel like I'd skip providing a solution for directory defaults/listings and set a plain route/handler for the index.html

onelson commented 6 years ago

Although, I do like @shepmaster's suggestion of having a shorthand like

route.get("/").from_file("path/to/index.html");

which would basically sniff the mime of the file and set the content type appropriately. That feels like the right amount of sugar to me.

colinbankier commented 6 years ago

Yep, I like the simplicity of

route.get("/").from_file("path/to/index.html");

(maybe to_file to be consistent with to_filesystem). No hidden magic around default files etc needed.

bradleybeddoes commented 6 years ago

πŸ‘ for

route.get("/").to_file("path/to/index.html");

and leaving directory listings alone for now.

colinbankier commented 6 years ago

I have a working branch here with the basics so far: https://github.com/gotham-rs/gotham/compare/master...colinbankier:static-files

A few things I'm not sure of in terms of gotham / rust style conventions and other things. Happy for much feedback :). Thanks @illicitonion for the starter on the glob matching bits.

No async yet unfortunately - @illicitonion did you happen to look further into this?