Split staticfile into two APIs

untitaker commented 8 years ago

Currently staticfile consists of two main features:

Given a filepath, serve a file with proper etag headers, handle if-match, range requests, etc. automatically (no idea if all of this is actually implemented)
Given a folder, map all requests to a file within that folder

Currently staticfile exposes only 2., I'd like it to expose 1. as completely independent feature, for cases where I want to write the logic that maps from request URL to filepath myself.

Flask has a send_file function, which takes an absolute filepath (and implicitly a request) and gives back a response. Here's my API proposal for Iron:

send_file<P: AsRef<Path>>(&Request, P) -> IronResult<Response>

Perhaps returning a IronResult<T> where T: Modifier` is more consistent with the rest of core-Iron's API, but I seriously doubt the usefulness of that.

Hoverbear commented 8 years ago

Sounds reasonable to me.

scottlamb commented 8 years ago

I'd love to also have a layer 0. which handles GET and HEAD serving (including conditional GETs and range requests) on top of an abstract "entity" (HTTP's term) without involving the filesystem.

Go provides something like this; see http.ServeContent.

I needed 0. for my security camera NVR project, which dynamically creates .mp4 files based on its SQLite3 database and segments of on-disk files. I have some apparently-working, automatically tested code I'd be happy to share if desired. There are probably some cases in the RFC that I didn't implement correctly after my quick skim, and I haven't even tried to do some others yet. But currently my interface is roughly this:

/// A static HTTP entity for GET and HEAD serving.
pub trait Entity<Error> where Error: From<io::Error> {
    /// Returns the length of the entity in bytes.
    fn len(&self) -> u64;

    /// Writes bytes within this entity indicated by `range` to `out`.
    fn write_to(&self, range: Range<u64>, out: &mut io::Write) -> Result<(), Error>;

    fn content_type(&self) -> mime::Mime;
    fn etag(&self) -> Option<&header::EntityTag>;
    fn last_modified(&self) -> &header::HttpDate;
}

/// Serves GET and HEAD requests for a given byte-ranged entity.
/// Handles conditional & subrange requests.
///
/// TODO: does it make sense for the caller to add Expires, Cache-Control, and Vary headers first?
/// or do they need to be added conditionally based on the response type we serve?
pub fn serve<Error>(entity: &Entity<Error>, req: &Request, mut res: Response<Fresh>)
                    -> Result<(), Error> where Error: From<io::Error> { ... }

My code directly uses hyper today, not iron. But if iron provided 0., that'd be a compelling reason for projects such as mine to use it. (There are probably other things I'll need that iron already provides, but my code's just a prototype now, I'm just learning the language, and I started learning by doing things with the lower layer.)

Also, my code is using the deprecated hyper 0.9.x synchronous interface rather than the not-ready-yet hyper 0.10.x async master or tokio interfaces. I'm looking forward to changing it once something else is stabilized; giving it chunks of preassembled Vec<u8> would be better for my code.

untitaker commented 8 years ago

@scottlamb You can pass arbitrary T: Read to Response::with, since they all implement Modifier. You need to set Content-Type manually, but the response body already gets omitted automatically in the case of a HEAD request.

scottlamb commented 8 years ago

Oh. That's totally wrong for my use case, actually, it introduces another copy here:

impl <R: io::Read + Send> WriteBody for BodyReader<R> {
    fn write_body(&mut self, res: &mut ResponseBody) -> io::Result<()> {
        io::copy(&mut self.0, res).map(|_| ())
    }
}

I'm hoping a future version of hyper will allow me to just hand over chunks of &[u8] slices with ownership promised in some way; I was just making a proposal for that on hyper issue 934.

And I'm hoping iron will provide a layer above hyper's (current or future) interface that handles conditional GET, range serving, and whatnot without forcing me to write stuff out to the filesystem first (which I absolutely cannot afford to do; I'm generating huge files from small pieces on demand).

untitaker commented 8 years ago

This is not copying to a new section in RAM or any storage medium on the server, it is writing to the output stream.

scottlamb commented 8 years ago

It takes my &[u8], copies it into an internal buffer within io::copy, and then copies it into hyper's output buffer. There are two copies, or one more than is necessary with the interface that hyper currently provides. Here's the definition of std::io::copy:

#[stable(feature = "rust1", since = "1.0.0")]
pub fn copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W) -> io::Result<u64>
    where R: Read, W: Write
{
    let mut buf = [0; super::DEFAULT_BUF_SIZE];
    let mut written = 0;
    loop {
        let len = match reader.read(&mut buf) {
            Ok(0) => return Ok(written),
            Ok(len) => len,
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..len])?;
        written += len as u64;
    }
}

untitaker commented 8 years ago

If performance is that much more important than convenience, why not override response.body with your own struct that impls WriteBody?

scottlamb commented 8 years ago

Why should it be necessary to choose between the two? The serve interface I showed above provides both. I was thinking this was generic code within iron's scope of responsibility, but if you disagree I'll just do it in a separate crate.

untitaker commented 8 years ago

I'm just saying that Iron already provides the necessary low-level access to Hypers APIs for your project. One variant (using a T: Read as Modifier) is a shortcut for the other (manually setting the response body), and IMO covers a majority of the usecases with very little boilerplate and very little complexity in Iron as well.

I'm not opposed to your abstraction at all, at the same time I would strongly prefer shortcuts like mentioned in the OP for the majority of usecases. Either way I think those things shouldn't be part of Iron core for the simple reason that they don't need to be. Particularly etag and last-modified semantics pull in dependencies that minimalist API servers wouldn't want to pay the cost for

scottlamb commented 8 years ago

Ahh. Yes, I think you're right that I could implement serve on top of iron instead of directly on top of hyper. I did the latter in part because I wanted to start at the bottom as I learned the pieces, and in part because iron didn't add anything I needed at the moment. If iron had provided this for me, I would have happily taken the extra dependency right away...

When you say "shouldn't be a part of iron core", do you mean it shouldn't be a part of crate "iron"? I don't feel strongly that it should be. I'd be happy with in "iron", in "staticfile", or in a new crate between. I do think that staticfile should be built on top of it—it doesn't make sense to provide static file serving without conditional GET and range handling, and it doesn't make sense to have two implementations of that. But if you disagree, I'll just go do my own thing.

I don't agree with what you're saying about etag and last-modified pulling in dependencies. The Entity trait definition and serve function have no dependencies beyond hyper now. Some Entity implementations may require cryptographic functions for etag and such but others could just return None. There's useful, non-trivial code that is dependency-free. And the staticfile implementation could provide both etag and last-modified based on the result of the stat syscall; this is what Apache, nginx, etc. do.

untitaker commented 8 years ago

If you want to build a high-level abstraction around Last-Modified you're probably going to depend on the time crate.

scottlamb commented 8 years ago

That's true. Currently my serve implementation doesn't depend on the time crate (only hyper), but I suppose I should be setting Date myself and ensuring Last-Modified is consistent with it rather than relying on hyper setting Date later on; as is I'm not compliant with RFC 2616 section 14.29:

   An origin server MUST NOT send a Last-Modified date which is later
   than the server's time of message origination. In such cases, where
   the resource's last modification would indicate some time in the
   future, the server MUST replace that date with the message
   origination date.

But why is this a problem? hyper depends on time. So what problem does it create for iron to also depend on time?

untitaker commented 8 years ago

Ah, I was assuming there's an extra cost because staticfile has everything related to that functionality within an optional feature. It appears https://github.com/alexcrichton/filetime would come as a new dependency. I don't think that's necessary for the abstraction, but probably for the impls that should IMO be included.

untitaker commented 8 years ago

Anyway at this point I'd like to see a PoC pull request or new crate that does what you want to do. We can then move that into staticfile and expose the new API as well.

scottlamb commented 8 years ago

Sounds good. I will start polishing my code and adapting it to iron. (At my usual work-a-little-bit-every-night-after-my-kid-goes-to-bed pace...)

scottlamb commented 7 years ago

Quick note: I still haven't adapted my code to the Iron interface (I'm slow and trying to do a lot of things with my project at once), but I have at least extracted it into a separate crate and published it to github as scottlamb/http-entity, so you can look at the current code if you're curious.

iron / staticfile

Split staticfile into two APIs #87