haxetink / tink_http

Cross platform HTTP abstraction
https://haxetink.github.io/tink_http

Revise Container API #18

Closed back2dos closed 8 years ago

back2dos commented 8 years ago

As @kevinresol pointed out in #17 there's no way to find out when the server closes. The truth is, I omitted this, because I had no good idea how to properly express it :D

Generally, the whole Container/Application API is a bit crude, to say the least. For example, nothing prevents you from running multiple applications on the same container and what not. Also, in "cgi-like" environments, you don't get to react to the server closing (which is why I chose not to expose that at all).

My primary focus lies on a decent API of requests / responses and good implementations of the protocol itself, which is why I have neglected Container so far. But we should definitely deal with it. Maybe @benmerckx also has some input on that subject?

I think I would prefer to get rid of Application altogether, as it doesn't compose well (it's cumbersome to plug together two apps currently - further below I will illustrate how I feel it should be). So something like this might be good:

interface Container {
  function run(handler:Handler):Future<ContainerResult>;
}

typedef Handler = IncomingRequest->Future<OutgoingResponse>;

enum ContainerResult {
  Failed(e:Error);//failed to bind port, multiple apps on the same container, or whatever
  Running(running:RunningState);//for node, tink_tcp, ...
  Done;//for cgi-like stuff
}

interface RunningState {
  function shutdown():Future<Noise>;
  var failures:Signal<ResponseFailure>;
}

typedef ResponseFailure = {
  request: IncomingRequest,
  response: OutgoingResponse,
  error: Error,
}

It's a bit fatter than the status quo, but I think it models the different scenarios somewhat more closely.
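For illustration, the three-way result could be modeled as a tagged union in any language with sum types; here is a minimal TypeScript sketch (all names mirror the proposal above and are hypothetical, not tink_http API):

```typescript
// Hypothetical mirror of the proposed ContainerResult enum as a tagged union.
type ContainerResult =
  | { kind: "failed"; error: string }          // e.g. failed to bind the port
  | { kind: "running"; shutdown: () => void }  // long-lived servers (node, tink_tcp)
  | { kind: "done" };                          // one-shot, cgi-like environments

// A consumer must handle every scenario explicitly.
function describe(result: ContainerResult): string {
  switch (result.kind) {
    case "failed":  return `container failed: ${result.error}`;
    case "running": return "container is running";
    case "done":    return "request cycle finished";
  }
}
```

The point of the enum is exactly this exhaustiveness: cgi-like and long-running containers produce different results, and the caller is forced to acknowledge both.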

Building on the above, this is what I mean by easier composition:

typedef Condition = IncomingRequest->Future<Bool>;

function combine(condition:Condition, consequence:Handler, alternative:Handler):Handler
  return function (req:IncomingRequest) 
    return condition(req).flatMap(
      function (applies) return 
        if (applies) consequence(req)
        else alternative(req)
    );

function route(?method:Method->Bool, prefix:String):Condition
  return function (req:IncomingRequest) return Future.sync(
     (method == null || method(req.header.method)) 
       && 
     req.header.url.startsWith(prefix)
  );

function get(prefix:String)
  return route(function (m) return m == GET, prefix);

function host(expected:String):Condition
  return function (req:IncomingRequest) return Future.sync(
    switch req.header.get('host') {
      case [found] if (found == expected): true;
      default: false;
    }
  );

//here's some routing:
combine(
  host('api.myapp.com'),
  apiHandler,
  combine(
    get('/static/'),
    serveStatic,
    webHandler
  )
);
//or why not middle wares?
typedef Preprocessor = IncomingRequest->Future<IncomingRequest>;
typedef Postprocessor = OutgoingResponse->Future<OutgoingResponse>;

function preprocess(p:Preprocessor, h:Handler):Handler 
  return function (req) return p(req).flatMap(h);

function postprocess(p:Postprocessor, h:Handler):Handler 
  return function (req) return h(req).flatMap(p);

//and use them here

This is just an example of how nicely handlers could potentially be strung together to create complex apps. I'm not sure the code even compiles, and I don't mean to discuss its specifics, only to illustrate what could be gained by having composable handlers rather than the bulky Application. Also, it seems a good idea to separate life cycle management from request handling. But I probably digress.
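To make the composition point concrete, here is a minimal runnable sketch of the same pattern in TypeScript, with async functions standing in for tink's Future (all types and handlers are illustrative, not tink_http API):

```typescript
// A Handler is just an async function, so routing is ordinary composition.
type Request = { method: string; url: string; headers: Record<string, string> };
type Response = { status: number; body: string };
type Handler = (req: Request) => Promise<Response>;
type Condition = (req: Request) => Promise<boolean>;

// Same shape as combine() above: pick one of two handlers per request.
function combine(condition: Condition, consequence: Handler, alternative: Handler): Handler {
  return async (req) => (await condition(req)) ? consequence(req) : alternative(req);
}

// A toy condition, analogous to the route()/get() helpers above.
const prefix = (p: string): Condition => async (req) => req.url.startsWith(p);

// Two toy handlers strung together, as in the routing example.
const serveStatic: Handler = async () => ({ status: 200, body: "static" });
const webHandler: Handler = async () => ({ status: 200, body: "web" });
const app = combine(prefix("/static/"), serveStatic, webHandler);
```

Because every piece is just a function of the same type, nesting `combine` calls arbitrarily deep costs nothing, which is the property the bulky Application lacks.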

Anyway, my main point is that the current API needs revision anyway, and I would like to replace it with something better that does not just deal with the particular issue @kevinresol raised, but that we can commit to.

So please pour all your thoughts here!

kevinresol commented 8 years ago

Yeah, the container change sounds good to me.

But I just can't wrap my head around what happens when a streamed body meets chained handlers or pre/postprocessors: if one handler exhausts the body, the next handler won't be able to read it anymore. How does that actually work?

back2dos commented 8 years ago

@kevinresol Well, such a pre/post-processor would construct a new body, which is basically what you would do for compression anyway. Or you can buffer the response and do stuff like this:

function cache(h:Handler):Handler 
  return function (req) return 
    if (req.header.method != GET) h(req);
    else Future.async(function (cb) {
      h(req).handle(function (res) {
        res.body.all().handle(function (o) cb(switch o {
          case Success(bytes):
            var etag = haxe.crypto.Md5.encode(bytes);
            switch req.header.get('if-none-match') {
              case [same] if (same == etag):
                new OutgoingResponse(
                  new OutgoingResponseHeader(304, "Not Modified", []), 
                  Empty.instance
                );
              default:
                new OutgoingResponse(
                  new OutgoingResponseHeader(
                    res.header.statusCode, 
                    res.header.reason,
                    res.header.fields.concat([new HeaderField('etag', etag)])
                  ),
                  bytes
                );
            }
          case Failure(e): 
            new OutgoingResponse(
              new OutgoingResponseHeader(500, "Internal Server Error", []),
              Empty.instance
            );
        }));
      });
    });

Which would add caching to any handler you want.

But yes, you can't just modify the body in place, if that's what you're asking?
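To illustrate the "construct a new body instead of mutating it" point, here is a hedged TypeScript sketch of a postprocessor that builds a fresh response from the old one (the "compression" is a toy rot13 transform, purely illustrative; none of these names are tink_http API):

```typescript
// A postprocessor consumes the old response and returns a brand-new one;
// the original is never modified in place.
type Response = { headers: Record<string, string>; body: string };
type Postprocessor = (res: Response) => Promise<Response>;

// Toy "compression" step: re-encodes the body and records the fact in a
// new header map, leaving the input response untouched.
const compress: Postprocessor = async (res) => ({
  headers: { ...res.headers, "content-encoding": "identity+rot13" },
  body: res.body.replace(/[a-z]/gi, (c) => {
    const base = c <= "Z" ? 65 : 97; // uppercase vs lowercase alphabet
    return String.fromCharCode(((c.charCodeAt(0) - base + 13) % 26) + base);
  }),
});
```

Real compression would stream chunk by chunk rather than buffer, but the shape is the same: old response in, new response out.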

kevinresol commented 8 years ago

Maybe I am still thinking in the "expressjs" way, where there is a "body-parser" that reads and parses the body into a readily consumable form, which can then be used by subsequent handlers.

Maybe the "tink" way is quite different. There seems to be one and only one Handler that would ultimately "consume" the body, so we don't need a separate body parser, because the Handler should be responsible for parsing the body itself. Is that right?

back2dos commented 8 years ago

You can do that in a vaguely "expressjs like" way, if you construct a new request. A problem here is that requests are not that easily extensible, because of static typing (e.g. there's no place for the parsed body in IncomingRequest).

Some options are:

  1. Use static extensions on IncomingRequest for additional stuff.
  2. Subclass IncomingRequest, or replace or complement it. While in tink_http the basic Handler is IncomingRequest->Future<OutgoingResponse>, in some framework it could be ParsedRequest->Future<OutgoingResponse>, and then the framework's job is to transform IncomingRequest into ParsedRequest. You can look at monsoon's body parser here: https://github.com/benmerckx/monsoon#body
  3. Do it in the handler, as you said.
  4. Do it somewhere else. This is an approach I am pursuing in tink_web, where the routing generates the body parsing code.
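A minimal sketch of option 2, in TypeScript for brevity (`ParsedRequest`, `withParsedBody` and the JSON body are hypothetical illustrations, not monsoon's or tink's actual API):

```typescript
// The framework's handler sees a richer, statically typed ParsedRequest;
// an adapter turns it back into a plain handler for the container.
type IncomingRequest = { url: string; rawBody: string };
type Response = { status: number; body: string };

type ParsedRequest = IncomingRequest & { json: unknown };
type Handler = (req: IncomingRequest) => Promise<Response>;
type ParsedHandler = (req: ParsedRequest) => Promise<Response>;

// Transform IncomingRequest into ParsedRequest exactly once, so downstream
// handlers never need to re-read the raw body.
function withParsedBody(h: ParsedHandler): Handler {
  return async (req) => {
    let json: unknown;
    try {
      json = JSON.parse(req.rawBody);
    } catch {
      return { status: 400, body: "malformed body" };
    }
    return h({ ...req, json });
  };
}
```

The static-typing problem mentioned above is visible here: the parsed field has to live in a new type, not be bolted onto the existing request.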

But in any case, I think you will want a more faceted approach than express' one-size-fits-all solution, unless someone here has a good idea for how to make it work in a statically typed language. Otherwise I guess it will be up to frameworks to solve it on their own abstraction layer.

Still, static file handling, compression and possibly even caching could be provided in framework-agnostic libs, as none of them need to change the structural representation of requests or responses.

benmerckx commented 8 years ago

I'm all for the API changes. One thing I wonder is whether this would also be compatible with some future implementation of sending chunked responses. But I don't know if and how you'd like to handle that.

I also feel an IncomingRequest should be as immutable as possible. Express adds all kinds of stuff to the request, which feels wrong, especially in a typed language. It requires a bit more thought, but it should result in a more fail-safe API.

Regarding the body parser in monsoon: the middleware is not exactly finished, the idea being that middleware results would always be passed along if used in a previous method (so the parsed body string stays available, without having to parse the source again, which would fail). I'm now unsure whether I had completely finished that or not.

0b1kn00b commented 8 years ago

Chunked responses can be handled by writing an abstract over OutgoingResponse enumerated with a Stream; with a @:from, that's not a lot of code changes.

The only way to sensibly deal with the polymorphism of requests in Haxe is to use dependency injection to simulate polymorphic dispatch by closing over a constructor function, and then triggering a thunk.

Check veins-di, it's as simple as can possibly be, I think.

Even the highfalutin code in haskell pipes winds up with a call to runEffect() once all the composition is done, so there isn't much to be gained by complecting routing on this level.

my 2¢.

kevinresol commented 8 years ago

@0b1kn00b agreed with the DI solution. @back2dos any updates for the new container api?

back2dos commented 8 years ago

Ok, there's some progress here. I still have the ModnekoContainer to actually implement.

There's a somewhat bigger change I wanted to consult with you guys about first. It seems that in PHP (and also mod_neko) there's no real way to get at raw multipart requests. The files are always stored on disk and you get the names.

I'm therefore inclined to change the body of IncomingRequest to something like:

enum IncomingRequestBody {
  Plain(source:Source);
  Multipart(parts:Stream<{ name:String, filename:String, content: Source }>);//or something like that ... will have to do some more reading
}

That would reflect the difference. For TcpContainer and NodeContainer you would always get plain sources (but the constructor might accept an argument to emulate the same behavior).

Thoughts anyone?

kevinresol commented 8 years ago

I am not sure anyone will want to move the saved file around without reading it again. Maybe content should be an enum: Source(source:Source); File(path:String). Then if it is a File, the user can create the read stream from the path manually, or just copy it somewhere.
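The suggested nesting could look roughly like this tagged union, sketched in TypeScript (all names are hypothetical):

```typescript
// Multipart content is either a readable source or a path to a file the
// container already saved to disk (the php/mod_neko case).
type PartContent =
  | { kind: "source"; read: () => Promise<string> }
  | { kind: "file"; path: string };

type Part = { name: string; filename: string; content: PartContent };

// The application decides per part: stream it, or just move/copy by path.
async function describePart(p: Part): Promise<string> {
  switch (p.content.kind) {
    case "source": return `${p.name}: ${await p.content.read()}`;
    case "file":   return `${p.name}: saved at ${p.content.path}`;
  }
}
```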

back2dos commented 8 years ago

Yep, you are right. I hate this already, but I don't see what else we could do ... :D


benmerckx commented 8 years ago

I don't really mind the IncomingRequestBody update, what is there to hate?

back2dos commented 8 years ago

Well, there's just so many predetermined cases to handle for the application, instead of letting the application establish what cases it wants to discern. I'm actually using tink_http for proxying, in which case anything that interprets the request body (in particular by streaming it to temporary files) is a nuisance.

But even without that, I think how the body is treated should be the application's responsibility. Then the application can first route the request according to the URL, retrieve the user from session cookies, and then decide how much of what that user is allowed to stream to the server and how to handle the data. To have the user wait for a huge file to upload, only to then have to tell them that their session timed out or their permissions changed in the meantime, is rather silly. The container should hand off the request to the application ASAP.

All this automatic behavior is surely well-intended, but it blurs the separation of concerns, potentially leading to quite negative effects. I had hoped to be able to opt out of it. The more I learn about HTTP, the more I am delighted by many aspects of its design, and the sadder I am to see how some of the most common interfaces for it sabotage that on a quest for convenience that should really be pursued at the framework/library level rather than baked into the lowest one programmers get to deal with. I think nodejs got this right, while all the CGI derivatives made a crucial mistake. And what I "hate" is having to carry that unfortunate decision into our interface too.

0b1kn00b commented 8 years ago

This might help solve the problem: stx_simplex. It models a simplex, so you can either receive more input or push to output at every iteration (and the dependent output handlers won't be called until you do). Check the enum.

I've got a feeling this might need tail recursion to operate properly, but if you look at the Haskell pipes MVC, this is the central piece: mapping an input stream to an output stream composably. I just got pipe working (I think).

benmerckx commented 8 years ago

Thanks for the explanation, I hadn't thought about the proxy use case (I am using node-http-proxy in a project; I should have a look at replacing that). Then again, you wouldn't be able to proxy multipart requests through php/mod_neko anyway, or you'd have to reconstruct the whole body.

Am I right in thinking the body won't get parsed until you start using the parts stream (except on php/mod_neko, where you have no control over it)? And that the parsing could happen at a later stage (maybe even in a separate lib), but that would make things behave too differently on php/mod_neko vs the rest?

back2dos commented 8 years ago

Indeed, the body will not be parsed at all, until the application decides to do it. For example if the user is not authenticated, then you might simply reject a POST-request without bothering to ever read the body. To me this seems far more flexible and more secure (assuming that the application author enforces sensible restrictions).
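A toy TypeScript sketch of that idea: when the body sits behind a thunk, rejecting a request need not touch it at all (all names are illustrative):

```typescript
// The body is only materialized when readBody() is called, so the handler
// can short-circuit before any bytes are consumed.
type Request = {
  method: string;
  headers: Record<string, string>;
  readBody: () => Promise<string>;
};
type Response = { status: number; body: string };

let bodyReads = 0; // counter to observe whether the body was ever touched

const guarded = async (req: Request): Promise<Response> => {
  if (req.headers["authorization"] === undefined)
    return { status: 401, body: "unauthorized" }; // body never read
  return { status: 200, body: await req.readBody() };
};
```

This is the flexibility (and security) argument in miniature: an unauthenticated POST costs nothing, because nothing forces the body to be parsed up front.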

I would like to (at least potentially) have a veritable zoo of containers, including Twisted and Django and some plain WSGI for Python, a Netty based version for Java and what not (Justin's experiments with Lua in Nginx also interest me). In many of those cases the container will only give you the body in a preparsed way. So ultimately, I think we'll simply need to have a decent (general enough to cover most use cases, but still actually be usable) representation of parsed bodies. Reconsidering my impulse to make it a Stream, I think making it asynchronous is more trouble than it's worth though, because the container layer will almost certainly provide the data all at once.

An additional abstraction could be created to always deliver parsed bodies, but that's probably best solved in an extra lib.

back2dos commented 8 years ago

Ok, I think it's converging. There is still quite a bit of work to be done on the implementation side, but interface-wise I would propose the current state. Looking forward to your feedback ;)

back2dos commented 8 years ago

Well, seeing the lack of opposition, I think I'll just polish this a bit and fill at least some of the implementation holes.

back2dos commented 8 years ago

So IMHO this is now ok. If you have general problems with the changes, feel free to reopen this issue. Otherwise we can discuss more specific stuff in separate issues.