Implement Body extractor

bradleybeddoes commented 7 years ago

Must support:

application/x-www-form-urlencoded
multipart/form-data
text/plain (?)

Per other extractors for Request path and Query string, we've already shipped this must be type safe and support optional fields.

abonander commented 7 years ago

I've been working on multipart-async for a bit now but I'm not sure what I want the final API to look like yet. I'm open to collaboration!

heronhaye commented 7 years ago

+1. Data validation is central in Rocket and very powerful (https://rocket.rs/guide/requests/#field-validation), works for form data, query strings, JSON, etc.

bradleybeddoes commented 7 years ago

Hey @abonander sorry it has taken me a few days to get back to you, very hectic this end.

I certainly would like to get your help in figuring this out for Gotham as it seems you've already gotten quite a ways down the path of implementing this with Hyper natively.

I'm of the opinion (and @smangelsdorf may disagree!) that this kind of functionality is important enough that we need to have a solution for it in Gotham core and then support it long term.

Did you happen to get a chance to look at the Gotham API yet? I'm wondering how well you think our extractors concept might fit with what you've already done, especially with how we do proc_macro_derives for request path and query string structs that are defined by the application using Gotham.

bradleybeddoes commented 7 years ago

Hi @modalduality, I agree that validation is really important and Extractors can, at least partially, do this.

Right now an Extractor makes sure that Request data is type valid for the path and query string. So if your app is expecting to get a u8 but the Request presents anything else the Request is aborted with a 400 Bad Request, assuming you derive StaticResponseExtender. You can easily customize the specific Response with your own StaticResponseExtender implementation instead of deriving.

If you have a custom type you could implement FromRequestPath or FromQueryString and make any validations you need during the conversion process, simply creating an Err if something isn't right.

However this discussion probably highlights something that would be quite useful for Extractors, an extension to PathExtractor and QueryStringExtractor along the lines of fn validate(state: &State) -> Result<(), Err> that the Router would call if fn extract is successful. This would allow the application to do further validation if it so desired, e.g. ensuring that a String value which is type valid also meets some regexp or a u8 has been provided within some desired range.

Thoughts?

abonander commented 7 years ago

@bradleybeddoes I like the concept of extractors, but I was talking a bit lower-level. At its core, multipart-async just parses the request body into sections which are still essentially futures::Stream<Item = Vec<u8>>, so it would be necessary to either collect those into a datastructure before touching user code, or create an extractor trait which can return a Future.

If collecting fields to a datastructure, we're going to have to touch the filesystem because fields in a multipart request can come from files and possibly be too big to fit into memory. In multipart (the original implementation for synchronous APIs), I created an abstraction which can save all files in a given request to a temp directory so they can be accessed arbitrarily.

Because the only really supported file operations on any given OS are blocking (there have been some attempts at asynchronous disk I/O but they're not really satisfactory), recreating this abstraction for asynchronous requests would require a pool of background worker threads. I want to keep multipart-async simple but I was thinking of building a separate crate which provides cross-platform asynchronous disk I/O using background worker threads, built around the futures::Stream abstraction. It could perform various tricks to increase throughput, like mem-mapping and reading single aligned blocks at a time.

The question becomes: how much of this do we want to expose to the user? Do we want to expose files as Stream<Item = Vec<u8>> or give them the form-data as a map of strings and tempfiles (which is, from what I've seen, somewhat common in web stacks)?

abonander commented 6 years ago

@bradleybeddoes congrats on the 0.2 release. I see this issue is on the 0.3 milestone now, have you given any thought to my previous comment?

I'd be happy to work with integrating multipart-async, I just need to know what direction you want to take with it.

bradleybeddoes commented 6 years ago

@abonander, thanks!. We have and we're certainly very keen to work with you.

I've run out of time and brain power today but I have half a response formulated in my head already, I'll get something more substantial to you in the next day or so on this so we can get moving 👍.

bradleybeddoes commented 6 years ago

Hi again @abonander, thanks for sticking with us, busy times.

So @smangelsdorf and I have thrown this around a little bit offline and whilst we have a pretty reasonable idea on what application/x-www-form-urlencoded will look like, which is essentially the same as what path/query extractors are today just applied to the body, we're unsure how multipart/form-data should integrate.

Which usually means it is time to experiment 🎉 🤓 👍.

We could:

Get application/x-www-form-urlencoded support into some sort of usable state first and then look at multipart or;
Jump in, and hack on the Router to see what can be achieved with multipart and worry about unifying the implementations later on. This could lead to a dead end of course but we'd at least learn a lot.

You mentioned you might have some time to start looking into this, does 1 or 2 seem like a better approach for you?

We also briefly discussed if adding a dependency on the multipart-async crate is a good idea, purely around long term maintenance/API stability. Nothing we need to decide right now but something to chat about in the future in terms of goals/plans/support etc for the crate.

abonander commented 6 years ago

@bradleybeddoes It all depends on whether or not you want to treat urlencoded and multipart forms with the same interface. I haven't been exposed to a large number of web frameworks (except PHP 5 and the Yii framework, and I can't say that they set great examples) so I don't really know what the precedents are.

If you expect an endpoint to only accept multipart uploads then a router-based solution works okay; I've provided some integrations to this end in the synchronous multipart crate for Hyper and Iron.

abonander commented 6 years ago

Express.js appears to do the middleware thing, recommending multer as a bodyparser. Multer integrates some basic form validation, expecting specific field names and types and only saving that much. I don't know if it throws away unexpected fields or returns an error though.

abonander commented 6 years ago

@bradleybeddoes I don't have a working prototype yet but I'm conceiving of a derive-based solution to handing form input that would cover both x-www-form-urlencoded as well as multipart/form-data and provide validation as well. It would be quite similar to Serde but specialized for form input types and designed around push-based deserialization to support streaming async bodies.

Something like:

#[derive(FormInput)]
pub struct RegistrationForm {
    #[validate(regex = "[a-z,A-Z,0-9,_]", error = "usernames may only contain letters, numbers, or underscores")]
    username: String,
    #[validate(min_len = "8")]
    password: String,
    email: Email, // derefs to `str`, auto-validates for emails
    dateOfBirth: Date,
    #[validate(expected_val = "true", error = "you must accept our terms of service to register")]
    acceptedTos: bool,
    #[validate(content_type = "image/*", error = "avatar must be an image")] // with const-generics this could be a type parameter instead
    avatar: Option<UploadedFile>, // optional field
    // user doesn't have to handle multipart at all
}

The derive itself or a macro could generate a BodyExtractor impl for this type so the user would just have to declare it.

The actual core FormInput trait wouldn't be aware of sync/async or the specific http/web stack, it would just spit out a list of fields and the implementation would parse them and then provide them to it.

abonander commented 6 years ago

(the example is just a strawman, the user would probably want to check validation errors and emit messages programmatically with however they plan on returning them to the user)

whitfin commented 6 years ago

Hey all; couple of questions because I'm working on a Gotham project and I'm hitting some stuff related to this (I think). I've only been looking at Gotham for a week or two, so if any of my understanding is wrong, apologies in advance!

I think a good place to start is to get parsing handled by an extractor at all, even without validation. I'm currently working on an API where I'm having to manually parse bodies in almost every handler. I'm working with JSON, and doing something like this. While this works, it's quite noisy to deal with. An API like this would be much nicer to use, and also jive with the existing extraction syntax:

route
        .post("/user/create")
        .with_body_extractor::<JsonExtractor<User>>>()
        .to(create_user)

This is essentially just plugging in an extractor for JSON and specifies the struct we want to parse to (via User). I'm not sure how feasible this is, but as a user, this seems pretty clean. You probably don't have to do much more than wrap up serde_json and you'd be good to go (incidentally, there is also a serde_urlencoded).

As for the validation aspects, has anyone considered a project like validator for this functionality? I've never used it myself, but it seems quite simple to integrate with. At a high level it looks quite similar to the suggestion from @abonander.

This might not be anything additional to what's already in this thread, but I just wanted to drop my feedback since I happened to come across this issue whilst looking for an OOTB solution.

willi-kappler commented 6 years ago

@whitfin: your JSON crate and the validator sound really nice. @abonander and @bradleybeddoes: what do you think ? I would like to help but I'm new to gotham and async. Is there any progress on "2. hack on Router" ?

sezna commented 4 years ago

Is there any progress on the status of body post body extraction? I'm unsure what the currently advised way of extracting post bodies is -- the examples are using to_async but the latest version on crates.io doesn't have that method, and the examples are not versioned and released to crates.io.

technic commented 4 years ago

There was some progress with async fn as handlers, but probably that's not done yet.

I had my own implementation of this here

Then you can read full body to memory and parse it in any way. Like here

P.S. In the end I migrated to actix ;)

msrd0 commented 4 years ago

If you are looking for a json body parser, you might be interested in msrd0/gotham-restful. Otherwise, I recommend you either use the latest release candidate / the master branch of gotham, or another (preferably actively maintained) web framework. (Hopefully gotham will become actively maintained again at some point)

sezna commented 4 years ago

Ah, I just was migrating from Iron to this since iron went unmaintained. Lol, I didn't know this was not actively maintained either. Perhaps a notice could be put in the README?

msrd0 commented 4 years ago

Well, I guess this is just my subjective impression from maintainers basically doing nothing for month consecutively, not an official statement from the maintainers. So take it with a grain of salt.

bradleybeddoes commented 4 years ago

ping @whitfin @nyarly @colinbankier re: maintenance status. Possibility of handling reigns to someone else who is interested in continuing driving the project forward?

sezna commented 4 years ago

We did chat about this in the Gitter a bit. There are some people who would be interested in helping out where possible, myself included.

colinbankier commented 4 years ago

Hey all - absolutely. This is an issue that has been coming up. I'm more than happy to add others who are keen to help out since it seems none of us existing maintainers have been able to commit much time to it lately. We can continue the discussion over in the gitter chat.

gotham-rs / gotham

Implement Body extractor #11