hyperium / http

Rust HTTP types
Apache License 2.0
1.16k stars 291 forks source link

There should be an uri::Builder #206

Open SergejJurecko opened 6 years ago

SergejJurecko commented 6 years ago

Right now the burden of correct percent encoding is entirely on the user. This is completely unnecessary. Only accepting string URI also forces users to concatenate strings to construct an URI, which is bad and inefficient practice that leads to bugs.

Examples

Entirely segmented

Uri::builder()
        .https()    // In mio_httpc I have both https() and set_https(true).
        .host("www.example.com")
        .auth("user name", "my password")
        .port(12345)
        .path_segm("a/")
        .path_segm("b")
        .query("spaced key", "123")
        .query("key", "<>")

Only partly segmented because schemes, hosts and optional ports are not a percent encoding problem and often stored as a fixed string on a client.

Uri::builder()
        .base("https://www.example.com:8080")?
        .path_segm("a/")
        .path_segm("b")
        .query("spaced key", "123")
        .query("key", "<>")

Query and path segments could also be slices.

Uri::builder()
        .base("https://www.example.com:8080")?
        .path(&["a", "b", "c d"])
        .query_pairs(&[("spaced key", "123"),("another","key")])

(edit by Sean)

To track the progress, here's what's done and needs to be done:

seanmonstar commented 6 years ago

Thanks for writing this up! I agree with premise, we should be able to provide a builder for easier creation.

Since different parts of a URI have different allowed characters, the builder would need to return a Result.

SergejJurecko commented 6 years ago

Invalid characters are percent encoded so I don't see a way for query/path functions to fail. .path_segm("a/") should percent encode the slash. Base should return Result because you can't stop a user from writing nonsense.

SergejJurecko commented 6 years ago

Servo's percent-encoding crate is a small dependency that defines the right encoding for the right part of the URL quite well.

seanmonstar commented 6 years ago

That's a good point for path and query.

seanmonstar commented 6 years ago

It doesn't have all that is mentioned in here, but I've at least started a builder concept in https://github.com/hyperium/http/pull/219.

seanmonstar commented 6 years ago

219 has been merged, which provides methods for:

There is room to add methods for smaller parts, but they should be discussed first.

yageek commented 6 years ago

To add more granularities to the Builder, some additionnal struct could be addeded. The PR #268 proposes a QueryItem struct to stores the information about the query component path. This is directly inspired from Swift URLQueryItem. It has one string field for the key and one optional string for the value.

If the builder would allow to build path and query separately, we could offer a better way, according to me, to abstract the different endpoints of an API.

Inspired by some Swift practices I’m using, we could have some code like this :

pub trait Endpoint {
    fn method(self) -> http::Method;
    fn path(self) -> String;
    fn query(self) -> Vec<QueryItem>;
}

This would allow to abstract the different API's paths by any element implementing the Endpoint trait. It would then be easy to have one uniq method transforming any Endpoint into an URI request.

The form of the QueryItem could be:

struct QueryItem<'a> {
    key: &'a str;
    value: Option<&'a str>;
}

Or:

struct QueryItem<'a> {
    key: String;
    value: Option<String>;
}

As a non Rust expert, I do not have a clear vision about the consequences of the adoption of one form or the other.

In the end, the URI builder would require new fields to be able to construct the URI. This may involve changing the internal API for Parts:

#[derive(Debug, Default)]
pub struct Parts {
    /// The scheme component of a URI
    pub scheme: Option<Scheme>,

    /// The authority component of a URI
    pub authority: Option<Authority>,

    /// The origin-form component of a URI
    // pub path_and_query: Option<PathAndQuery>, --> Removed it?

    /// The path of the query
    pub path: String

    /// The different query items
    pub query_items: Vec<QueryItem>

    /// Allow extending in the future
    _priv: (),
}

The builder could be updated accordingly:

impl Builder {
    pub fn path(&mut self, path: String) -> &mut Self;
    pub fn query(&mut self, key: String, value: Option<String>) -> &mut Self;
}
lelandjansen commented 5 years ago

Any movement on this? I'd love to see this feature in 0.2 and am happy to draft a PR if this is something we're still interested in.

Building on @SergejJurecko's ideas, I'd like to propose the following API:

let uri = Uri::builder()
    .scheme(POSTGRES) // Built-in &str constants also supporting HTTPS (default), HTTP, MYSQL, etc. Can also be a user-supplied string
    .auth("username", "password")
    .host("www.example.com")
    .port(12345)
    .path(&["a", "b", "c d"])
    .query("key", "value")
    .query("key2", "value two")
    .queries(&[("k3,", "v3"), ("k4", "v4")])
    .fragment("heading")
    .build()
    .unwrap();
assert_eq!(
    "postgres://username:password@www.example.com:12345/a/b/c%20d?key=value&key2=value%20two&k3=v3&k4=v4#heading",
    uri.as_str()
);

I'd also like to propose a convenience path_from("a/b/c") method that can be used in lieu of .path(...)that splits at the / delimeter and internally calls path(&["a", "b", "c"]).

I'm having difficulty envisioning a use case for specifying individual path components and would be partial towards omitting the proposed path_segm.

Feedback welcome!

Edit: Added unwrap call.

SergejJurecko commented 5 years ago

I'm having difficulty envisioning a use case for specifying individual path components and would be partial towards omitting the proposed path_segm.

For building requests against an API. Often different requests share part of a path and differ in some segments. It does not make sense in your example because everything is built in one go. It's for when you require building requests in stages.

lelandjansen commented 5 years ago

It's for when you require building requests in stages.

Apologies. I made an assumption in my proposal that wasn't mentioned explicitly (and definitely was't clear in my comment). My intention was for path, path_from, query, and queries to push items to their respective uri components. The proposal to omit path_seg was because one could call path_from("single-item") to achieve the same effect.

This does raise an interesting question: From the name do we expect these methods to push items or replace their existing values (for both path and query components)?

Perhaps a more clear API would be: push_path, push_path_from, push_query, and push_queries?

SergejJurecko commented 5 years ago

I don't see how replace items would be a useful or expected operation in this context. push_ is pretty verbose and redundant if everything is push.

lelandjansen commented 5 years ago

I don't see how replace items would be a useful or expected operation in this context. push_ is pretty verbose and redundant if everything is push.

Agreed.

@seanmonstar @yageek Any additional questions/comments?

carllerche commented 5 years ago

IMO it would be fine to add a few more fns but we should stay on the conservative side.

Host, port, path, and query are probably fine. Maybe even push_path, and query_param.

lelandjansen commented 5 years ago

@carllerche It sounds like you're concerned about bloating the API with redundant functions, which I can appreciate.

Wikipedia's URI syntax diagram shows a scheme, userinfo, host, port, path, query, and fragment. These components are detailed in other documentation as well, and I'd argue it's important to offer methods for each one.

That said, we might want to think about limiting the number of functions we offer for each component. Some ideas for discussion:

* I recognize it's bad practice to provide authentication credentials in a URI and developers should prefer a more secure method (e.g. HTTP header). That said, from what I've seen this practice is still fairly common. I'd advise adding a cautionary message in the documentation. We could deprecate the auth method right away since the userinfo component is deprecated, which would produce a compiler warning.

Maybe even push_path, and query_param.

Do you favor this naming over path and query?

carllerche commented 5 years ago

The http::Uri type is not intended to be a general purpose URI abstraction. It is the http protocol’s request target. As such, some components don’t apply.

Auto should not be included as it is deprecated and it is strongly recommended to not use the auth component. In cases where it is needed, it can be set the more painful way.

The current components should remain.

Generally, IMO fns that append vs set should be differentiated. I.e push_path and push_query_param.

lelandjansen commented 5 years ago

The http::Uri type is not intended to be a general purpose URI abstraction. It is the http protocol’s request target. As such, some components don’t apply.

It sounds like http::Uri is not intended to be used in place of the Url crate.

Auto should not be included as it is deprecated and it is strongly recommended to not use the auth component. In cases where it is needed, it can be set the more painful way.

Noted.

The current components should remain.

Noted.

Generally, IMO fns that append vs set should be differentiated. I.e push_path and push_query_param.

Right now the proposed API only appends (possibly to a blank list). @SergejJurecko and I agree that push_* is quite verbose if that's the only functionality.

  1. Are you suggesting we have APIs for both set (replace) and push? I don't favor this approach. A common pattern that I've seen is to store a base URI (e.g. example.com/path) as a constant and append the query components as required, for example, for an oauth flow. If you need to change the query, clone the base URI and add the new components.
  2. Do you prefer the more verbose push_* over simply path, path_from, query, and queries if we only append (not set)?