seanmonstar / httparse

A push parser for the HTTP 1.x protocol in Rust.
https://docs.rs/httparse
Apache License 2.0
567 stars 111 forks source link

Draft: RFC configuration in const context & HTTP-like protocol parsing support #129

Closed gootorov closed 9 months ago

gootorov commented 1 year ago

I do not expect this PR to be merged as-is (or at all). The main purpose of this is to have a basis for discussion, hear different opinions and suggestions.

Some of the changes here are breaking, I'll leave additional information concerning each commit as a separate comment.

gootorov commented 1 year ago

First commit makes the following usage possible:

        const CONFIG: super::ParserConfig = super::ParserConfig::const_default()
            .allow_spaces_after_header_name_in_responses(true)
            .allow_multiple_spaces_in_request_line_delimiters(true);

which is otherwise rejected due to mutable references usage in const context.

For my use case in particular, this is just ergonomics. I know quite specifically the configuration I need, and it won't be changed. Of course, there are workarounds, like using lazy_static/once_cell, or simply inlining usual instantiation, so it's not exactly a big problem.

One thing though that I hoped to see is branch elimination. To be honest, I do not know enough about const propagation in rustc, but I hoped maybe if the compiler knew all options, it would be able to inline new branchless functions. However, looking at the assembly, it seems there are no changes at all. Maybe this will come in the future, or maybe I am asking for too much :)

Do you think something like this is worth a breaking change?

gootorov commented 1 year ago

Second commit is another stab at supporting HTTP-like protocols.

The main idea is that httparse can implement several parsers for the request/response-line and the user can choose these parsers through ParserConfig.

The commit here is does not implement this idea fully. I used SIP as the example, the only difference there is in version parsing in the response, but I suppose there could be other protocols that could differ slightly more. In which case, this could be refactored further to contain two function pointers, for request-line and response-line parsing for each supported respectively. Similarly to SIP, there have been requests for ICAP support in #128 and RTSP in #67.

Although as much as I would love to see this supported, this seems just a little bit out of project's scope. What do you think?

Regarding performance, this is what I got before and after this commit:

req/req                 time:   [327.43 ns 328.13 ns 328.83 ns]                    
                        thrpt:  [2.0392 GiB/s 2.0435 GiB/s 2.0479 GiB/s]
                 change:
                        time:   [+1.4046% +2.1175% +2.8697%] (p = 0.00 < 0.05)
                        thrpt:  [-2.7896% -2.0736% -1.3851%]
                        Performance has regressed.

req_short/req_short     time:   [65.513 ns 65.648 ns 65.785 ns]                                 
                        thrpt:  [985.79 MiB/s 987.84 MiB/s 989.87 MiB/s]
                 change:
                        time:   [-5.4697% -4.7538% -4.1018%] (p = 0.00 < 0.05)
                        thrpt:  [+4.2772% +4.9911% +5.7862%]
                        Performance has improved.

resp/resp               time:   [345.99 ns 346.97 ns 348.03 ns]                      
                        thrpt:  [1.8705 GiB/s 1.8763 GiB/s 1.8815 GiB/s]
                 change:
                        time:   [+4.8074% +5.5219% +6.3035%] (p = 0.00 < 0.05)
                        thrpt:  [-5.9297% -5.2330% -4.5869%]
                        Performance has regressed.

resp_short/resp_short   time:   [70.511 ns 70.656 ns 70.797 ns]                                   
                        thrpt:  [1.1971 GiB/s 1.1995 GiB/s 1.2019 GiB/s]
                 change:
                        time:   [-2.7264% -2.0204% -1.3541%] (p = 0.00 < 0.05)
                        thrpt:  [+1.3726% +2.0621% +2.8028%]
                        Performance has improved.

Which is a weird result. I am benchmarking this on a laptop though, so there might actually be no meaningful difference and this is just a fluke.