the-benchmarker / web-frameworks

Which is the fastest web framework?
MIT License
6.91k stars · 641 forks

More realistic implementation #1728

Open waghanza opened 4 years ago

waghanza commented 4 years ago

Hi,

The implementations here are very simple; this is on purpose.

The routes to implement here are:

I propose updating the code base to test all of the most-used HTTP methods and to use application/json instead of text/plain:

The idea is to be as close as possible to a real-world representation.
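For illustration only, a minimal hypothetical sketch of a JSON endpoint (Flask, the /user route, and the payload shape are assumptions here, not the repo's actual spec):

```python
# Hypothetical sketch: serving application/json instead of text/plain.
# Flask and the /user route are illustrative choices, not the repo's spec.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/user", methods=["GET"])
def get_user():
    # Return application/json instead of a plain-text body
    return jsonify({"id": 1, "name": "benchmark"})

@app.route("/user", methods=["POST"])
def create_user():
    payload = request.get_json(silent=True) or {}
    return jsonify({"created": True, "received": payload}), 201

if __name__ == "__main__":
    app.run()
```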

Regards,

OvermindDL1 commented 4 years ago

Do you wish to force the use of JSON libraries and/or the framework's JSON handling then? Otherwise implementations can just return strings, which will be just as fast as before.
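To illustrate the difference (a hypothetical Python sketch, not code from the repo):

```python
import json

user = {"id": 1, "name": "benchmark"}

# Returning a hard-coded string: no serialization cost, same speed as plain text.
def handler_static():
    return '{"id": 1, "name": "benchmark"}'

# Forcing a JSON library: the encoder actually runs on every request.
def handler_encoded():
    return json.dumps(user)
```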

waghanza commented 4 years ago

Yes. With such a system we have to add a dependency (ideally the same library, or the built-in one, for each framework per language).

The purpose here is to have a standard implementation, I mean something closer to the real world.

I'm not convinced that a plain-text API is really used in production nowadays, and JSON (even if not the fastest format to parse/emit) is a de facto standard.

We could also use any standard format, such as JSON:API or something else.

I'm open to any suggestions :heart:

PS: The infamous "hello world" seems to have the same problem. Which production API has this functionality?

waghanza commented 4 years ago

In addition to what I've said, I think @ohler55 is working on a GraphQL implementation of this kind of benchmark.

The idea is to keep this repo as the benchmarking tool and have one repo per implementation (simple API, GraphQL, ...)

ohler55 commented 4 years ago

Indeed there is a GraphQL benchmark started. I'd love to have a few more examples if anyone is willing to contribute. https://github.com/ohler55/graphql-benchmarks

waghanza commented 4 years ago

Sure, I can contribute some Python, Ruby, and Crystal implementations.

You can move it here, @ohler55, if you want to combine results 😀

ohler55 commented 4 years ago

I was hoping to hand it over at some point. I'm not sure if it makes sense to combine everything into one large table or to keep GraphQL separate. It kind of feels like separate is better. I would be glad to transfer the repo to the-benchmarker though. Maybe that's what you had in mind. :-)

waghanza commented 4 years ago

Yep @ohler55, but I do not want to mix results. What I have in mind is one repo per scenario (as it is called in TFB).

ohler55 commented 4 years ago

Not sure what 'tfb' is but glad you want to keep them separate.

So, whenever you are ready, I'll transfer ownership.

waghanza commented 4 years ago

TFB is https://github.com/TechEmpower/FrameworkBenchmarks

Yes, I'm ready if you want to transfer ownership

ohler55 commented 4 years ago

Looks like I need permissions to create repos on the-benchmarker to complete the transfer.

waghanza commented 4 years ago

I've granted you more privileges

ohler55 commented 4 years ago

Transfer complete!

waghanza commented 4 years ago

I think I will implement this after the next release (end of year). If anyone has some ideas, feel free to share :heart:

waghanza commented 4 years ago

Also, I think an authentication system should be implemented.

Any hard-coded string sent as a token could be enough. The difficulty here is to avoid having complex implementations.
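As a rough illustration of how simple it could stay, a hypothetical sketch (the token value and header handling are assumptions, not a proposed spec):

```python
# Hypothetical sketch: a hard-coded bearer token check.
# The token value and the Authorization header format are illustrative assumptions.
HARD_CODED_TOKEN = "benchmark-token"

def is_authorized(headers: dict) -> bool:
    # Expect a header like "Authorization: Bearer benchmark-token"
    value = headers.get("Authorization", "")
    return value == f"Bearer {HARD_CODED_TOKEN}"

# Usage: reject the request with 401 when is_authorized(request_headers) is False.
print(is_authorized({"Authorization": "Bearer benchmark-token"}))  # True
```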

ohler55 commented 4 years ago

Getting into authentication is non-trivial. An arbitrary string is not really representative of the real world. Take OAuth2 as an example: a token server would be needed, and the server under test would be expected to interact with it. You start testing something much more than the web server at that point.

ohler55 commented 4 years ago

It might be interesting to compare different auth approaches and servers, but that's probably better done as a separate repo. Another benchmarker repo, of course.

bhansconnect commented 4 years ago

Passing this information along from the simulated database delay issue:

Essentially, in order to know the true performance of a web server, it is important to know how it handles asynchronous tasks (waiting on the database, calling an API, etc.). Most web servers do not have compute-heavy workloads; they have delay-heavy workloads with a little bit of data serialization at the end. I propose that one of the tested endpoints have the web server sleep for x milliseconds and then return a simple payload. This would simulate database calls.

The added endpoint could literally just be /sleep/:time_ms. This way it is a single endpoint that can be tuned for any amount of delay (e.g. simple: 1-10 ms, normal: 10-20 ms, complex: 50-100 ms). Categories could even be defined based on expected local vs. cloud DB latency. Of course it doesn't need to get anywhere near that complicated, but it could let users test different delays for themselves.
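A minimal hypothetical sketch of such an endpoint (Flask is an illustrative choice here; an async framework would use a non-blocking sleep instead):

```python
# Hypothetical sketch of the proposed /sleep/:time_ms endpoint.
# Flask and the blocking time.sleep are illustrative assumptions.
import time
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/sleep/<int:time_ms>")
def sleep_handler(time_ms: int):
    # Simulate a database call or an external API call of time_ms milliseconds.
    time.sleep(time_ms / 1000.0)
    return jsonify({"slept_ms": time_ms})
```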

kazzkiq commented 4 years ago

PUT and POST should also have some sort of basic validation to ensure all expected keys exist within the received JSON.
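For example, a minimal hypothetical sketch of such a key check (the expected keys are made up for illustration):

```python
# Hypothetical sketch: validate that all expected keys exist in a PUT/POST body.
EXPECTED_KEYS = {"id", "name", "email"}  # illustrative keys, not the repo's spec

def validate_payload(payload: dict) -> tuple[bool, set]:
    missing = EXPECTED_KEYS - payload.keys()
    return (not missing, missing)

ok, missing = validate_payload({"id": 1, "name": "x"})
# ok is False and missing == {"email"} -> respond with 400 Bad Request
```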

ohler55 commented 4 years ago

I believe the purpose of these benchmarks is to compare servers. By adding additional non-server code you take away from the server comparison and start to get into how well someone writes an application, which is outside the scope of the server comparison and more in the realm of application code, which can often be used with multiple servers.

waghanza commented 4 years ago

To be precise, the purpose is to compare the performance of server-side frameworks.

There are two things to notice.

Two things are required for me:

Having a delay to simulate a database connection, or using a custom JSON emitter/parser, is, after thinking about it, not necessary (it even seems obvious in hindsight).

For the database, as @OvermindDL1 once said, calls are mostly made asynchronously. By this I mean that adding a delay will not, I think, change the comparison. In other words, implementing a delay would not be valuable with regard to our objective.

For the format (JSON), I'm not convinced it would change the results with regard to our objective. Plain text is enough for me.

bhansconnect commented 4 years ago

I am not sure you can have a realistic, real-life-like benchmark if you ignore both JSON and asynchronous performance. Even if database calls are generally handled asynchronously, some servers are synchronous and do not support this. Also, some servers will perform poorly when given a proper async task because their evented I/O engines are not good.

ohler55 commented 4 years ago

I think you will find that the higher-performance servers in the results do support asynchronous, concurrent requests, and that the testing tools already open 1000 connections and pipeline requests. Asynchronous request handling and concurrency are therefore already measured without introducing a delay on responses, which would invalidate any latency measurements.

For JSON, most if not all servers accept text, whether that is JSON, XML, or just random strings. After that it is up to a JSON, XML, or other parser to extract data. That is not part of a server evaluation but of parser performance, which can be and is benchmarked separately.

Consider what would help the most when building a web application. There are choices for web server, databases, parsers, encoders, and more. If you try to compare every combination of all those features, you would have millions of combinations to compare. Hardly reasonable. If instead you benchmark each component separately, you can determine which is best in class for each component and put your application together accordingly. The benchmarks here are for servers. Adding additional components to the mix only obfuscates the results regarding the servers themselves.

waghanza commented 4 years ago

We are on the same page, @ohler55.

What about testing all HTTP verbs? Some routers could behave differently for POST, PUT, ...

ohler55 commented 4 years ago

That could be added. I guess it depends on how vital each verb is to applications. For example, PATCH can be used to modify, but so can PUT in a REST model. Getting more esoteric, the CONNECT method is probably not supported by most of the servers, since it is really for proxying and SSL tunnelling. Then TRACE, for loopback testing, is not really something an application needs but could be passed through to the underlying application code. It might not be as simple as it sounds at first blush to define something reasonable for each.

waghanza commented 4 years ago

I think that

So it means that you should cover

WDYT?

ohler55 commented 4 years ago

Sure, with a few suggestions to fix the HTTP method usage. PUT is usually used for creation, so that should probably be on the list. GET should never modify and should be read-only. In addition, POST is used when modifying something on the server side and is an indication that the request is not idempotent, so a simple retrieve is probably not the right functionality. Something like incrementing a counter would be more appropriate, with the return value being a total. So something like this:
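For illustration, a minimal hypothetical sketch (Flask, the /counter route, and the in-process counter are assumptions, not a proposed spec; a shared in-process counter only reflects a single worker, which ties into the side effects below):

```python
# Hypothetical sketch: POST increments a counter and returns the total.
# An in-process counter like this only reflects one worker's state.
from threading import Lock
from flask import Flask, jsonify

app = Flask(__name__)
counter = 0
lock = Lock()

@app.route("/counter", methods=["POST"])
def increment():
    global counter
    with lock:
        counter += 1
        total = counter
    return jsonify({"total": total})
```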

There will be a few side effects to implementing the listed methods or verbs. If you check the backend state, then the use of workers will not be an option unless there is also a backend store, which for the reasons mentioned earlier overshadows the server testing; if there is no backend store, it hides the fact that some servers support multiple workers. Basically, some planning is needed to benchmark what you really care about.

The second side-effect of the change is that someone is going to have to go over all the existing code and add the new features. :-)

waghanza commented 4 years ago

If you check the backend state, then the use of workers will not be an option unless there is also a backend store, which for the reasons mentioned earlier overshadows the server testing; if there is no backend store, it hides the fact that some servers support multiple workers

You mention that the results could be misleading depending on how the servers are configured. I think so, but this is on purpose. What I have in mind:

The second side-effect of the change is that someone is going to have to go over all the existing code and add the new features. :-)

I'll handle it -> I'll ask for help from authors/contributors, since I do not code in Scala (for example)

RicardoSette commented 4 years ago

@ohler55

PUT is usually used for creation, so that should probably be on the list. GET should never modify and should be read-only. In addition, POST is used when modifying

There are RFCs that address this and specify what each method should do:

https://tools.ietf.org/html/rfc7231#page-24 https://tools.ietf.org/html/rfc2616#section-9.5

The use of each method is covered in the RFCs above, and we should follow them, or else rewrite these RFCs.

Saying that POST is for update and PUT is for create goes against these RFCs, and web forms would then all be wrong, wouldn't they?

POST and PUT semantics are determined by the server and usually depend on the Request-URI to determine the functionality.

I recommend caution when making definitions; the the-benchmarker/web-frameworks project is a good reference for strategic decisions, and failing to follow standards can be a setback.

See a good example: https://www.yiiframework.com/doc/guide/2.0/en/rest-quick-start#trying-it-out

ohler55 commented 4 years ago

I stand corrected. The main point I was trying to make was that there are more HTTP methods to consider than just the four common ones, and that state will need to be considered when using multiple workers. Thank you for providing suggested behaviour for all the relevant methods.

OvermindDL1 commented 4 years ago

HEAD is probably the most common verb; most browsers and clients submit a HEAD request to check viability for a lot of things. Some servers will optimize HEAD calls, some will just call GET and drop the body, etc. It could be good to check for that too.
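One way to check this (a hypothetical sketch using Python's requests library; the URL is made up) is to compare HEAD and GET responses for the same route:

```python
# Hypothetical check: does the server answer HEAD itself, or fall back to GET?
import requests

url = "http://localhost:3000/"  # illustrative target, not the repo's setup

get_resp = requests.get(url)
head_resp = requests.head(url)

# A HEAD response must have no body; headers such as Content-Length
# should still describe what GET would have returned.
print(len(head_resp.content) == 0)
print(head_resp.headers.get("Content-Length"), get_resp.headers.get("Content-Length"))
```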

ohler55 commented 4 years ago

I think you will find most servers pass the request on to the app implementation, since the server can't really guess what the app will do. I have no problem if you want to include it though. It's just not really a server test but an app implementation test.

jknack commented 3 years ago

Is there a warm-up phase? Java implementations need a warm-up phase to get more stable results. For example, Jooby sometimes ranks between 5 and 7, and at other times around 15 to 18.

Here are some details around JVM warm-up: https://stackoverflow.com/questions/36198278/why-does-the-jvm-require-warmup

waghanza commented 3 years ago

Not really within the scope of this issue, I think, but yes.

https://github.com/the-benchmarker/web-frameworks/blob/5ab8cdbd2084ae307c3977fa25fea5cb49318751/tools/src/client.cr#L50

The same methodology as TechEmpower is used.
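Conceptually, the warm-up-then-measure flow looks like the sketch below (a hypothetical Python illustration only; the actual tool is tools/src/client.cr, and the URL and durations here are made up):

```python
# Hypothetical sketch of a warm-up-then-measure methodology.
# The real benchmarker lives in tools/src/client.cr; this is only an illustration.
import time
import requests

URL = "http://localhost:3000/"  # illustrative target

def run(duration_s: float) -> int:
    requests_done = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        requests.get(URL)
        requests_done += 1
    return requests_done

run(5.0)           # warm-up: results discarded (lets JITs and caches stabilize)
count = run(15.0)  # measured run
print(count / 15.0, "req/s")
```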