Setup for remote relationship benchmarks using sportsdb

nizar-m commented 4 years ago

Description

The Python script test_with_sportsdb.py should set up the databases and GraphQL engines required for the tests.

Affected components

Benchmarks

Related Issues

Solution and Design

Steps to test and verify

Limitations, known bugs & workarounds

jberryman commented 4 years ago

It would be great if we could compare performance against master for queries that don't involve remote joins. What would be the best way to accomplish that? Can this PR work against master?

jberryman commented 4 years ago

Ah, I also noticed that on my machine this creates directories that have weird ownership:

total 16552
-rw-r--r--  1 me me       2797 Oct 10 12:15 hge.log
-rw-r--r--  1 me me       3181 Oct 10 12:15 remote_hge.log
drwx------ 19 70 root     4096 Oct 10 12:15 remote_sportsdb_data
-rw-r--r--  1 me me     675840 Oct 10 11:32 sportsdb_cache.sqlite
drwx------ 19 70 root     4096 Oct 10 12:15 sportsdb_data
-rw-r--r--  1 me me   15597968 Oct 10 11:32 sportsdb_sample_postgresql_20080304.sql
-rw-r--r--  1 me me     652509 Oct 10 11:32 sportsdb.zip

This causes problems for tooling like hasktags or ack, since they can't be traversed.
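For anyone who wants to spot these directories quickly, here is a minimal sketch that lists directories directly under a work directory that are owned by a different user. The function name `dirs_not_owned_by` is hypothetical and not part of this PR; it only inspects ownership and does not change anything.

```python
import os


def dirs_not_owned_by(root, uid):
    """Return directories directly under `root` whose owner is not `uid`."""
    foreign = []
    with os.scandir(root) as entries:
        for entry in entries:
            # Only look at real directories, not symlinks to directories.
            if entry.is_dir(follow_symlinks=False):
                if entry.stat(follow_symlinks=False).st_uid != uid:
                    foreign.append(entry.path)
    return sorted(foreign)
```

Calling `dirs_not_owned_by(".", os.getuid())` in the output directory would be expected to report entries such as sportsdb_data and remote_sportsdb_data from the listing above.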

nizar-m commented 4 years ago

When I run the final benchmarking step I get this:

====================
benchmark: events_remote_affilications
  --------------------
  candidate: events_remote_affiliations on hge-with-remote at http://127.0.0.1:8081/v1/graphql
    Warmup:
      ++++++++++++++++++++
      20Req/s Duration:60s open connections:20
      unable to connect to 127.0.0.1:8081 Connection refused
      ++++++++++++++++++++
      40Req/s Duration:60s open connections:20
      unable to connect to 127.0.0.1:8081 Connection refused
    Benchmark:
      ++++++++++++++++++++
      20Req/s Duration:300s open connections:20
      unable to connect to 127.0.0.1:8081 Connection refused
      ++++++++++++++++++++
      40Req/s Duration:300s open connections:20
      unable to connect to 127.0.0.1:8081 Connection refused
 * Serving Flask app "bench" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:8050/ (Press CTRL+C to quit)

And visiting http://0.0.0.0:8050/ shows an empty graph.

A Docker container can connect to an application listening on the host's localhost only when the container is run with --net=host. I have changed the readme to reflect this. (This may not work on macOS, though.)
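A quick way to catch the "Connection refused" case before a long benchmark run is to probe the port first. This is only a sketch: `can_connect` is a hypothetical helper, not something in this PR or in graphql-bench, and it has to run from the same network namespace as the load generator (for example inside the container) for the result to be meaningful.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the log above, `can_connect("127.0.0.1", 8081)` returning False would point at the networking setup rather than at the queries.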

nizar-m commented 4 years ago

Ah, I also noticed that on my machine this creates directories that have weird ownership:

total 16552
-rw-r--r--  1 me me       2797 Oct 10 12:15 hge.log
-rw-r--r--  1 me me       3181 Oct 10 12:15 remote_hge.log
drwx------ 19 70 root     4096 Oct 10 12:15 remote_sportsdb_data
-rw-r--r--  1 me me     675840 Oct 10 11:32 sportsdb_cache.sqlite
drwx------ 19 70 root     4096 Oct 10 12:15 sportsdb_data
-rw-r--r--  1 me me   15597968 Oct 10 11:32 sportsdb_sample_postgresql_20080304.sql
-rw-r--r--  1 me me     652509 Oct 10 11:32 sportsdb.zip

This causes problems for tooling like hasktags or ack, since they can't be traversed.

You can now specify the directory where all these files should be placed. If you use the same work directory a second time, bringing up the test setup is much faster: instead of doing the full setup, we simply reuse the Postgres data directories sportsdb_data and remote_sportsdb_data (these directories are bind-mounted into the Postgres Docker containers).
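The reuse decision can be pictured roughly like this. It is a sketch under assumptions, not the logic in test_with_sportsdb.py: the helper name `can_reuse_work_dir` is hypothetical, and it only checks that both data directories exist (it does not look inside them, since they may not be readable by the current user).

```python
from pathlib import Path

# Directory names taken from the listing in this thread.
DATA_DIRS = ("sportsdb_data", "remote_sportsdb_data")


def can_reuse_work_dir(work_dir):
    """Return True if both Postgres data directories already exist in work_dir."""
    return all((Path(work_dir) / name).is_dir() for name in DATA_DIRS)
```

If this returns False, the full setup (download, load, initialize) would be needed; otherwise the existing directories can be bind-mounted again.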

nizar-m commented 4 years ago

We can probably run master using the corresponding Docker image, and then run the benchmarks against it.

So I guess we need to make the following comparisons:

Master vs remote relationship branch for the normal queries

Query with table object/array relationship vs with remote object/array relationship

Do we have other comparisons to make?

jberryman commented 4 years ago

Ah, I also noticed that on my machine this creates directories that have weird ownership:
total 16552
-rw-r--r--  1 me me       2797 Oct 10 12:15 hge.log
-rw-r--r--  1 me me       3181 Oct 10 12:15 remote_hge.log
drwx------ 19 70 root     4096 Oct 10 12:15 remote_sportsdb_data
-rw-r--r--  1 me me     675840 Oct 10 11:32 sportsdb_cache.sqlite
drwx------ 19 70 root     4096 Oct 10 12:15 sportsdb_data
-rw-r--r--  1 me me   15597968 Oct 10 11:32 sportsdb_sample_postgresql_20080304.sql
-rw-r--r--  1 me me     652509 Oct 10 11:32 sportsdb.zip
This causes problems for tooling like hasktags or ack, since they can't be traversed.
You can now specify the directory where all these files should be placed. If you use the same work directory a second time, bringing up the test setup is much faster: instead of doing the full setup, we simply reuse the Postgres data directories sportsdb_data and remote_sportsdb_data (these directories are bind-mounted into the Postgres Docker containers).

That's an okay workaround. It would be nicer if they lived in the tree but were not owned by root. But it looks like this is a pain to do; not sure:

https://gist.github.com/nitrobin/4d16fbe347c150a422ad
https://github.com/moby/moby/issues/2259

jberryman commented 4 years ago

I was able to try again and everything went smoothly I think, following your instructions!

Random thoughts mostly for when I get a chance to integrate this with my dev.sh script, and mostly concerning improvements to https://github.com/hasura/graphql-bench

I'm not sure why we need to run a warmup for every rps setting...
errors or timeouts will be super important to get in front of eyes somehow, either in a summary or on a graph somehow... EDIT: ya, now that I'm digging in I don't see that errors are surfaced in any way (I suppose this is first and foremost an issue with wrk or how we're using it); it would be good to have the ability to benchmark non-200 queries, but not when we're expecting successful responses. We especially want to be able to see when higher load is breaking things. Honestly I'd rather see every sample on some kind of scatter plot, with failures marked in red and trend lines for percentiles
We'll probably want a flag that will allow bypassing the stack build (since we may be building with profiling, etc.)
when running multiple candidates it looks like some get dropped from the graph at higher RPS, though I don't see obvious errors reported
you can't look at plots for two different runs at the same time in different browser tabs, since switching "response time metric" will just pull data from the most recent run
We should be able to see at least P50/P90/P95 at a glance, all in the same graph;
- also if there is a way to snap the graph height to e.g. a power of 2, this makes it much easier to sort of eyeball and compare runs
- also I don't know if the x axis needs to be a scale... I think we can just label several distinct plots and put them side by side
we can think more carefully about what questions we're really trying to answer and script for that; e.g. we might like to know how many requests per second can we support before latency starts to degrade? (we can figure this out pretty easily using some trial and error, which is great)
my preferred way to compare benchmark metrics (e.g. comparing performance of two or more branches) is to have a table of percent changes that includes coloring indicating how good or bad the change was, and how significant. (I think this may only make sense when comparing the mean). e.g.:

[image: bench]
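To make the last point concrete, here is a rough sketch of how such a percent-change table could be computed from two sets of mean latencies. Everything here is an assumption for illustration: the function names, the input shape (a dict of query name to mean latency), and the fixed 5% threshold, which is only a crude stand-in for a real significance test.

```python
def percent_change(baseline, candidate):
    """Percent change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


def change_table(baseline, candidate, threshold=5.0):
    """Build (name, percent change, verdict) rows for metrics present in both runs.

    Values are treated as latencies, so an increase is "worse".
    """
    rows = []
    for name in sorted(baseline):
        if name not in candidate:
            continue  # skip metrics that only one run reported
        pct = percent_change(baseline[name], candidate[name])
        if pct > threshold:
            verdict = "worse"
        elif pct < -threshold:
            verdict = "better"
        else:
            verdict = "same"
        rows.append((name, pct, verdict))
    return rows
```

The verdict column is what would drive the coloring (for example red for "worse", green for "better") when the table is rendered.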

jberryman commented 4 years ago

Would you mind adding a link to the included queries.graphql in the readme, and mentioning how one selects the queries with query: in the YAML? It's pretty self-explanatory once you know what files to look at but would be helpful. Maybe this belongs in the docs for https://github.com/hasura/graphql-bench instead

jberryman commented 4 years ago

Also it would be really awesome if you could include a bunch of interesting queries in queries.graphql with comments. e.g. it would be nice to have some that:

return a lot of data but with simple SQL
return a small amount of data
return a small amount of data but generate a big SQL query
whatever else you can think of

Obviously we can all iterate on this and contribute queries as we go, and this will be an ongoing project

jberryman commented 4 years ago

Also mention in README, something like: "The graphql-engine log file is located in ./server/tests-py/remote_relationship_tests/test_output/hge.log and persists between runs"

jberryman commented 4 years ago

Another thing we should improve: abort if the stack build fails (else we will continue on with the wrong version)
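Something along these lines would do it, assuming the build is launched from the Python script via a subprocess. This is a sketch only: `run_or_abort` is a hypothetical helper, and the exact build command the script uses is not shown in this thread.

```python
import subprocess
import sys


def run_or_abort(cmd):
    """Run cmd and exit the whole script if it returns a non-zero status."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"command failed ({result.returncode}): {' '.join(cmd)}")
```

Wrapping the build step in `run_or_abort([...])` would stop the run before the benchmarks start against a stale binary.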
