tirumaraiselvan / graphql-engine

Blazing fast, instant realtime GraphQL APIs on Postgres with fine grained access control
https://hasura.io
Apache License 2.0

Setup for remote relationship benchmarks using sportsdb #70

Open nizar-m opened 4 years ago

nizar-m commented 4 years ago

Description

The Python script test_with_sportsdb.py should set up the databases and GraphQL engines required for the tests.

Affected components

Related Issues

Solution and Design

Steps to test and verify

Limitations, known bugs & workarounds

jberryman commented 4 years ago

It would be great if we could compare performance against master for queries that don't involve remote joins. What would be the best way to accomplish that? Can this PR work against master?

jberryman commented 4 years ago

Ah, I also noticed that on my machine this creates directories with weird ownership:

total 16552
-rw-r--r--  1 me me       2797 Oct 10 12:15 hge.log
-rw-r--r--  1 me me       3181 Oct 10 12:15 remote_hge.log
drwx------ 19 70 root     4096 Oct 10 12:15 remote_sportsdb_data
-rw-r--r--  1 me me     675840 Oct 10 11:32 sportsdb_cache.sqlite
drwx------ 19 70 root     4096 Oct 10 12:15 sportsdb_data
-rw-r--r--  1 me me   15597968 Oct 10 11:32 sportsdb_sample_postgresql_20080304.sql
-rw-r--r--  1 me me     652509 Oct 10 11:32 sportsdb.zip

This causes problems for tooling like hasktags or ack, since they can't be traversed.
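One quick way to find the directories under the work directory that the current user can't traverse (the drwx------ directories owned by uid 70 above) is to walk the tree and collect the paths that os.walk fails on. This is a minimal Python sketch. untraversable_dirs is a hypothetical helper and is not part of test_with_sportsdb.py:

```python
import os


def untraversable_dirs(root):
    """Return directories under `root` that the current user cannot list.

    Hypothetical helper for spotting directories like sportsdb_data
    (mode drwx------, owned by another uid) that break recursive tools.
    """
    bad = []

    def on_error(err):
        # os.walk calls this with the OSError (e.g. PermissionError) raised
        # when it fails to list a directory; err.filename is that directory
        bad.append(err.filename)

    # Walk the whole tree only for its side effect of triggering on_error
    for _dirpath, _dirnames, _filenames in os.walk(root, onerror=on_error):
        pass
    return bad
```

When run as a non-root user, it should report sportsdb_data and remote_sportsdb_data for a listing like the one above. When run as root it reports nothing, because root bypasses the permission check.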

nizar-m commented 4 years ago

When I run the final benchmarking step I get this:

====================
benchmark: events_remote_affilications
  --------------------
  candidate: events_remote_affiliations on hge-with-remote at http://127.0.0.1:8081/v1/graphql
    Warmup:
      ++++++++++++++++++++
      20Req/s Duration:60s open connections:20
      unable to connect to 127.0.0.1:8081 Connection refused
      ++++++++++++++++++++
      40Req/s Duration:60s open connections:20
      unable to connect to 127.0.0.1:8081 Connection refused
    Benchmark:
      ++++++++++++++++++++
      20Req/s Duration:300s open connections:20
      unable to connect to 127.0.0.1:8081 Connection refused
      ++++++++++++++++++++
      40Req/s Duration:300s open connections:20
      unable to connect to 127.0.0.1:8081 Connection refused
 * Serving Flask app "bench" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:8050/ (Press CTRL+C to quit)

And visiting http://0.0.0.0:8050/ shows an empty graph.

A Docker container can reach an application on the host's localhost only when the container is run with --net=host. I have changed the README to reflect this. (This may not work on macOS, though.)
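The "Connection refused" lines show that nothing was reachable on 127.0.0.1:8081 from the benchmark's point of view, but the warmup and benchmark phases were still attempted. A pre-flight check could fail fast instead. Below is a minimal sketch. It assumes the check runs in the same network namespace as the benchmark. wait_for_port is a hypothetical helper, not a graphql-bench option:

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or `timeout` expires.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call wait_for_port("127.0.0.1", 8081) before the warmup starts, and abort the run if it returns False.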

nizar-m commented 4 years ago

You can now specify the directory where all these files should be placed. If you use the same work directory a second time, bringing up the test setup is much faster: instead of doing the full setup, we simply reuse the Postgres data directories sportsdb_data and remote_sportsdb_data (these directories are bind-mounted into the Postgres Docker containers).
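At its core, the reuse logic checks whether both bind-mounted data directories already exist in the chosen work directory. Here is a minimal sketch under that assumption. needs_full_setup is a hypothetical name, and the real script may check more (for example, that the data is complete):

```python
from pathlib import Path

# Data directory names taken from the listing above; the check itself is a sketch.
DATA_DIRS = ("sportsdb_data", "remote_sportsdb_data")


def needs_full_setup(work_dir):
    """Return True unless every Postgres data directory already exists in work_dir."""
    work = Path(work_dir)
    # is_dir() only needs search permission on work_dir, so root-owned
    # data directories with mode 0700 are still detected
    return not all((work / name).is_dir() for name in DATA_DIRS)
```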

nizar-m commented 4 years ago

We can probably run master using the corresponding Docker image, and then run the benchmarks.

So I guess we need to make the following comparisons:

Master vs. the remote relationship branch, for normal queries

Queries with table object/array relationships vs. queries with remote object/array relationships

Do we have other comparisons to make?
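Both comparisons could be summarized as the relative change in median latency between a baseline run and a candidate run of the same query. This sketch assumes per-request latencies in milliseconds are available for both runs. median_change is hypothetical, and it does not assume anything about graphql-bench's own output format:

```python
import statistics


def median_change(baseline_ms, candidate_ms):
    """Relative change in median latency of `candidate_ms` vs `baseline_ms`.

    A positive result means the candidate (e.g. the remote relationship
    branch) is slower than the baseline (e.g. master) on the same query.
    """
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return (cand - base) / base
```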

jberryman commented 4 years ago

That's an okay workaround. It would be nicer if they lived in the tree but weren't owned by root, but it looks like that's a pain to do; I'm not sure:

https://gist.github.com/nitrobin/4d16fbe347c150a422ad https://github.com/moby/moby/issues/2259
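One approach that might avoid the root-owned directories is to run the Postgres container as the host user with docker run --user. The official postgres image documents this, with two caveats: the bind-mounted data directory must be owned by that user, and initdb needs the user to exist in /etc/passwd (the suggested workaround is mounting /etc/passwd read-only). The sketch below only builds the argument list. The function name, container name, and password are placeholders, and it has not been tried with test_with_sportsdb.py:

```python
import os


def postgres_run_args(data_dir, name, host_port, image="postgres"):
    """Build `docker run` arguments that run Postgres as the invoking host user.

    Hypothetical sketch: the container name, port, and password are placeholders.
    data_dir should already exist on the host and be owned by this user;
    otherwise Docker creates the bind-mount source itself, owned by root.
    """
    uid, gid = os.getuid(), os.getgid()
    return [
        "docker", "run", "-d",
        "--name", name,
        # run the server as the host user instead of the image's postgres user
        "--user", f"{uid}:{gid}",
        # initdb needs the uid to exist in /etc/passwd; the image docs suggest this mount
        "-v", "/etc/passwd:/etc/passwd:ro",
        # bind-mount the host directory as the Postgres data directory
        "-v", f"{os.path.abspath(data_dir)}:/var/lib/postgresql/data",
        # placeholder credential; recent images refuse to initialize without one
        "-e", "POSTGRES_PASSWORD=postgres",
        "-p", f"{host_port}:5432",
        image,
    ]
```

The resulting list could be passed to subprocess.run(..., check=True).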

jberryman commented 4 years ago

I was able to try again, and I think everything went smoothly following your instructions!

Random thoughts, mostly for when I get a chance to integrate this with my dev.sh script, and mostly concerning improvements to https://github.com/hasura/graphql-bench:

bench

jberryman commented 4 years ago

Would you mind adding a link to the included queries.graphql in the README, and mentioning how one selects queries with query: in the YAML? It's pretty self-explanatory once you know which files to look at, but it would be helpful. Maybe this belongs in the docs for https://github.com/hasura/graphql-bench instead.
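This helper shows how the query: key might relate to queries.graphql. It assumes the value names a GraphQL operation in that file (e.g. events_remote_affiliations from the benchmark output earlier in this thread). A reader could then extract the selected operation with something like this naive sketch. extract_operation is hypothetical and is not how graphql-bench itself parses the file:

```python
import re


def extract_operation(source, name):
    """Return the text of the GraphQL query operation called `name` in `source`.

    Naive sketch: finds `query <name>` and matches braces; it does not
    handle braces inside strings, comments, or default values.
    """
    # \b on both sides so "events" does not match "events_remote_affiliations"
    m = re.search(r"\bquery\s+" + re.escape(name) + r"\b", source)
    if m is None:
        raise KeyError(name)
    depth = 0
    # Start at the first "{" after the operation name (skips any variable list)
    for i in range(source.index("{", m.end()), len(source)):
        ch = source[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return source[m.start():i + 1]
    raise ValueError(f"unbalanced braces in operation {name!r}")
```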

jberryman commented 4 years ago

Also, it would be really awesome if you could include a bunch of interesting queries, with comments, in queries.graphql. E.g. it would be nice to have some that:

Obviously we can all iterate on this and contribute queries as we go; this will be an ongoing project.

jberryman commented 4 years ago

Also mention in the README something like: "The graphql-engine log file is located in ./server/tests-py/remote_relationship_tests/test_output/hge.log and persists between runs."

jberryman commented 4 years ago

Another thing we should improve: abort if the stack build fails (otherwise we will continue with the wrong version).
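Aborting on a failed build mostly comes down to checking the exit status of the build command before starting any engines. Here is a minimal sketch using subprocess.run(check=True). run_or_abort is a hypothetical helper, and the exact build command the script runs is not shown in this thread:

```python
import subprocess
import sys


def run_or_abort(cmd):
    """Run `cmd`; exit the whole script if it fails instead of carrying on."""
    try:
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(cmd, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError) as err:
        # FileNotFoundError covers a missing executable (e.g. stack not installed)
        sys.exit(f"aborting: {' '.join(cmd)} failed ({err})")
```

For example, calling run_or_abort(["stack", "build"]) at the start of setup would stop the script on a failed build, rather than letting it benchmark a stale binary.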