verdict-project / verdict

Interactive-Speed Analytics: 200x Faster, 200x Fewer Cluster Resources, Approximate Query Processing
http://verdictdb.org
Apache License 2.0

Making unit tests independent of our server #370

Closed: pyongjoo closed this issue 5 years ago

pyongjoo commented 5 years ago

Currently, the tests for Presto, Impala, and Redshift depend on external services. We want them to run solely in Docker environments.

https://github.com/mozafari/verdictdb/pull/369

dongyoungy commented 5 years ago

I took a look at the presto docker image: https://hub.docker.com/r/starburstdata/presto/

By default, the docker image comes with read-only TPC-H (and also TPC-DS) catalogs at different scale factors, so unit tests that use TPC-H queries can likely read the data inside the image without having to generate it.

A potential problem I see with this image, though, is that we need to use the memory catalog for writing (e.g., creating scrambles), and I am fairly sure it has a size limit (a doc I found says the default is 128MB, and I could not find a convenient way to change it). So we should be careful that no unit test writes more than this limit.
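One way to stay under that limit is to have each test estimate the size of a table before writing it to the memory catalog. The sketch below is hypothetical: the helper names are illustrative, it assumes the 128MB default mentioned above, and it uses a rough "rows times average row width" estimate rather than anything Presto reports.

```python
# Hypothetical test guard: estimate whether a table written to the Presto
# memory catalog stays under an assumed 128MB default limit.
MEMORY_CATALOG_LIMIT_BYTES = 128 * 1024 * 1024  # assumed default (128MB)


def estimated_table_bytes(row_count: int, avg_row_bytes: int) -> int:
    """Rough size estimate: number of rows times average row width in bytes."""
    if row_count < 0 or avg_row_bytes < 0:
        raise ValueError("row_count and avg_row_bytes must be non-negative")
    return row_count * avg_row_bytes


def fits_in_memory_catalog(row_count: int, avg_row_bytes: int,
                           limit_bytes: int = MEMORY_CATALOG_LIMIT_BYTES) -> bool:
    """True if the estimated table size is at or below the limit."""
    return estimated_table_bytes(row_count, avg_row_bytes) <= limit_bytes


def check_scramble_fits(row_count: int, avg_row_bytes: int) -> None:
    """Fail fast in a test before creating a scramble that would be too large."""
    size = estimated_table_bytes(row_count, avg_row_bytes)
    if size > MEMORY_CATALOG_LIMIT_BYTES:
        raise AssertionError(
            f"estimated {size} bytes exceeds the memory catalog limit "
            f"of {MEMORY_CATALOG_LIMIT_BYTES} bytes")


if __name__ == "__main__":
    # 100,000 rows x 200 bytes = 20,000,000 bytes, well under 128MB.
    print(fits_in_memory_catalog(100_000, 200))
    # 1,000,000 rows x 200 bytes = 200,000,000 bytes, over 128MB.
    print(fits_in_memory_catalog(1_000_000, 200))
```

Because in-memory pages carry some overhead, a real test would probably want to leave a safety margin below the limit rather than compare against it exactly.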

Also, because of the data the image bundles by default, the image itself seems to be over 1GB, which might add to the startup time of every CircleCI test run.

I do not have a good approach for Impala and Redshift at the moment.

pyongjoo commented 5 years ago

Thanks for the investigation. I think things look good in general, except that the image is over 1GB. Maybe we can create our own docker image later (in the far future).

If Presto seems to be working, please add the docker command to the dev setup wiki page (in our private dev repo).

dongyoungy commented 5 years ago

I have not yet tried running our Presto unit tests against the docker image. I will work on it and post updates here.

pyongjoo commented 5 years ago

Can you share any updates?

dongyoungy commented 5 years ago

I am almost ready to open a first-draft pull request for running the Presto unit tests locally. I will describe the changes and remaining issues in that pull request.

pyongjoo commented 5 years ago

You may already know this, but the image uses Presto's built-in tpch connector, which generates the data on the fly. Reference: https://teradata.github.io/presto/docs/0.167-t/connector/tpch.html
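Since the tpch connector generates its tables on the fly, unit tests only need to reference them by name. The connector exposes one schema per scale factor (`tiny` for scale factor 0.01, and `sf1`, `sf100`, and so on). The helpers below are a hypothetical sketch for building fully qualified table names in tests; they assume the catalog is registered as `tpch`, which may differ in a given image's configuration.

```python
# Hypothetical helpers for addressing tables of Presto's built-in tpch connector.
TPCH_CATALOG = "tpch"  # assumed catalog name; depends on the image's config

# The eight standard TPC-H tables.
TPCH_TABLES = {"customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier"}


def tpch_scale_factor(schema: str) -> float:
    """Map a tpch schema name to its scale factor: "tiny" -> 0.01, "sfN" -> N."""
    if schema == "tiny":
        return 0.01
    if schema.startswith("sf") and schema[2:].isdigit():
        return float(schema[2:])
    raise ValueError(f"not a tpch schema name: {schema!r}")


def tpch_table(schema: str, table: str) -> str:
    """Return a fully qualified name such as "tpch.tiny.lineitem"."""
    tpch_scale_factor(schema)  # raises ValueError for an unknown schema
    if table not in TPCH_TABLES:
        raise ValueError(f"not a TPC-H table: {table!r}")
    return f"{TPCH_CATALOG}.{schema}.{table}"


if __name__ == "__main__":
    # Small scale factors keep test queries cheap.
    print(tpch_table("tiny", "lineitem"), tpch_scale_factor("tiny"))
```

Sticking to the `tiny` schema for reads keeps query time low, while anything the tests write (such as scrambles) still goes to the memory catalog and its size limit.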