join questions should ensure answers are materialized

None of the 5 basic questions currently defined for the join task effectively forces all computations to finish. We do print nrow and ncol of the answer for each question. Unfortunately, this is not enough to force the answer to be materialized. To know the nrow of the answer it is enough to compute the matching rows, without actually performing the join of both datasets. Ncol is obvious from the query alone, without even looking at the data.

We should therefore ensure that this optimization cannot take place, either by using the API of a solution to force that part of the computation, or by changing the queries to include an extra computation that actually requires the data to be materialized. The extra computation could be either an artificial one (head and tail) or a more real-life use of the data after joining. The latter is problematic because such a real-life computation will blur the join timing, in some cases likely making the reported timing diverge heavily from the actual join timing. Thus IMO the best way would be to force computation via the API of a solution.
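To make the concern concrete, here is a minimal toy sketch in plain Python. It does not reflect how any benchmarked solution is implemented; the function names `join_nrow` and `join_materialized`, the sample data, and the checksum are all hypothetical and used only for illustration. It shows that the row count of an inner join can be obtained from key frequencies alone, while an extra computation over the joined columns requires building the rows.

```python
from collections import Counter


def join_nrow(left, right, key):
    """Row count of an inner join, computed from key frequencies only.

    No joined row is ever built, which is the shortcut a lazy engine
    could take when only nrow is requested.
    """
    right_counts = Counter(row[key] for row in right)
    return sum(right_counts[row[key]] for row in left)


def join_materialized(left, right, key):
    """Inner join that actually builds every output row."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(rrow)
            merged.update(lrow)
            out.append(merged)
    return out


# Hypothetical sample data, not taken from the benchmark datasets.
left = [
    {"id": 1, "v1": 10},
    {"id": 2, "v1": 20},
    {"id": 2, "v1": 21},
    {"id": 4, "v1": 40},
]
right = [
    {"id": 1, "v2": 0.5},
    {"id": 2, "v2": 1.5},
    {"id": 2, "v2": 2.5},
    {"id": 3, "v2": 3.5},
]

# nrow is available without performing the join.
nrow_only = join_nrow(left, right, "id")

# A checksum over columns from both sides needs the joined rows.
answer = join_materialized(left, right, "id")
checksum = sum(r["v1"] for r in answer) + sum(r["v2"] for r in answer)

print(nrow_only, len(answer), checksum)
```

A checksum like this is one form of the "extra computation" option and would add its own cost to the timing, which is why forcing materialization through each solution's own API remains the preferred route.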

h2oai / db-benchmark

join questions should ensure answers are materialized #141