jangorecki opened this issue 4 years ago
Hmm, looks like the biggest variation is in "big inner on int" tests (rows 5 and 10)
Yes, it is a big-to-big join where we join tables of the same size, and 90% of rows are matching.
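For reference, the shape of that workload can be sketched in plain Python (illustrative only — the benchmark itself uses pydatatable's join, and the key-generation scheme here is an assumption, not the benchmark's actual data generator):

```python
import random

def make_tables(n, match_frac=0.9, seed=42):
    """Build two key lists of equal size where roughly match_frac of the
    left table's keys also appear in the right table (a hypothetical
    stand-in for the benchmark's data generator)."""
    rng = random.Random(seed)
    right = list(range(n))  # right table keys: 0..n-1
    left = [rng.randrange(n) if rng.random() < match_frac
            else n + rng.randrange(n)  # key guaranteed not to match
            for _ in range(n)]
    return left, right

def inner_join_match_count(left, right):
    """Count rows surviving a hash inner join on the key column."""
    right_keys = set(right)
    return sum(k in right_keys for k in left)

left, right = make_tables(100_000)
hit_rate = inner_join_match_count(left, right) / len(left)
print(f"match rate: {hit_rate:.3f}")  # close to 0.9 by construction
```

Both tables have the same row count, so an inner join has to hash one full table and probe it with the other, which is where the timing variance shows up.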
Other join queries now also have very unstable timings, possibly caused by #2775.
For example, q2 "medium inner on int":
- On 1e9: one run 622.36, 687.774; another run 1592.488, 1306.6.
- On 1e8: one run 152.617, 138.237; another 505.987, 449.31.
Both runs used the same source (b4f78fbbb7aeee1d22b56cc33f994b7b48d23765).
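To quantify the instability, the run-to-run slowdown can be computed directly from the timings above (a quick sketch, not part of the benchmark code; pairing of runs is taken from the numbers as reported):

```python
# Timings (seconds) for q2 "medium inner on int", two runs each,
# two queries per run, taken from the report above.
runs_1e9 = [(622.36, 687.774), (1592.488, 1306.6)]
runs_1e8 = [(152.617, 138.237), (505.987, 449.31)]

def slowdown(fast_run, slow_run):
    """Ratio of the slower run's first timing to the faster run's."""
    return slow_run[0] / fast_run[0]

print(f"1e9 slowdown: {slowdown(*runs_1e9):.2f}x")  # ~2.56x
print(f"1e8 slowdown: {slowdown(*runs_1e8):.2f}x")  # ~3.32x
```

So the same query on the same source ran roughly 2.5-3.3x slower in one run than the other, which is the instability being reported.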
Pydatatable join can be very fast, but in the case of a big-to-big join the variance in timing is very large. Numeric column headers present the unix epoch time of the benchmark run. All timings were made on 1f81e5711b77f93494fa01379d8dd242e4b45cea. The 1e9 timings are on-disk, while the others are in-memory. All numbers are in seconds.
I don't think we have to do anything about this, because even when it is slower it is still quite fast; I am reporting it so that it is known and documented in the project repo.