Closed Beastjoe closed 5 years ago
Merging #322 into master will decrease coverage by
0.08%
. The diff coverage is61.91%
.
@@ Coverage Diff @@
## master #322 +/- ##
==========================================
- Coverage 70.66% 70.59% -0.07%
==========================================
Files 167 167
Lines 11178 11198 +20
Branches 1832 1839 +7
==========================================
+ Hits 7898 7904 +6
- Misses 2827 2835 +8
- Partials 453 459 +6
Impacted Files | Coverage Δ | |
---|---|---|
...in/java/org/verdictdb/core/sqlobject/ColumnOp.java | 63.2% <100%> (+0.6%) |
:arrow_up: |
...or/QueryResultAccuracyEstimatorFromDifference.java | 84.16% <57.9%> (-8.61%) |
:arrow_down: |
...verdictdb/core/querying/ola/InMemoryAggregate.java | 84.79% <0%> (-2.89%) |
:arrow_down: |
...ain/java/org/verdictdb/commons/DBTablePrinter.java | 53.8% <0%> (-2.75%) |
:arrow_down: |
...ictdb/core/querying/QueryNodeWithPlaceHolders.java | 88.28% <0%> (-0.68%) |
:arrow_down: |
...a/org/verdictdb/jdbc41/VerdictStreamResultSet.java | 8.56% <0%> (+0.27%) |
:arrow_up: |
.../verdictdb/core/execplan/ExecutableNodeRunner.java | 81.12% <0%> (+0.86%) |
:arrow_up: |
...org/verdictdb/sqlreader/ScrambleTableReplacer.java | 87.26% <0%> (+1.97%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update f6e366a...50a45d9. Read the comment docs.
Could we somehow keep NULL values for SUM? According to http://www.mysqltutorial.org/mysql-sum/, the standard behavior seems to be that SUM must return NULL when no rows are chosen. But, COUNT must return 0 when no rows are chosen.
I think currently, the value inside DbmsQueryResult is NULL if sum select no rows. However, when checking converge in AccuracyEstimator, NULL will be treated as 0. Is this implementation correct?
I think the behavior is correct. Can you add unit tests for confirming it?
Yes. One more question: when user try to get null value using methods like getDouble(), will methods return 0 or null? The current implementation will return 0 if the value is null.
So, how getValue() must be implemented and getDouble() must be implemented are two separate issues.
getValue() must return true values (e.g., NULL for the summation of an empty set). However, if the user anyway wants an answer using getDouble(), we must cast it to some double value. For this conversion, I was following the JDBC standard, but you can double-check.
Since we sum up the aggregate values in selectAsyncAggExecutionNode, count() will also return NULL if no rows are selected. Does it matter?
Can you explain more details why the sum of counts will be NULL?
The individual counts could be zero when no rows are chosen; thus, the sum of them could also be zero.
Let me double check this.
The reason is when the aggregation query contains group-by and no row is selected, it will return empty set. Therefore, DbmsQueryResult returned from SelectAggExecutionNode contains nothing. Then, SelectAggAsyncAggExecutionNode tries to sum up the result, which results in NULL value for count().
Hmm. This is interesting and tricky. Let's examine the original select query, and in case a select item includes count but not other aggregates, replace NULL -> 0. Otherwise, I don't see any harm in keeping null values. FYI, the original select query is already there in the accuracy estimator. I included it for determining aggregate columns.
When it comes to the convergence test, the test must fail for both cases (either 0 or NULL). I think zero values naturally make the test fail, so probably need to add a special treatment for NULL.
I have made the changes. Please have a check.
Related issue #321.
The problem is in the execution of selectAsyncAggNode, where H2 jdbc driver will return NULL value if no rows are selected for aggregating. I think adding the check inside QueryResultAccuracyEstimatorFromDifference is enough, since DbmsQueryResult will interpret NULL as 0 automatically.