stoneatom / stonedb

StoneDB is an Open-Source MySQL HTAP and MySQL-Native DataBase for OLTP, Real-Time Analytics, a counterpart of MySQLHeatWave. (https://stonedb.io)
https://stonedb.io/
GNU General Public License v2.0
862 stars 139 forks source link

feature: expand sum Aggregate Function to int128_t #1226

Open davidshiz opened 1 year ago

davidshiz commented 1 year ago

Have you read the Contributing Guidelines on issues?

Please confirm if bug report does NOT exists already ?

Describe the problem

mysql> CREATE TABLE `test` (
    ->   `id` bigint(20) DEFAULT NULL
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test values(9223372036854775801);
Query OK, 1 row affected (0.00 sec)
mysql> insert into test values(9223372036854775802);
Query OK, 1 row affected (0.00 sec)
mysql> insert into test values(9223372036854775803);
Query OK, 1 row affected (0.00 sec)
mysql> insert into test values(9223372036854775804);
Query OK, 1 row affected (0.00 sec)

mysql> select *  from test;
+---------------------+
| id                  |
+---------------------+
| 9223372036854775801 |
| 9223372036854775802 |
| 9223372036854775803 |
| 9223372036854775804 |
+---------------------+
4 rows in set (0.00 sec)

mysql> SELECT SUM(id) FROM test;
ERROR 1105 (HY000): Aggregation overflow.
mysql> SELECT AVG(id) FROM test;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Expected behavior

mysql> SELECT SUM(id) FROM test;
+----------------------+
| SUM(id)              |
+----------------------+
| 36893488147419103210 |
+----------------------+
1 row in set (0.00 sec)

mysql> SELECT AVG(id) FROM test;
+--------------------------+
| AVG(id)                  |
+--------------------------+
| 9223372036854775802.5000 |
+--------------------------+
1 row in set (0.00 sec)

How To Reproduce

CREATE TABLE `test` (
  `id` bigint(20) DEFAULT NULL
);
insert into test values(9223372036854775801);
insert into test values(9223372036854775802);
insert into test values(9223372036854775803);
insert into test values(9223372036854775804);
SELECT AVG(id) FROM test;
SELECT SUM(id) FROM test;

Environment

root@ub01:~# /stonedb57/install/bin/mysqld --version
/stonedb57/install/bin/mysqld  Ver 5.7.36-StoneDB for Linux on x86_64 (build-)
build information as follow:
        Repository address: https://github.com/stoneatom/stonedb.git:stonedb-5.7-dev
        Branch name: stonedb-5.7-dev
        Last commit ID: 1fa9d3346
        Last commit time: Date:   Fri Jan 13 07:03:20 2023 +0000
        Build time: Date: Fri Jan 13 17:22:15 CST 2023
root@ub01:~# cat /etc/issue
Ubuntu 20.04.5 LTS \n \l

Are you interested in submitting a PR to solve the problem?

davidshiz commented 1 year ago

UNSIGNED bigint also have this problem

davidshiz commented 1 year ago

ref #1125

create table test(u_bigcustkey BIGINT);
insert into test values(4073709000001),(4073709000004),(4073709000009);
select avg(u_bigcustKey) from test;
mysql> select avg(u_bigcustKey) from test;
+--------------------+
| avg(u_bigcustKey)  |
+--------------------+
| 4073709000004.6667 |
+--------------------+
1 row in set (0.00 sec)
hustjieke commented 1 year ago

When overflow_check > std::numeric_limits::max() = 9223372036854775808, an exception catched in funciton PutAggregatedValue():

void AggregatorSum64::PutAggregatedValue(unsigned char *buf, int64_t v, int64_t factor) {
  stats_updated = false;
  int64_t *p = (int64_t *)buf; 
  if (*p == common::NULL_VALUE_64) { 
    *p = 0;
  }
  double overflow_check = double(*p) + double(v) * factor;
  if (overflow_check > std::numeric_limits<std::streamsize>::max() ||
      overflow_check < std::numeric_limits<std::streamsize>::min())
    throw common::NotImplementedException("Aggregation overflow.");
  *p += v * factor;
}

To solve the exception, overflow_check should be extended to int128_t, like: std::numeric_limits<int128_t>::max().

For sum64 or sum32 or sumdecimal, we should reset total_width , grouping_and_UTF_width to 16 bytes length(to store int128_t) In void GroupTable::Initialize(int64_t max_no_groups, bool parallel_allowed) {

  // Aggregators
  for (int i = no_grouping_attr; i < no_attr; i++) {
    aggregated_col_offset[i] = total_width - grouping_and_UTF_width; 
    total_width += aggregator[i]->BufferByteSize();
    total_width = 4 * ((total_width + 3) / 4);  // e.g. 1->4, 12->12, 19->20
  }

But will these buffer changes cause unknown problems?

hustjieke commented 1 year ago

We do this feat after decimal128 supported.