stoneatom / stonedb

StoneDB is an Open-Source MySQL HTAP and MySQL-Native DataBase for OLTP, Real-Time Analytics, a counterpart of MySQLHeatWave. (https://stonedb.io)
https://stonedb.io/
GNU General Public License v2.0
857 stars 140 forks source link

bug: single table Benchmark Q3.1 return incorrect result set #1474

Open davidshiz opened 1 year ago

davidshiz commented 1 year ago

Have you read the Contributing Guidelines on issues?

Please confirm if bug report does NOT exists already ?

Describe the problem

10G SSB data, return incorrect result set, 1G data, result is nomal

SELECT
    C_NATION,
    S_NATION,
    (LO_ORDERDATE DIV 10000) AS year,
    sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND LO_ORDERDATE  >= '19920101' AND LO_ORDERDATE   <= '19971231'
GROUP BY
    C_NATION,
    S_NATION,
    year
ORDER BY
    year ASC,
    revenue DESC;

The wrong result set has more NULL values than the normal result set, as follows:

+-----------+-----------+------+-------------+
| C_NATION  | S_NATION  | year | revenue     |
+-----------+-----------+------+-------------+
| JAPAN     | CHINA     | NULL |    41700979 |
| VIETNAM   | VIETNAM   | NULL |    33389848 |
| INDONESIA | VIETNAM   | NULL |    31322348 |
| VIETNAM   | JAPAN     | NULL |    26649419 |
| INDONESIA | INDONESIA | NULL |    19919886 |
| CHINA     | CHINA     | NULL |    19813769 |
| JAPAN     | VIETNAM   | NULL |    19483824 |
| CHINA     | INDONESIA | NULL |    18940395 |
| VIETNAM   | INDIA     | NULL |    17855888 |
| INDONESIA | CHINA     | NULL |    17511616 |
| CHINA     | INDIA     | NULL |    16803747 |
| INDIA     | INDONESIA | NULL |    15512362 |

Expected behavior

No response

How To Reproduce

No response

Environment

root@ub01:/stonedb57/install/bin# ./mysqld --version
./mysqld  Ver 5.7.36-StoneDB-v1.0.3 for Linux on x86_64 (build-)
build information as follow:
        Repository address: https://github.com/stoneatom/stonedb.git:stonedb-5.7-dev
        Branch name: stonedb-5.7-dev
        Last commit ID: 05db04de4
        Last commit time: Date:   Wed Mar 22 21:09:55 2023 +0800
        Build time: Date: Thu Mar 23 14:37:50 CST 2023

Are you interested in submitting a PR to solve the problem?

davidshiz commented 1 year ago

single table Q4.1 also has similar problem

SELECT
   (LO_ORDERDATE DIV 10000) AS year,
    C_NATION,
    sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND P_MFGR in ( 'MFGR#1' , 'MFGR#2')
GROUP BY
    year,
    C_NATION
ORDER BY
    year ASC,
    C_NATION ASC;
RingsC commented 1 year ago

What's the different between these two result sets?

davidshiz commented 1 year ago

What's the different between these two result sets? As above, the wrong result set has more NULL values

adofsauron commented 1 year ago

ACK

adofsauron commented 1 year ago

set global tianmu_groupby_parallel_degree=0;

adofsauron commented 1 year ago

SELECT
   lo_orderdate,
    lo_orderpriority,
    sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder
WHERE lo_shipmode = 'TRUCK' AND lo_orderdate <= 19980730 AND lo_orderpriority IN ('5-LOW')
GROUP BY
    lo_orderdate,
    lo_orderpriority
ORDER BY
    lo_orderdate ASC,
    lo_orderpriority ASC
LIMIT 10;
adofsauron commented 1 year ago

| top N (                                                                                                                                                                                           |
| | project (                                                                                                                                                                                       |
| | | project (                                                                                                                                                                                     |
| | | | group by (                                                                                                                                                                                  |
| | | | | project (                                                                                                                                                                                 |
| | | | | | select (                                                                                                                                                                                |
| | | | | | | table("sys"."lineorder") [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, "lineorder"."lo_revenue" NOT NULL, "lineorder"."lo_supplycost" NOT NULL, "li |
: neorder"."lo_shipmode" NOT NULL ] COUNT                                                                                                                                                           :
| | | | | | ) [ "lineorder"."lo_orderdate" NOT NULL <= int "19980730", "lineorder"."lo_shipmode" NOT NULL = char(10) "TRUCK", "lineorder"."lo_orderpriority" NOT NULL = char(15) "5-LOW" ]          |
| | | | | ) [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, bigint["lineorder"."lo_revenue" NOT NULL] NOT NULL as "%4"."%4", bigint["lineorder"."lo_supplycost" NOT |
:  NULL] NOT NULL as "%5"."%5", "sys"."sql_sub"("%4"."%4" NOT NULL, "%5"."%5" NOT NULL) NOT NULL as "%3"."%3" ]                                                                                     :
| | | | ) [ "lineorder"."lo_orderpriority" NOT NULL, "lineorder"."lo_orderdate" NOT NULL ] [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, "sys"."sum" no nil ("%3" |
: ."%3" NOT NULL) NOT NULL as "%2"."%2" ]                                                                                                                                                           :
| | | ) [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, "%2"."%2" NOT NULL as "profit" ]                                                                            |
| | ) [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, "profit" NOT NULL ] [ "lineorder"."lo_orderdate" ASC NOT NULL, "lineorder"."lo_orderpriority" ASC NOT NULL ]  |
| ) [ bigint "10" ]     
adofsauron commented 1 year ago

| project (                                                                                                                                                                                         |
| | top N (                                                                                                                                                                                         |
| | | group by (                                                                                                                                                                                    |
| | | | project (                                                                                                                                                                                   |
| | | | | table("sys"."lineorder") [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, "lineorder"."lo_revenue" NOT NULL, "lineorder"."lo_supplycost" NOT NULL ] COUNT  |
| | | | ) [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, bigint["lineorder"."lo_revenue" NOT NULL] NOT NULL as "%3"."%3", bigint["lineorder"."lo_supplycost" NOT N |
: ULL] NOT NULL as "%4"."%4", "sys"."sql_sub"("%3"."%3" NOT NULL, "%4"."%4" NOT NULL) NOT NULL as "%2"."%2" ]                                                                                       :
| | | ) [ "lineorder"."lo_orderpriority" NOT NULL, "lineorder"."lo_orderdate" NOT NULL ] [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, "sys"."sum" no nil ("%2"." |
: %2" NOT NULL) NOT NULL as "%1"."%1" ]                                                                                                                                                             :
| | ) [ bigint "10" ]                                                                                                                                                                               |
| ) [ "lineorder"."lo_orderdate" NOT NULL, "lineorder"."lo_orderpriority" NOT NULL, "%1"."%1" NOT NULL as "profit" ] 
adofsauron commented 1 year ago

THRnew Number of threads: 36

adofsauron commented 1 year ago

    Line 10: * 9    Thread 0x7ff9315a1700 (LWP 20073) "" BATgroup_internal (groups=0x7ff9315a0528, extents=0x7ff9315a0530, histo=0x0, b=0x7ff8e0007430, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
    Line 12:   10   Thread 0x7ff93119f700 (LWP 20075) "" BATgroup_internal (groups=0x7ff93119e528, extents=0x7ff93119e530, histo=0x0, b=0x7ff8e4000d30, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
    Line 16:   13   Thread 0x7ff93099b700 (LWP 20079) "" BATgroup_internal (groups=0x7ff93099a528, extents=0x7ff93099a530, histo=0x0, b=0x7ff8f0000f70, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
    Line 18:   14   Thread 0x7ff93079a700 (LWP 20080) "" BATgroup_internal (groups=0x7ff930799528, extents=0x7ff930799530, histo=0x0, b=0x7ff8c8e4fa80, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
    Line 22:   18   Thread 0x7ff92bbfd700 (LWP 20085) "" BATgroup_internal (groups=0x7ff92bbfc528, extents=0x7ff92bbfc530, histo=0x0, b=0x7ff924004e90, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
    Line 24:   19   Thread 0x7ff92b7fb700 (LWP 20087) "" BATgroup_internal (groups=0x7ff92b7fa528, extents=0x7ff92b7fa530, histo=0x0, b=0x7ff918001cb0, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
    Line 32:   24   Thread 0x7ff92abf5700 (LWP 20093) "" BATgroup_internal (groups=0x7ff92abf4528, extents=0x7ff92abf4530, histo=0x0, b=0x7ff8a0001820, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
    Line 43:   32   Thread 0x7ff92b9fc700 (LWP 20104) "" BATgroup_internal (groups=0x7ff92b9fb528, extents=0x7ff92b9fb530, histo=0x0, b=0x7ff8fc015400, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
    Line 48:   36   Thread 0x7ff92b1f8700 (LWP 20111) "" BATgroup_internal (groups=0x7ff92b1f7528, extents=0x7ff92b1f7530, histo=0x0, b=0x7ff8b8001b10, s=0x0, g=0x0, e=0x0, h=0x0, subsorted=false)
adofsauron commented 1 year ago

        for (r = 0; r < cnt; r++) {
            oid o = canditer_next(&ci);
            p = o - b->hseqbase;
            if ((v = bgrps[w[p]]) == 0xFF && ngrp < 256) {
                bgrps[w[p]] = v = (unsigned char) ngrp++;
                maxgrppos = r;
                if (extents)
                    exts[v] = o;
            }
            ngrps[r] = v;
            if (r > 0 && v < ngrps[r - 1])
                gn->tsorted = false;
            if (histo)
                cnts[v]++;
        }
adofsauron commented 1 year ago

                        gid = gids ? gids[i] - min : (oid) i; \
                        x = vals[ci->seq + i - seqb]; \
                        if (nil_if_empty && \
                            !(seen[gid >> 5] & (1U << (gid & 0x1F)))) { \
                            seen[gid >> 5] |= 1U << (gid & 0x1F); \
                            sums[gid] = 0;  \
                        }           \
                        sums[gid] += x;     \
adofsauron commented 1 year ago
  1. group and aggregation operations are separated to facilitate the use of a more streamlined data structure to carry the group, and facilitate the slicing of data

  2. The selection of group's data structure

  3. The selection of aggregated data structure

  4. Separation of threads and selection of critical sections

adofsauron commented 1 year ago

| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "8246523.25"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "table": {
        "table_name": "lineitem",
        "access_type": "ALL",
        "rows_examined_per_scan": 32986053,
        "rows_produced_per_join": 32986051,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "1649312.65",
          "eval_cost": "6597210.21",
          "prefix_cost": "8246523.25",
          "data_read_per_join": "11G"
        },
        "used_columns": [
          "l_orderkey",
          "l_linenumber",
          "l_quantity"
        ]
      }
    }
  }
} |