tylarb opened this issue 4 years ago
Here is a healthy run, with counts as expected:
select count(*) as warehouses from warehouse;
warehouses
------------
100
(1 row)
select count ( distinct d_w_id ) as warehouses, count(*) as districts from district;
warehouses | districts
------------+-----------
100 | 1000
(1 row)
select count ( distinct c_w_id ) as warehouses, count(*) as customers from customer;
warehouses | customers
------------+-----------
100 | 3000000
(1 row)
select count ( distinct s_w_id ) as warehouses, count(*) as stocks from stock;
warehouses | stocks
------------+---------
100 | 4920832
(1 row)
select count ( distinct h_w_id ) as warehouses, count(*) as history from history;
warehouses | history
------------+---------
100 | 3000000
(1 row)
select count ( distinct o_w_id ) as warehouses, count(*) as orders from oorder;
warehouses | orders
------------+---------
100 | 3000000
(1 row)
select count ( distinct no_w_id ) as warehouses, count(*) as new_orders from new_order;
warehouses | new_orders
------------+------------
100 | 900000
(1 row)
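The "expected" counts above follow from the TPC-C initial-population cardinalities scaled by the warehouse count. A minimal sketch of that arithmetic (the per-warehouse numbers come from the TPC-C spec, not from this issue; stock is omitted because the runs here report differing stock counts):

```python
# Per-warehouse initial-population cardinalities from the TPC-C spec
# (assumption: the loader targets these; stock is omitted because the
# runs in this issue report varying stock counts).
PER_WAREHOUSE = {
    "district": 10,
    "customer": 30_000,
    "history": 30_000,
    "oorder": 30_000,
    "new_order": 9_000,
}

def expected_counts(warehouses: int) -> dict:
    """Scale the per-warehouse cardinalities to the full cluster."""
    return {table: n * warehouses for table, n in PER_WAREHOUSE.items()}

# For the 100-warehouse run above:
for table, count in expected_counts(100).items():
    print(table, count)
# district 1000
# customer 3000000
# history 3000000
# oorder 3000000
# new_order 900000
```

These match the healthy run's counts exactly.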
Here's an unhealthy run, loaded over a connection from a node in a different region:
{$some different smaller node}$ time ./tpccbenchmark --create=true --load=true --nodes=172.151.59.4,172.151.49.178,172.151.55.80 --warehouses=100 --loaderthreads 48
yugabyte=# select count(*) as warehouses from warehouse;
warehouses
------------
100
(1 row)
yugabyte=# select count ( distinct d_w_id ) as warehouses, count(*) as districts from district;
warehouses | districts
------------+-----------
100 | 1000
(1 row)
yugabyte=# select count ( distinct c_w_id ) as warehouses, count(*) as customers from customer;
warehouses | customers
------------+-----------
100 | 3000000
(1 row)
yugabyte=# select count ( distinct s_w_id ) as warehouses, count(*) as stocks from stock;
warehouses | stocks
------------+---------
100 | 7281152
(1 row)
yugabyte=# select count ( distinct h_w_id ) as warehouses, count(*) as history from history;
warehouses | history
------------+---------
100 | 3000000
(1 row)
yugabyte=# select count ( distinct o_w_id ) as warehouses, count(*) as orders from oorder;
warehouses | orders
------------+---------
100 | 2978220
(1 row)
yugabyte=# select count ( distinct no_w_id ) as warehouses, count(*) as new_orders from new_order;
warehouses | new_orders
------------+------------
100 | 892920
(1 row)
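For this 48-thread remote run, orders and new_orders come in slightly below the totals from the healthy run. A quick check of the shortfall (expected totals taken from the healthy run above):

```python
# Expected totals from the healthy local run; observed from the remote 48-thread run.
expected = {"oorder": 3_000_000, "new_order": 900_000}
observed = {"oorder": 2_978_220, "new_order": 892_920}

for table, want in expected.items():
    shortfall = 1 - observed[table] / want
    print(f"{table}: {shortfall:.2%} under expected")
# oorder: 0.73% under expected
# new_order: 0.79% under expected
```

Both tables are under by less than 1% here, which is within the 1-2% variability described later in the report.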
Here's the same data, this time from a TPCC run with 8 loader threads (other values the same). Performance was abysmal: over 2 hours to completion, compared to about 17 minutes running locally, and plenty of errors, so I expect the two to be related.
yugabyte=# select count(*) as warehouses from warehouse;
warehouses
------------
100
(1 row)
yugabyte=# select count ( distinct c_w_id ) as warehouses, count(*) as customers from customer;
warehouses | customers
------------+-----------
100 | 3000000
(1 row)
yugabyte=# select count ( distinct s_w_id ) as warehouses, count(*) as stocks from stock;
warehouses | stocks
------------+----------
100 | 10000000
(1 row)
yugabyte=# select count ( distinct h_w_id ) as warehouses, count(*) as history from history;
warehouses | history
------------+---------
100 | 3000000
(1 row)
yugabyte=# select count ( distinct o_w_id ) as warehouses, count(*) as orders from oorder;
warehouses | orders
------------+---------
100 | 2860457
(1 row)
yugabyte=# select count ( distinct no_w_id ) as warehouses, count(*) as new_orders from new_order;
warehouses | new_orders
------------+------------
100 | 856106
(1 row)
This time, orders and new_orders are well under the counts from the local run, roughly 5% low.
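The roughly-5% figure can be checked directly against the expected totals:

```python
# Observed counts from the 8-thread remote run vs. the expected totals
# (expected totals taken from the healthy local run above).
expected = {"oorder": 3_000_000, "new_order": 900_000}
observed = {"oorder": 2_860_457, "new_order": 856_106}

for table, want in expected.items():
    shortfall = 1 - observed[table] / want
    print(f"{table}: {shortfall:.1%} under expected")
# oorder: 4.7% under expected
# new_order: 4.9% under expected
```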
When I run a TPCC load from a node that is undersized and in a different zone than the cluster, less data is loaded than expected. Given 100 warehouses, I expect orders = 30k times the number of warehouses, new_orders = 9k times the number of warehouses, and order_lines around 300k times the number of warehouses.
These numbers, given random inserts, will have some variability, in the realm of 1-2%, decreasing as load/threads go up.
Data representing this will be uploaded shortly.
Here is the cluster setup and TPCC load module:
Cluster:
Node which tpcc is running from:
The issue happens even with small numbers of threads. Tested and failing with 4, 8, and 48 threads.
100 warehouses
Command for example:
The TPCC benchmark reports success, but with the above errors.