Closed DreamKetchup closed 8 years ago
Hi,
Use the --explain option of run_query to see what Shard-Query is doing with the seven shards. It sounds like not all of the shards have the shard key in them. Shard-Query will pick one of the shards as the "coordinator" shard, and the metadata for tables (such as what columns are in the table) is retrieved from that shard. If a table doesn't have the shard key, that means the table should be duplicated on each node. Thus a query from it should be sent to only one in that case, because to send it to more than one would result in duplicate rows.
You can't install the UDF with ICE. You will need to set up a regular MySQL database, create a database with the same name as your shard schema (it doesn't need to contain any tables), install the UDF, then use either the proxy or sq_helper.
I hope this helps,
--Justin
On Thu, Oct 15, 2015 at 6:39 AM, DreamKetchup notifications@github.com wrote:
Hello, I am new to shard-query and have some problems with it. I have a simple star schema with 7 shards(for each day of the week). On diffrent queries i receive data only from my first shard. Ive created simple test schema with 3 shards and tried same queries and everything worked fine. So now i'm really confused and don't know what can be the source of the problem.
Second problem with Infobright CE UDF installation.
mysql> CREATE FUNCTION gman_do RETURNS STRING -> SONAME "libgearman_mysql_udf.so";
ERROR 1126 (HY000): Can't open shared library 'libgearman_mysql_udf.so' (errno: 0 feature disabled) Using Ubuntu 14.04 and Infobright CE v. 4.0.7
Please advise me how to resolve these problems.
— Reply to this email directly or view it on GitHub https://github.com/greenlion/swanhart-tools/issues/133.
Hello Justin, Ok now i've noticed some strange behaviour. After installing new schemata. All shards are marked as coord_shard in shard-query GUI. But in query I'm choosing as coord shard only1st shard. Result from queries can be diffrent every time, sometimes from 1st shard,2nd so on. It shows only result from one shard. If i unmark all except 1st shard as coord_shard, then result will be only from 1st shard. I think i messed up with something
I'll try to describe the whole picture in more detail. This is a little piece of my fact table that i'm sharding. day_id is my shard column. All other dimensions are duplicated in each shard. In 1st shard also have some temporary tables, they are not used in reporting, so i did not duplicate them.
So here is my shards
I hope I have configured them correctly.
This is the query:
select
object_dim
.host_name
as c0
,
day_dim
.dayOfweekDesc
as c2
,
sum(state_fact
.value
) as m0
from
object_dim
as object_dim
,
state_fact
as state_fact
,
day_dim
as day_dim
where
state_fact
.object_id
= object_dim
.object_id
and
object_dim
.host_name
= 'DECT_GebF_3
'
and
state_fact
.day_id
= day_dim
.day_id
group by
object_dim
.host_name
,
day_dim
.dayOfweekDesc
_runquery --verbose showed me this:
Shard-Query optimizer messages:
SQL TO SEND TO SHARDS:
Array
(
[0] => SELECT object_dim
.host_name
AS
expr$0,day_dim
.dayOfweekDesc
AS expr$1,SUM(state_fact.value) AS
expr_2617061748
FROM object_dim
AS object_dim
CROSS JOIN state_fact
AS
state_fact
CROSS JOIN day_dim
AS day_dim
WHERE 1=1 AND
state_fact
.object_id
= object_dim
.object_id
and
object_dim
.host_name
= 'DECT_GebF_3
' and `state_fact`.`day_id` = `day_dim`.`day_id` GROUP BY expr$0,expr$1
)
SQL TO SEND TO COORDINATOR NODE:
SELECT expr$0 AS c0
,expr$1 AS c2
,SUM(expr_2617061748) AS m0
FROM aggregation_tmp_69434312
GROUP BY expr$0,expr$1
Array ( [c0] => DECT_GebF_3 [c2] => Sunday [m0] => 128160 ) 1 rows returned Exec time: 106.28450512886 INSTRUMENATION: Array ( [mysql_query_count] => 0 [mysql_prepare_count] => 0 [mysql_prepare_time] => 0 [mysql_connection_count] => 0 [mysql_connect_time] => 0 [mysql_query_exec_time] => 0 [mysql_deadlock_count] => 0 [php_service_time] => 1444997607.8773 [memory_usage] => 1532496 [cpu_user] => 20000 [cpu_system] => 4000 )
Only one result from the 1st shard. I don't see here that shard-query send query to other shards.
I hope you can explain me what i'm doing wrong.
Thanks in advance
Hi,
Do any of the tables in the query have the shard key in them? If not, the query goes to only one node.
Let me see your .ini file and a show create table for each table involved.
On Fri, Oct 16, 2015 at 6:00 AM, Aleksandr notifications@github.com wrote:
Hello Justin, Ok now i've noticed some strange behaviour. After installing new schemata. All shards are marked as coord_shard in shard-query GUI. But in query I'm choosing as coord shard only1st shard. Result from queries can be diffrent every time, sometimes from 1st shard,2nd so on. It shows only result from one shard. If i unmark all except 1st shard as coord_shard, then result will be only from 1st shard. I think i messed up with something
I'll try to describe the whole picture in more detail. This is a little piece of my fact table that i'm sharding. day_id is my shard column. All other dimensions are duplicated in each shard. In 1st shard also have some temporary tables, they are not used in reporting, so i did not duplicate them.
[image: Встроенное изображение 1]
So here is my shards
[image: Встроенное изображение 2]
I hope I have configured them correctly.
This is the query:
select
object_dim
.host_name
asc0
,day_dim
.dayOfweekDesc
asc2
, sum(state_fact
.value
) asm0
fromobject_dim
asobject_dim
,state_fact
asstate_fact
,day_dim
asday_dim
wherestate_fact
.object_id
=object_dim
.object_id
andobject_dim
.host_name
= 'DECT_GebF_3' and
state_fact
.day_id
=day_dim
.day_id
group byobject_dim
.host_name
,day_dim
.dayOfweekDesc
_runquery --verbose showed me this:
Shard-Query optimizer messages:
- The following projections may be selected for a UNIQUE CHECK on the storage node operation: expr$0,expr$1
- storage node result set merge optimization enabled: ON DUPLICATE KEY UPDATE expr_2617061748=expr_2617061748 + VALUES(expr_2617061748)
SQL TO SEND TO SHARDS: Array ( [0] => SELECT
object_dim
.host_name
AS expr$0,day_dim
.dayOfweekDesc
AS expr$1,SUM(state_fact.value) AS expr_2617061748 FROMobject_dim
ASobject_dim
CROSS JOINstate_fact
ASstate_fact
CROSS JOINday_dim
ASday_dim
WHERE 1=1 ANDstate_fact
.object_id
=object_dim
.object_id
andobject_dim
.host_name
= 'DECT_GebF_3' and
state_fact
.day_id
=day_dim
.day_id
GROUP BY expr$0,expr$1 ) SQL TO SEND TO COORDINATOR NODE: SELECT expr$0 ASc0
,expr$1 ASc2
,SUM(expr_2617061748) ASm0
FROMaggregation_tmp_69434312
GROUP BY expr$0,expr$1Array ( [c0] => DECT_GebF_3 [c2] => Sunday [m0] => 128160 ) 1 rows returned Exec time: 106.28450512886 INSTRUMENATION: Array ( [mysql_query_count] => 0 [mysql_prepare_count] => 0 [mysql_prepare_time] => 0 [mysql_connection_count] => 0 [mysql_connect_time] => 0 [mysql_query_exec_time] => 0 [mysql_deadlock_count] => 0 [php_service_time] => 1444997607.8773 [memory_usage] => 1532496 [cpu_user] => 20000 [cpu_system] => 4000 )
Only one result from the 1st shard. I don't see here that shard-query send query to other shards.
I hope you can explain me what i'm doing wrong.
Thanks in advance
2015-10-16 3:46 GMT+02:00 Justin Swanhart notifications@github.com:
Hi,
Use the --explain option of run_query to see what Shard-Query is doing with the seven shards. It sounds like not all of the shards have the shard key in them. Shard-Query will pick one of the shards as the "coordinator" shard, and the metadata for tables (such as what columns are in the table) is retrieved from that shard. If a table doesn't have the shard key, that means the table should be duplicated on each node. Thus a query from it should be sent to only one in that case, because to send it to more than one would result in duplicate rows.
You can't install the UDF with ICE. You will need to set up a regular MySQL database, create a database with the same name as your shard schema (it doesn't need to contain any tables), install the UDF, then use either the proxy or sq_helper.
I hope this helps,
--Justin
On Thu, Oct 15, 2015 at 6:39 AM, DreamKetchup notifications@github.com wrote:
Hello, I am new to shard-query and have some problems with it. I have a simple star schema with 7 shards(for each day of the week). On diffrent queries i receive data only from my first shard. Ive created simple test schema with 3 shards and tried same queries and everything worked fine. So now i'm really confused and don't know what can be the source of the problem.
Second problem with Infobright CE UDF installation.
mysql> CREATE FUNCTION gman_do RETURNS STRING -> SONAME "libgearman_mysql_udf.so";
ERROR 1126 (HY000): Can't open shared library 'libgearman_mysql_udf.so' (errno: 0 feature disabled) Using Ubuntu 14.04 and Infobright CE v. 4.0.7
Please advise me how to resolve these problems.
— Reply to this email directly or view it on GitHub https://github.com/greenlion/swanhart-tools/issues/133.
— Reply to this email directly or view it on GitHub < https://github.com/greenlion/swanhart-tools/issues/133#issuecomment-148572376
.
— Reply to this email directly or view it on GitHub https://github.com/greenlion/swanhart-tools/issues/133#issuecomment-148710662 .
Hi, Do i need to have shard_key in all dimensions or only in fact_table??? Before i had only in fact_table and all other dimensions were only duplicated in each shard.
Anyway i tried and added shard_key to each table that i'm using in query. So my tables look like this.
CREATE TABLE IF NOT EXISTS state_fact
(
object_id
int(11) NOT NULL,
time_id
int(11) ,
date_id
int(11) NOT NULL,
value_id
int(11),
state_type_id
int(11),
value
float(11) DEFAULT NULL,
day_id
tinyint(2) DEFAULT NULL,
interval_id
int(11),
shard_key
int(3) DEFAULT 1
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS object_dim
(
object_id
int(11) NOT NULL,
host_name
varchar(128),
service_name
varchar(200),
uuid_host
varchar(128),
uuid_service
varchar(128),
is_active
smallint(6),
shard_key
int(3) DEFAULT 1
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS day_dim
;
CREATE TABLE IF NOT EXISTS day_dim
(
day_id
int(11) NOT NULL,
dayOfweekDesc
varchar(10) NOT NULL,
dayOfweek
tinyint(2) NOT NULL,
shard_key
int(3) DEFAULT 1
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Here i showed only 1st shard other differ only by shard_key value.
So here is my bootsrap.ini [default] user=root password=init
[config]
db=shard_query host=127.0.0.1
schema_name=1_monat
mapper=directory
column=shard_key column_datatype=integer
is_default_schema=1
inlist=false between=false
inlist_merge_threshold=128
inlist_merge_size=64
coord_engine=MYISAM
gearmand_path=/usr/sbin
[shard1] db=shard_1 host=127.0.0.1
[shard2] db=shard_2 host=127.0.0.1
[shard3] db=shard_3 host=127.0.0.1
[shard4] db=shard_4 host=127.0.0.1
[shard5] db=shard_5 host=127.0.0.1
[shard6] db=shard_6 host=127.0.0.1
[shard7] db=shard_7 host=127.0.0.1
I still receive data only from 1st shard.
You only put the shard key in the fact table. if you do select count(_) from facttable it will scan all shards. If a table does not have the shard key in it, is must be duplicated in all shards. a select count() of the table will scan only one shard.
Keep in mind that if it only show one SQL in the explain that does NOT mean only one shard is being scanned. It just means that the same SQL is being sent to all necessary shards, and that might be only one shard.
Hi,
I found the problem.
select
object_dim
.host_name
as c0
,
day_dim
.dayOfweekDesc
as c2
,
sum(state_fact
.value
) as m0
from
object_dim
as object_dim
,
state_fact
as state_fact
,
day_dim
as day_dim
where
state_fact
.object_id
= object_dim
.object_id
and
object_dim
.host_name
= \'DECT_GebF_3 \'
and
state_fact
.day_id
= day_dim
.day_id
group by
object_dim
.host_name
,
day_dim
.dayOfweekDesc
With this query i receive data only from 1st shard. But if i delete all
then everything works fine.
Can you check this issue?
Hi, please send CREATE TABLE for each involved table.
Sent from my iPhone
On Nov 6, 2015, at 3:43 AM, Aleksandr notifications@github.com wrote:
Hi, I found the problem. select object_dim.host_name as c0, day_dim.dayOfweekDesc as c2, sum(state_fact.value) as m0 from object_dim as object_dim, state_fact as state_fact, day_dim as day_dim where state_fact.object_id = object_dim.object_id and object_dim.host_name = 'DECT_GebF_3 ' and state_fact.day_id = day_dim.day_id group by object_dim.host_name, day_dim.dayOfweekDesc
With this query i receive data only from 1st shard. But if i delete all then everything works fine. Can you check this issue?
— Reply to this email directly or view it on GitHub.
Sorry, I see them in earlier email. I will look into it this weekend.
Sent from my iPhone
On Nov 6, 2015, at 3:43 AM, Aleksandr notifications@github.com wrote:
Hi, I found the problem. select object_dim.host_name as c0, day_dim.dayOfweekDesc as c2, sum(state_fact.value) as m0 from object_dim as object_dim, state_fact as state_fact, day_dim as day_dim where state_fact.object_id = object_dim.object_id and object_dim.host_name = 'DECT_GebF_3 ' and state_fact.day_id = day_dim.day_id group by object_dim.host_name, day_dim.dayOfweekDesc
With this query i receive data only from 1st shard. But if i delete all then everything works fine. Can you check this issue?
— Reply to this email directly or view it on GitHub.
Hello again, I have some problems with ICE. I don't know how to run a query. I created an empty schema with the name that i have in sq. Then installed sq_helper but don't know how to run a query. Can i run it from webGUI ??? Can you exlain this in details.
sq_helper won't work with ICE. you need to use the proxy. The web gui has not been updated in awhile. You might want to ask @hurdad for info about it.
Hello, I am new to shard-query and have some problems with it. I have a simple star schema with 7 shards(for each day of the week). On diffrent queries i receive data only from my first shard. Ive created simple test schema with 3 shards and tried same queries and everything worked fine. So now i'm really confused and don't know what can be the source of the problem.
Second problem with Infobright CE UDF installation.
mysql> CREATE FUNCTION gman_do RETURNS STRING -> SONAME "libgearman_mysql_udf.so";
ERROR 1126 (HY000): Can't open shared library 'libgearman_mysql_udf.so' (errno: 0 feature disabled) Using Ubuntu 14.04 and Infobright CE v. 4.0.7
Please advise me how to resolve these problems.