shamim8888 / asterixdb

Automatically exported from code.google.com/p/asterixdb

ASTERIX should allow user to give additional information on a dataset in form of hints. #251

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
ASTERIX should allow users to give additional information in the form of hints.
These hints can come in handy in scenarios such as:-
a)  determining the optimal cardinality for a dataset’s nodegroup.
b)  determining other parameters, such as the size of the bloom filter that
holds the data.

Given below are the hints that need to be supported.
SNo  Hint        Description
1.   TUPLE_SIZE  Expected tuple size in bytes
2.   NUM_TUPLES  Expected number of tuples in the dataset

An example create dataset statement that provides hints is given below:-

create dataset X(TypeY)
partitioned by key id 
hints (TUPLE_SIZE=250, NUM_TUPLES=250000) ;

Original issue reported on code.google.com by RamanGro...@gmail.com on 29 Jan 2013 at 3:33

GoogleCodeExporter commented 9 years ago
I would like to alter the proposed syntax slightly:-

create dataset <IDENTIFIER> ( <IDENTIFIER> )
partitioned by key <IDENTIFIER> (,<IDENTIFIER>)* 
hints (<IDENTIFIER>=<STRING_LITERAL> (,<IDENTIFIER>=<STRING_LITERAL>)*);

so that we have:-
create dataset X(TypeY)
partitioned by key id 
hints (TUPLE_SIZE="250", NUM_TUPLES="250000") ;

This is to support hints with arbitrary values that do not necessarily comply
with the definition of an IDENTIFIER.
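
To make the proposed grammar concrete, here is a minimal sketch of a parser for the hints clause, written in Python for brevity. It is not the ASTERIX parser; the function name `parse_hints` and the regular expressions are assumptions made for illustration, and only the quoted-value form proposed in this comment is accepted.

```python
import re

# Hypothetical matcher for one <IDENTIFIER>=<STRING_LITERAL> pair.
# Assumes identifiers are letters, digits and underscores, and that string
# literals are double-quoted with no escaped quotes inside.
_PAIR = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"([^"]*)"\s*')
_CLAUSE = re.compile(r'\s*hints\s*\((.*)\)\s*;?\s*', re.IGNORECASE | re.DOTALL)


def parse_hints(clause):
    """Parse 'hints (K="v", K2="v2")' into a dict of hint name -> raw string."""
    m = _CLAUSE.fullmatch(clause)
    if m is None:
        raise ValueError("not a hints clause")
    body = m.group(1)
    hints = {}
    pos = 0
    while True:
        pair = _PAIR.match(body, pos)
        if pair is None:
            raise ValueError(
                f"expected <IDENTIFIER>=<STRING_LITERAL> at offset {pos}")
        hints[pair.group(1)] = pair.group(2)
        pos = pair.end()
        if pos == len(body):
            return hints
        if body[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos += 1  # skip the comma and read the next pair


print(parse_hints('hints (TUPLE_SIZE="250", NUM_TUPLES="250000") ;'))
```

Because values are string literals, a value may contain characters that an identifier could not, such as spaces or commas.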

Original comment by RamanGro...@gmail.com on 29 Jan 2013 at 4:28

GoogleCodeExporter commented 9 years ago
Looks okay.  Let's be careful though - I want to REGULATE these hints HIGHLY.  
The only one I am okay with initially is expected cardinality.  Period.  Users 
have no way of estimating bytes in tuples.  And we don't even have tuples.  
CARDINALITY should be the name of our only initial hint, IMO.  We *MUST* avoid 
physical hints - that is a very nasty/slippery slope.  And hints are generally 
evil and hard to use.

Original comment by dtab...@gmail.com on 29 Jan 2013 at 8:44

GoogleCodeExporter commented 9 years ago
Where is the size of tuples used?  I know Bloom filter doesn't require this 
number.  If the user doesn't provide an estimation for this size, what can the 
engine do?

Original comment by che...@gmail.com on 29 Jan 2013 at 8:51

GoogleCodeExporter commented 9 years ago
Agreed with Mike. No TUPLE_SIZE and change NUM_TUPLES to CARDINALITY.

Original comment by vinay...@gmail.com on 29 Jan 2013 at 8:51

GoogleCodeExporter commented 9 years ago
The hint TUPLE_SIZE was originally proposed for helping in determining the 
cardinality of a node group for a dataset (when used in conjunction with the 
CARDINALITY_HINT).  
We want to hide the concept of a node group from the end-user. Asterix needs to 
form a node group and it cannot be the default (ALL_NODES) as it may not be 
optimal. In the upcoming release, we may not address this issue and rely on the 
default node group. However we do need to revisit this and have a better way of 
forming a node group for a dataset.  

It does not play any role for the current bloom filter issue. 
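
For context, one way the two originally proposed hints could have fed a node group decision is sketched below in Python. This is purely illustrative: the function `estimate_nodegroup_size`, the `bytes_per_node` budget and the `total_nodes` cap are all hypothetical and do not describe anything ASTERIX does.

```python
def estimate_nodegroup_size(tuple_size_bytes, num_tuples,
                            bytes_per_node, total_nodes):
    """Estimate how many nodes a dataset might be spread over.

    bytes_per_node and total_nodes are hypothetical tuning inputs; the
    thread does not define them.
    """
    if min(tuple_size_bytes, num_tuples, bytes_per_node, total_nodes) <= 0:
        raise ValueError("all inputs must be positive")
    dataset_bytes = tuple_size_bytes * num_tuples
    # Integer ceiling division: nodes needed to hold the estimated bytes.
    needed = -(-dataset_bytes // bytes_per_node)
    # Never exceed the cluster size; always use at least one node.
    return min(max(needed, 1), total_nodes)


# Values from the original example: TUPLE_SIZE=250, NUM_TUPLES=250000.
print(estimate_nodegroup_size(250, 250000, 16 * 1024 * 1024, 10))
```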

Original comment by ram...@uci.edu on 29 Jan 2013 at 10:23

GoogleCodeExporter commented 9 years ago
Committed into private branch: asterix_stabilization_issue_251

ASTERIX allows users to give additional information in the form of hints.
These hints can come in handy in scenarios such as determining parameters
like the size of the bloom filter that holds the data.

To begin with, the only hint supported by Asterix is the 'CARDINALITY' hint.
CARDINALITY gives the expected number of tuples in the dataset.
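
As an illustration of why an expected cardinality helps with Bloom filter sizing, the sketch below applies the textbook sizing formulas in Python. Whether ASTERIX uses these formulas is an assumption; the function name `bloom_filter_size` and the default false positive rate are hypothetical.

```python
import math


def bloom_filter_size(cardinality, false_positive_rate=0.01):
    """Return (num_bits, num_hashes) for a standard Bloom filter.

    Uses the textbook formulas m = -n * ln(p) / (ln 2)^2 and
    k = (m / n) * ln 2. The default rate of 0.01 is an assumption.
    """
    if cardinality < 0:
        raise ValueError("cardinality must not be negative")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    n = max(cardinality, 1)  # avoid a zero-sized filter for empty datasets
    num_bits = math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2)
    num_hashes = max(1, round(num_bits / n * math.log(2)))
    return num_bits, num_hashes


# Roughly 9.6 bits per expected tuple at a 1% false positive rate.
print(bloom_filter_size(2500))
```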

An example create dataset statement that provides hints is given below:-

create dataset X(TypeY)
partitioned by key id 
hints (CARDINALITY=2500);

Please note that hints are case-insensitive. 

Test Cases:-
Positive
asterix-app/src/test/resources/metadata/queries/basic/issue_251_dataset_hint_1.aql
asterix-app/src/test/resources/metadata/queries/basic/issue_251_dataset_hint_2.aql
asterix-app/src/test/resources/metadata/queries/basic/issue_251_dataset_hint_3.aql
asterix-app/src/test/resources/metadata/queries/basic/issue_251_dataset_hint_4.aql

Negative
asterix-app/src/test/resources/metadata/queries/exception/issue_251_dataset_hint_error_1.aql
asterix-app/src/test/resources/metadata/queries/exception/issue_251_dataset_hint_error_2.aql

Original comment by RamanGro...@gmail.com on 30 Jan 2013 at 5:35

GoogleCodeExporter commented 9 years ago
From the commit, I see that the hint is in lower case in some tests, as in
the following two.

asterix-app/src/test/resources/metadata/queries/basic/issue_251_dataset_hint_1.aql

create dataset Book(LineType)
partitioned by key id
hints(  cardinality  =   2000);

asterix-app/src/test/resources/metadata/queries/basic/issue_251_dataset_hint_2.aql

create dataset Book(LineType)
partitioned by key id
hints(cardinality=2000);

Whereas in this test the hint is in upper case:
asterix-app/src/test/resources/metadata/queries/basic/issue_251_dataset_hint_3.aql

Which is the correct (supported) behavior: upper case or lower case?

Original comment by khfaraaz82 on 30 Jan 2013 at 7:08

GoogleCodeExporter commented 9 years ago
What is the maximum value that a user can specify for cardinality in the
hint? And what is the expected behavior if the user gives a very high
cardinality, for example MAX_VALUE of the long type in Java? I assume
negative cardinality will be handled appropriately.

Original comment by khfaraaz82 on 30 Jan 2013 at 7:13

GoogleCodeExporter commented 9 years ago
As stated in the checkin log, the hint name is case-insensitive.
The value of the "cardinality" hint is an integer in the allowed range of
0 to INTEGER.MAX.
Negative values are not permitted.

The syntax for providing hints supports a comma-separated set of key-value
pairs, with the key being the hint name and the value being the actual value.
The value can also be a string, in which case it must be surrounded by double
quotes.
To begin with, we support only one kind of hint (cardinality). If any other
hint is given, an appropriate error message is provided.
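
The rules above can be sketched as a small validator, written here in Python. This is not the ASTERIX implementation. The name `validate_hints` is hypothetical, "INTEGER.MAX" is assumed to mean Java's Integer.MAX_VALUE (2^31 - 1), and both range bounds are assumed to be inclusive.

```python
# Assumption: "INTEGER.MAX" refers to Java's Integer.MAX_VALUE.
INTEGER_MAX = 2**31 - 1

# Only one hint is supported to begin with.
SUPPORTED_HINTS = {"CARDINALITY"}


def validate_hints(hints):
    """Check hint names and values; return a dict keyed by upper-case name."""
    validated = {}
    for name, raw in hints.items():
        key = name.upper()  # hint names are case-insensitive
        if key not in SUPPORTED_HINTS:
            raise ValueError(f"unsupported hint: {name}")
        try:
            value = int(str(raw))  # accepts 2000 as well as "2000"
        except ValueError:
            raise ValueError(f"{name} must be an integer, got {raw!r}") from None
        if not 0 <= value <= INTEGER_MAX:
            raise ValueError(f"{name} must be between 0 and {INTEGER_MAX}")
        validated[key] = value
    return validated


print(validate_hints({"cardinality": "2000"}))
```

Under these assumptions, a value such as Java's Long.MAX_VALUE falls outside the range and is rejected, as is any negative value.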

Original comment by RamanGro...@gmail.com on 30 Jan 2013 at 7:59

GoogleCodeExporter commented 9 years ago
Khurram, the use of different case (upper and lower) in the test cases is 
intentional.

Original comment by RamanGro...@gmail.com on 30 Jan 2013 at 8:02

GoogleCodeExporter commented 9 years ago
Released for code review by Sattam.

Original comment by RamanGro...@gmail.com on 30 Jan 2013 at 7:02

GoogleCodeExporter commented 9 years ago
Fixed in asterix_stabilization: r1175

Original comment by ram...@uci.edu on 11 Feb 2013 at 3:40

GoogleCodeExporter commented 9 years ago

Original comment by RamanGro...@gmail.com on 11 Feb 2013 at 3:40