simlaudato / asterixdb

Automatically exported from code.google.com/p/asterixdb

NGRAM index entry not available in Metadata #797

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Reporting this issue on behalf of Francis, who hit it while testing the
persistence of metadata entries for the datasets, datatypes, functions,
indexes, etc. that were created.

For more information about the asterixdb instance, please discuss with
Francis.

case 1) ngram index entry not available in metadata.

create type closedStudentType as closed{
isUndergrad: boolean,
dormID: int8,
studentID: int16,
debitCardID: int32,
bankingID: int64,
salary: float,
tuition: double,
name: string,
dot: point,
straight: line,
box: rectangle,
round: circle,
shape: polygon,
birthday: date,
lastLogin: time,
enrollmentApproved: datetime,
suspension: duration,
calGrantDuration: year-month-duration,
enrollmentWindowLock: day-time-duration,
coursesTaken: {{int8}},
tuitionPaidDates: [datetime],
customType: openBooleanType
}

create dataset closedStudents(closedStudentType)
primary key studentID;

create index cStudentsIdx on closedStudents(name) type ngram(3);

for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedStudents" and $ind.IndexName = "cStudentsIdx" 
and $ind.IndexStructure = "NGRAM"
return $ind;
Output: the query does not return any results.

Francis - this is for you to try and paste the output as a comment to this 
issue.

for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedStudents" and $ind.IndexName = "cStudentsIdx"
return $ind;

case 2) Keyword index (this does not seem to be an issue, but please confirm).

create type closedStringOpType as closed {str: string, pid: int64?};

create dataset closedOpStrings(closedStringOpType)
primary key str;

create index cOpStringsIdx on closedOpStrings(str) type keyword;

for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedOpStrings" and $ind.IndexName = "cOpStringsIdx" 
and $ind.IndexStructure = "KEYWORD"
return $ind;

I see that, since we already have a primary index defined on field "str", we
cannot have another index of type keyword defined on the same field. The output
below tells us that we already have a BTREE index defined on field "str":

{ "DataverseName": "metatestcase", "DatasetName": "closedOpStrings", 
"DataTypeName": "closedStringOpType", "DatasetType": "INTERNAL", 
"InternalDetails": { "FileStructure": "BTREE", "PartitioningStrategy": "HASH", 
"PartitioningKey": [ "str" ], "PrimaryKey": [ "str" ], "GroupName": 
"DEFAULT_NG_ALL_NODES", "Autogenerated": false, "CompactionPolicy": "prefix", 
"CompactionPolicyProperties": [ { "Name": "max-mergable-component-size", 
"Value": "1073741824" }, { "Name": "max-tolernace-component-count", "Value": 
"5" } ] }, "ExternalDetails": null, "Hints": {{  }}, "Timestamp": "Mon Aug 11 
17:00:22 PDT 2014", "DatasetId": 472, "PendingOp": 0 }

Francis - please try this query and paste the output as a comment to this 
defect.

for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedOpStrings" and $ind.IndexName = "cOpStringsIdx"
return $ind;

Original issue reported on code.google.com by khfaraaz82 on 17 Aug 2014 at 8:48

GoogleCodeExporter commented 8 years ago
for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedStudents" and $ind.IndexName = "cStudentsIdx"
return $ind;
Output: { "DataverseName": "metatestcase", "DatasetName": "closedStudents", 
"IndexName": "cStudentsIdx", "IndexStructure": 
"LENGTH_PARTITIONED_NGRAM_INVIX", "SearchKey": [ "name" ], "IsPrimary": false, 
"Timestamp": "Mon Aug 11 17:04:36 PDT 2014", "PendingOp": 0, "GramLength": 3 }

for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedOpStrings" and $ind.IndexName = "cOpStringsIdx"
return $ind;
{ "DataverseName": "metatestcase", "DatasetName": "closedOpStrings", 
"IndexName": "cOpStringsIdx", "IndexStructure": 
"LENGTH_PARTITIONED_WORD_INVIX", "SearchKey": [ "str" ], "IsPrimary": false, 
"Timestamp": "Mon Aug 11 17:04:37 PDT 2014", "PendingOp": 0 }

Original comment by franc...@uci.edu on 17 Aug 2014 at 9:08

GoogleCodeExporter commented 8 years ago
So the metadata is there after all. The test queries need to be updated, but
we do have a documentation issue here. (We need to explain in the manual what
the IndexStructure types are.)
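
One way to see which IndexStructure values the Metadata actually stores, before
fixing the test queries, is to drop the IndexStructure predicate and project
only the fields of interest. This is just a sketch in AQL that reuses the
dataset name from this issue; the rows returned depend on the instance.

```aql
for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedStudents"
return { "IndexName": $ind.IndexName, "IndexStructure": $ind.IndexStructure };
```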

Original comment by dtab...@gmail.com on 19 Aug 2014 at 12:47

GoogleCodeExporter commented 8 years ago
The inverted index type should be one of the following:
SINGLE_PARTITION_WORD_INVIX, SINGLE_PARTITION_NGRAM_INVIX,
LENGTH_PARTITIONED_WORD_INVIX, LENGTH_PARTITIONED_NGRAM_INVIX. This type info
in the Metadata indicates whether the index is partitioned or not. We have
currently removed support for single-partition indexes, but it may be added
back when Taewoo implements fulltext search. I'm tagging this issue as invalid.
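
Given the type names above, the two test queries from the original report could
be updated roughly as follows. This is a sketch in AQL, assuming the instance
reports the same IndexStructure values as in the output Francis posted
(LENGTH_PARTITIONED_NGRAM_INVIX for the ngram index and
LENGTH_PARTITIONED_WORD_INVIX for the keyword index).

```aql
for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedStudents" and $ind.IndexName = "cStudentsIdx"
and $ind.IndexStructure = "LENGTH_PARTITIONED_NGRAM_INVIX"
return $ind;

for $ind in dataset Metadata.Index
where $ind.DatasetName = "closedOpStrings" and $ind.IndexName = "cOpStringsIdx"
and $ind.IndexStructure = "LENGTH_PARTITIONED_WORD_INVIX"
return $ind;
```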

Do we have a place in the documentation where we explain the internals of
the Metadata?

Original comment by icetin...@gmail.com on 19 Aug 2014 at 12:58

GoogleCodeExporter commented 8 years ago

Original comment by icetin...@gmail.com on 19 Aug 2014 at 1:43