Open andyfengHKU opened 1 year ago
I am quite interested in the support for indexing arbitrary properties. For testing I loaded 90M objects so I ended up with a DB size of 12 GB. Query time on primary key values is 2ms while other properties need almost 700ms.
May I ask whether a priority has been assigned to this topic?
Hi @ubmarco, I'll bring this up with the team today. I think @ray6080, @semihsalihoglu-uw and @prrao87 are in a better position to comment on this feature.
Part of this is covered by an open issue on range indexes (https://github.com/kuzudb/kuzu/issues/3776). We will have to discuss internally how and when we can prioritize it among the other items on our list 😅
This issue keeps track of features that are either missing or implemented with different semantics compared to Neo4j Cypher. The general goal is to make Kùzu feature-complete with respect to Cypher.
Data Types
Kùzu takes the Postgres data type system as a reference.
LIST Data Type
Neo4j's LIST allows elements of arbitrary types. Kùzu, on the other hand, requires all LIST elements to have the same type. Kùzu enforces this type constraint to enable compression and fast processing. To support arbitrary element types, we can add a UNION data type, which can contain values of different data types. See DuckDB's UNION.
- UNION data type.
- LIST of UNION.
MAP Data Type
Kùzu currently does not support MAP. We should implement MAP as a LIST of STRUCT so that it can contain an arbitrary number of key-value pairs, where all keys must be of the same type and all values must be of the same type. To support key-value pairs with arbitrary types, we can again use UNION.
- MAP data type.
Spatial Data Types
Kùzu does not have native support for spatial data types.
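As a rough illustration of the MAP proposal in the MAP Data Type section above, the sketch below models a MAP as a list of key-value structs in plain Python. It is not Kùzu code: the names Entry and ListOfStructMap are hypothetical, and overwriting the value when a key is inserted twice is an assumption, since the issue does not say how duplicate keys should behave.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Entry:
    """One STRUCT{key, value} element of the underlying LIST."""
    key: Any
    value: Any


class ListOfStructMap:
    """Toy MAP stored as a list of key-value structs (hypothetical, not Kùzu's design)."""

    def __init__(self, key_type: type, value_type: type) -> None:
        self.key_type = key_type
        self.value_type = value_type
        self.entries: List[Entry] = []

    def put(self, key: Any, value: Any) -> None:
        # All keys must share one type and all values must share one type.
        if type(key) is not self.key_type:
            raise TypeError(f"key must be {self.key_type.__name__}")
        if type(value) is not self.value_type:
            raise TypeError(f"value must be {self.value_type.__name__}")
        # Assumption: a repeated key overwrites the earlier value.
        for i, entry in enumerate(self.entries):
            if entry.key == key:
                self.entries[i] = Entry(key, value)
                return
        self.entries.append(Entry(key, value))

    def get(self, key: Any, default: Any = None) -> Any:
        # Linear search over the list of structs.
        for entry in self.entries:
            if entry.key == key:
                return entry.value
        return default


ages = ListOfStructMap(str, int)
ages.put("alice", 30)
ages.put("bob", 25)
ages.put("alice", 31)  # overwrites the first entry for "alice"
```

Storing heterogeneous keys or values would require the element type itself to be a UNION, which is why the issue lists UNION as a prerequisite.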
Data Definition
Although Neo4j is schema-less, it does support indexes and constraints.
Constraints
The following constraints are available in Neo4j but missing in Kùzu.
Index
Kùzu supports a primary key index but does not support indexing arbitrary properties.
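The gap between primary key lookups and lookups on other properties can be illustrated with a toy model. The sketch below is plain Python over a hypothetical in-memory table, not a description of Kùzu's storage or index structures. It uses a hash-style dictionary for simplicity, whereas the linked issue discusses a range index.

```python
from typing import Any, Dict, List, Optional

# Hypothetical toy table: every row has a unique "id" (the primary key).
rows: List[Dict[str, Any]] = [{"id": i, "age": 20 + i % 50} for i in range(1000)]

# Primary key index: key -> row position, so a lookup touches a single row.
pk_index: Dict[int, int] = {row["id"]: pos for pos, row in enumerate(rows)}


def lookup_by_pk(key: int) -> Optional[Dict[str, Any]]:
    pos = pk_index.get(key)
    return rows[pos] if pos is not None else None


def scan_by_property(name: str, value: Any) -> List[Dict[str, Any]]:
    # Without an index on `name`, every row has to be inspected.
    return [row for row in rows if row[name] == value]


def build_secondary_index(name: str) -> Dict[Any, List[int]]:
    # What an index on an arbitrary property could look like in this toy model.
    index: Dict[Any, List[int]] = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[name], []).append(pos)
    return index


def lookup_by_index(index: Dict[Any, List[int]], value: Any) -> List[Dict[str, Any]]:
    return [rows[pos] for pos in index.get(value, [])]


age_index = build_secondary_index("age")
```

Both scan_by_property("age", 25) and lookup_by_index(age_index, 25) return the same rows; the difference is that the scan visits all rows while the indexed lookup visits only the matching ones.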
Query
Kùzu supports OPTIONAL MATCH, RETURN, WITH, UNWIND, WHERE, ORDER BY, SKIP, LIMIT and UNION in the same way as Neo4j.
Match
In graph theory, a trail means an edge cannot be visited more than once, and a walk means any vertex or edge can be visited repeatedly. Both semantics can be useful depending on the use case.
Neo4j adopts trail semantics within a single MATCH clause. To achieve walk semantics, Neo4j requires multiple MATCH clauses.
Kùzu, on the other hand, uses walk semantics by default, and users can achieve trail semantics with predicates.
Kùzu has implemented the majority of features in the MATCH clause except the following.
Return properties for recursive relationship
Kùzu currently only returns internal IDs for recursive relationships.
Predicate on recursive relationship
Kùzu currently does not support WHERE predicates on recursive relationships.
Bounded recursive relationship
Kùzu requires recursive relationships to be bounded, i.e. with a lower bound and an upper bound (capped at 30), to avoid long-running queries.
Named path
Kùzu does not support named paths because we think most of the functionality of named paths can be substituted with named nodes and (recursive) relationships. So implementing named paths is not a priority for us.
All shortest path
Kùzu hasn't implemented all shortest paths.
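The difference between walk and trail semantics described at the start of this section, together with the need for an upper bound, can be sketched in plain Python. This is only an illustration on a hypothetical two-edge directed graph. The function enumerate_paths and its parameters are made up for this example and do not reflect how Kùzu or Neo4j evaluate patterns internally.

```python
from typing import Dict, List, Tuple

# Hypothetical graph: edge ID -> (source, destination). The two edges form a cycle.
EDGES: Dict[int, Tuple[str, str]] = {1: ("a", "b"), 2: ("b", "a")}


def enumerate_paths(
    edges: Dict[int, Tuple[str, str]],
    start: str,
    lower: int,
    upper: int,
    trail: bool,
) -> List[List[int]]:
    """Return paths (as lists of edge IDs) from `start` with lower <= length <= upper.

    With trail=True an edge may appear at most once per path; with trail=False
    (walk semantics) edges and vertices may repeat freely.
    """
    results: List[List[int]] = []

    def extend(node: str, path: List[int]) -> None:
        if lower <= len(path) <= upper:
            results.append(list(path))
        if len(path) == upper:
            return  # the upper bound is what guarantees termination for walks
        for edge_id, (src, dst) in sorted(edges.items()):
            if src != node:
                continue
            if trail and edge_id in path:
                continue  # trail semantics: never reuse an edge
            path.append(edge_id)
            extend(dst, path)
            path.pop()

    extend(start, [])
    return results


walks = enumerate_paths(EDGES, "a", lower=1, upper=3, trail=False)
trails = enumerate_paths(EDGES, "a", lower=1, upper=3, trail=True)
```

On this graph the walk enumeration keeps cycling until it hits the upper bound, while the trail enumeration stops once both edges have been used. Without an upper bound, the walk enumeration on a cyclic graph would never terminate.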
LOAD CSV
Kùzu uses COPY for bulk loading. Direct scan from CSV is not yet implemented.
- LOAD CSV
Subquery
Kùzu supports non-correlated EXISTS { subquery }.
- COUNT { subquery }
- CALL { subquery }
CALL procedure
A Neo4j procedure seems to be a similar concept to the SQLite PRAGMA statement, which is used to modify or query non-table data.
- PRAGMA
Syntax sugar
The following syntax sugar does not affect functionality but could be good to have.
- (a)->(b)
- (a:Person WHERE a.age > 20)
- expression IN list instead of list_contains(list, expression)
The following syntax will not be implemented with priority.
- FOREACH clause. Users can always use the UNWIND clause as a substitute.
Data Manipulation
Kùzu supports the CREATE, DELETE and SET clauses. However, our data manipulation works in a similar fashion to relational databases like SQLite, so we don't support the following.
Create a node or relationship with multiple labels
In our storage, each label is mapped to a physical table, so a node or relationship record must explicitly belong to a single physical table. For creating with multiple labels, e.g. CREATE (a:Person:Student), Kùzu doesn't have a way to create a single record that exists in both the Person and Student tables.
Reading after updating in a single statement
Kùzu does not yet support reading after updating in a single statement, e.g. MATCH (a) SET a.age = a.age + 1 RETURN a.age. One solution is to issue multiple statements.
MERGE
MERGE is similar to UPSERT or INSERT .. ON CONFLICT as in many relational databases. We are currently missing this feature.
- MERGE
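For readers less familiar with the relational analogue mentioned above, here is a minimal sketch of INSERT .. ON CONFLICT using Python's built-in sqlite3 module. It only shows the relational behaviour that MERGE is being compared to, not Kùzu functionality. The person table and the upsert_person helper are hypothetical, and the upsert syntax assumes the bundled SQLite is version 3.24 or newer.

```python
import sqlite3

# In-memory database with a hypothetical table keyed on `name`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT PRIMARY KEY, age INTEGER)")


def upsert_person(name: str, age: int) -> None:
    # Insert the row, or update `age` if a row with this primary key exists.
    conn.execute(
        "INSERT INTO person (name, age) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET age = excluded.age",
        (name, age),
    )


upsert_person("Alice", 30)  # no row yet: behaves like an insert
upsert_person("Alice", 31)  # row exists: behaves like an update
people = conn.execute("SELECT name, age FROM person").fetchall()
```

After both calls the table holds a single row for Alice with the updated age, which is the "match it or create it" behaviour that MERGE provides for graph patterns in Cypher.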
Syntax sugar
The following syntax sugar does not affect functionality but could be good to have.
- DETACH DELETE. Users can first delete all relationships and then delete the nodes.
- REMOVE. Users can use SET property = NULL instead.
Functions
There is no well-defined set of functions that a database should support. By default, Kùzu takes the Postgres and DuckDB function sets as a reference and will keep expanding on top of them. Let us know if any function is of interest to you but missing in Kùzu!
Use Graph
We currently do not support multiple databases or queries across different databases.