andyfengHKU commented 1 year ago

This issue keeps track of features that are either missing from Kùzu or implemented with different semantics compared to Neo4j Cypher. The general goal is to make Kùzu feature complete with respect to Cypher.

Data Types

Kùzu uses the Postgres data type system as its reference.

`LIST` Data Type

Neo4j's LIST allows elements of arbitrary types. Kùzu, on the other hand, requires all LIST elements to have the same type. Kùzu enforces this type constraint for compression and fast processing. To support arbitrary types, we can add a UNION data type, which can hold values of different data types. See DuckDB's UNION.

[x] Support UNION data type.
[x] Support LIST of UNION.
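As a rough illustration of why a LIST of UNION keeps the list homogeneous, here is a minimal Python sketch of a tagged union in the spirit of DuckDB's UNION. The `UnionValue` class, the `MEMBERS` table and `make_union` are hypothetical names for this sketch, not Kùzu's internal representation.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical tagged-union value: `tag` records which member type is active.
@dataclass(frozen=True)
class UnionValue:
    tag: str                  # e.g. "int" or "str"
    value: Union[int, str]

# Member types of a hypothetical UNION(int INT64, str STRING).
MEMBERS = {"int": int, "str": str}

def make_union(v):
    """Wrap a Python value in the UNION member whose type matches it exactly."""
    for tag, typ in MEMBERS.items():
        if type(v) is typ:
            return UnionValue(tag, v)
    raise TypeError(f"{type(v).__name__} is not a member of the UNION")

# Every element has the same type (UnionValue) even though the payloads differ,
# so the list itself still satisfies the "same element type" rule.
mixed = [make_union(x) for x in [1, "two", 3]]
tags = [u.tag for u in mixed]
```

The design choice is that type checking moves from the list level to the tag: the list stays typed, and each element says which member it carries.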

`MAP` Data Type

Kùzu currently does not support MAP. We should implement MAP as a LIST of STRUCT so that it can contain an arbitrary number of key-value pairs, where all keys must be of the same type and all values must be of the same type. To support key-value pairs with arbitrary types, we can again use UNION.

[x] Support MAP data type.
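A minimal Python sketch of the "MAP as a LIST of STRUCT" layout: each entry is a (key, value) struct, and the only constraints checked are the two stated above (one key type, one value type). `Entry`, `make_map` and `map_lookup` are hypothetical helpers, not Kùzu functions.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass(frozen=True)
class Entry:
    """One STRUCT(key K, value V) element of the underlying list."""
    key: Any
    value: Any

def make_map(pairs, key_type, value_type) -> List[Entry]:
    """Build a MAP(key_type, value_type) as a list of key/value structs.

    Every key must be a key_type and every value a value_type.
    """
    entries = []
    for k, v in pairs:
        if not isinstance(k, key_type) or not isinstance(v, value_type):
            raise TypeError(f"entry ({k!r}, {v!r}) does not match the MAP type")
        entries.append(Entry(k, v))
    return entries

def map_lookup(m: List[Entry], key):
    """Scan the entries in order; return the first matching value, or None."""
    for e in m:
        if e.key == key:
            return e.value
    return None
```

Because the map is just a list, it can hold any number of entries, and lookups in this sketch are a linear scan over that list.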

Spatial Data Types

Kùzu does not have native support for spatial data types.

Data Definition

Although Neo4j is schema-less, it does support indexes and constraints.

Constraints

The following constraints are available in Neo4j but missing in Kùzu:

[ ] Unique constraint
[ ] Existence constraint

Index

Kùzu supports a primary key index but does not support indexing arbitrary properties.

[ ] Create Index
[ ] Drop Index
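To show what an index on an arbitrary property buys over a full scan, here is a minimal Python sketch of a secondary hash index (property value to row ids) on a hypothetical in-memory node table. It is not how Kùzu stores data; it only contrasts the two lookup paths.

```python
from collections import defaultdict

# Hypothetical node table: each row is a dict of properties.
rows = [
    {"id": 0, "name": "Alice", "age": 30},
    {"id": 1, "name": "Bob", "age": 25},
    {"id": 2, "name": "Carol", "age": 30},
]

def scan(rows, prop, value):
    """Without an index: test every row, O(n) per lookup."""
    return [r["id"] for r in rows if r.get(prop) == value]

def build_index(rows, prop):
    """Secondary hash index: property value -> list of row ids."""
    index = defaultdict(list)
    for r in rows:
        index[r.get(prop)].append(r["id"])
    return index

def index_lookup(index, value):
    """With an index: a single hash probe plus the size of the result."""
    return list(index.get(value, []))
```

A hash index like this one only helps equality lookups; range predicates such as `age > 20` would still need a scan or an ordered index.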

Query

Kùzu supports OPTIONAL MATCH, RETURN, WITH, UNWIND, WHERE, ORDER BY, SKIP, LIMIT, and UNION in the same way as Neo4j.

Match

In graph theory, a trail is a sequence of edges in which no edge is visited more than once, while a walk may visit any vertex or edge repeatedly. Both semantics can be useful, depending on the use case.

Neo4j adopts trail semantics within a single MATCH clause. To get walk semantics, Neo4j requires multiple MATCH clauses, e.g.

Trail
MATCH (a)-[e1]->(b)-[e2]->(c)

Walk
MATCH (a)-[e1]->(b)
MATCH (b)-[e2]->(c)

Kùzu, on the other hand, uses walk semantics by default, and users can obtain trail semantics with predicates, e.g.

Walk
MATCH (a)-[e1]->(b)-[e2]->(c)

Trail
MATCH (a)-[e1]->(b)-[e2]->(c) WHERE id(e1) <> id(e2)
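The difference between the two Kùzu queries above can be sketched in Python by enumerating matches of the two-hop pattern on a tiny hypothetical graph that contains a self-loop (the only way a two-hop pattern can reuse an edge). The `two_hop` function is an illustration only, not how Kùzu evaluates MATCH.

```python
# Hypothetical graph: edge id -> (source, destination). e1 is a self-loop on B.
edges = {"e0": ("A", "B"), "e1": ("B", "B"), "e2": ("B", "C")}

def two_hop(edges, trail):
    """Enumerate matches of (a)-[x]->(b)-[y]->(c) as (x, y) edge-id pairs.

    Walk semantics: any two consecutive edges match.
    Trail semantics: additionally x != y, mirroring WHERE id(e1) <> id(e2).
    """
    matches = []
    for x, (_, b) in edges.items():
        for y, (b2, _) in edges.items():
            if b != b2:
                continue  # y must start where x ends
            if trail and x == y:
                continue  # trail: the same edge may not repeat
            matches.append((x, y))
    return matches

walks = two_hop(edges, trail=False)
trails = two_hop(edges, trail=True)
```

Here `walks` includes `("e1", "e1")`, which traverses the self-loop twice, and `trails` is the same list without that match.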

Kùzu has implemented most features of the MATCH clause, except the following.

Return properties for recursive relationship

Kùzu currently returns only internal IDs for recursive relationships.

[x] Return properties for recursive relationship

Predicate on recursive relationship.

Kùzu currently does not support WHERE predicates on recursive relationships.

[x] Support predicate on recursive relationship.

Bounded recursive relationship

Kùzu requires recursive relationships to be bounded, i.e. to have a lower bound and an upper bound (capped at 30), to avoid long-running queries.
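Why the upper bound matters can be sketched in Python: on a graph with a cycle, walks of every length exist, so expansion only stops because of the bound. The sketch below counts walks per end node with lengths in [lower, upper], expanding one hop per iteration. The graph, `bounded_paths`, and the use of walk counting are assumptions for illustration; `MAX_UPPER = 30` just mirrors the cap stated above.

```python
from collections import Counter

MAX_UPPER = 30  # mirrors the cap on the upper bound stated above

# Hypothetical graph with a cycle A -> B -> A, plus an edge B -> C.
adj = {"A": ["B"], "B": ["A", "C"], "C": []}

def bounded_paths(adj, src, lower, upper):
    """Count walks from src whose length lies in [lower, upper], per end node.

    Expands one hop per iteration. Because the graph has a cycle, walks of
    every length exist, so the upper bound is what guarantees termination.
    This sketch only handles lower >= 1.
    """
    if not 1 <= lower <= upper <= MAX_UPPER:
        raise ValueError("bounds must satisfy 1 <= lower <= upper <= MAX_UPPER")
    result = Counter()
    frontier = Counter({src: 1})  # end node -> number of walks of the current length
    for length in range(1, upper + 1):
        nxt = Counter()
        for node, n in frontier.items():
            for dst in adj.get(node, []):
                nxt[dst] += n
        frontier = nxt
        if length >= lower:
            result.update(frontier)
    return result
```

For example, `bounded_paths(adj, "A", 1, 3)` counts two walks ending at B (A->B and A->B->A->B), one ending at A and one ending at C.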

Named path

Kùzu does not support named paths because we think most of their functionality can be substituted with named nodes and (recursive) relationships. So implementing named paths is not a priority for us.

All shortest path

Kùzu has not yet implemented all shortest paths.

[x] All shortest path
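For reference, the standard way to compute all shortest paths in an unweighted graph is a BFS that keeps every predecessor reaching a node at its minimal distance, followed by backtracking from the destination. This Python sketch shows that technique on an adjacency dict; it is not Kùzu's implementation.

```python
from collections import deque

def all_shortest_paths(adj, src, dst):
    """Return every shortest path from src to dst in an unweighted graph.

    BFS records, for each node, all predecessors that reach it at its
    minimal distance; paths are then rebuilt by walking those predecessor
    lists backwards from dst.
    """
    dist = {src: 0}
    preds = {src: []}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:                # first visit: shortest distance found
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:     # another equally short way in
                preds[v].append(u)
    if dst not in dist:
        return []                            # dst is unreachable

    paths = []

    def backtrack(node, suffix):
        if node == src:
            paths.append([src] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    backtrack(dst, [])
    return paths
```

On a diamond A->B->D, A->C->D, both two-hop paths are returned, whereas a single-shortest-path search would return only one of them.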

LOAD CSV

Kùzu uses COPY for bulk loading. Direct scan from CSV is not yet implemented.

[x] Support LOAD CSV

Subquery

Kùzu supports non-correlated EXISTS { subquery }.

[x] Support COUNT { subquery }
[ ] Support CALL { subquery }
[x] Support correlated subquery with un-nesting, i.e. avoid executing subquery with index nested loop join.
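The un-nesting idea can be sketched in Python for a COUNT { subquery } correlated on an outer person: the naive plan re-runs the subquery for every outer row, while the un-nested plan aggregates the subquery once and lets each outer row probe the result. The data and both functions are hypothetical; the nested version is a plain per-row rescan standing in for per-row re-execution.

```python
from collections import Counter

# Hypothetical data: outer rows (persons) and edges (follower, followee).
persons = ["alice", "bob", "carol"]
follows = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]

def count_nested(persons, follows):
    """Correlated evaluation: re-run the subquery once per outer row.

    Costs O(|persons| * |follows|) because every outer row rescans the edges.
    """
    return {p: sum(1 for src, _ in follows if src == p) for p in persons}

def count_unnested(persons, follows):
    """Un-nested evaluation: aggregate the subquery once, then join.

    One pass groups edges by source; each outer row then probes the counts.
    Rows with no match get 0, as a COUNT subquery does.
    """
    counts = Counter(src for src, _ in follows)
    return {p: counts.get(p, 0) for p in persons}
```

Both functions return the same per-person counts; only the amount of repeated work differs.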

CALL procedure

Neo4j procedures appear to be similar in concept to SQLite's PRAGMA statement, which is used to modify or query non-table data.

[x] Support PRAGMA

Syntax sugar

The following syntax sugar does not affect functionality but could be good to have.

[x] omit relationship pattern, e.g. (a)->(b)
[ ] where predicate inside node/relationship pattern, e.g. (a:Person WHERE a.age > 20)
[x] use expression IN list instead of list_contains(list, expression)

The following syntax is not a priority for implementation:

FOREACH clause. Users can always use an UNWIND clause as a substitute.

Data Manipulation

Kùzu supports the CREATE, DELETE and SET clauses. However, our data manipulation works in a similar fashion to relational databases like SQLite, so we don't support the following.

Create a node or relationship with multiple labels

In our storage, each label is mapped to a physical table, so a node / relationship record must explicitly belong to a single physical table. When creating with multiple labels, e.g. CREATE (a:Person:Student), Kùzu has no way to create a single record that exists in both the Person and Student tables.

Reading after updating in a single statement

Kùzu does not yet support reading after updating in a single statement, e.g. MATCH (a) SET a.age = a.age + 1 RETURN a.age. One workaround is to issue multiple statements.

[x] Support read after write in one statement

MERGE

MERGE is similar to UPSERT, or INSERT ... ON CONFLICT, in many relational databases. We are currently missing this feature.

[x] Support MERGE
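The upsert behaviour that MERGE shares with INSERT ... ON CONFLICT can be sketched in Python over a hypothetical table keyed by primary key: match the key if it exists and apply the match-time updates, otherwise create the row. The `on_create` / `on_match` parameters loosely mirror Cypher's ON CREATE SET and ON MATCH SET; `merge_node` itself is a made-up helper, not a Kùzu API.

```python
def merge_node(table, key, on_create=None, on_match=None):
    """Sketch of MERGE semantics over a dict keyed by primary key.

    If a row with `key` exists, apply the on_match property updates;
    otherwise insert a new row initialised from on_create.
    Returns True when a new row was created.
    """
    if key in table:
        table[key].update(on_match or {})
        return False
    table[key] = dict(on_create or {})
    return True

# Usage: the first call creates the row, the second call matches it.
people = {}
merge_node(people, "alice", on_create={"visits": 1})
merge_node(people, "alice", on_match={"visits": 2})
```

Either way the key ends up present exactly once, which is the property that distinguishes MERGE from a plain CREATE.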

Syntax sugar

The following syntax sugar does not affect functionality but could be good to have.

[x] Delete node / relationship with multiple labels. Users can delete one label at a time.
[x] Set node / relationship properties with multiple labels. Users can set one label at a time.
[x] DETACH DELETE. Users can first delete all relationships, then delete the nodes.
[ ] REMOVE. Users can use SET property = NULL.
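The two-step workaround for DETACH DELETE (delete the relationships first, then the node) can be sketched in Python over hypothetical in-memory node and relationship collections. `detach_delete` is an illustrative helper, not part of Kùzu.

```python
def detach_delete(nodes, rels, node_id):
    """Delete a node together with every relationship that touches it.

    `nodes` is a set of node ids and `rels` maps rel id -> (src, dst).
    Step 1 drops all incident relationships, step 2 drops the node itself.
    Returns the ids of the deleted relationships.
    """
    incident = [r for r, (s, d) in rels.items() if node_id in (s, d)]
    for r in incident:
        del rels[r]            # step 1: remove incident relationships
    nodes.discard(node_id)     # step 2: remove the node
    return incident
```

Collecting the incident ids before deleting avoids mutating the dict while iterating over it.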

Functions

There is no well-defined set of functions that a database should support. By default, Kùzu takes the function sets of Postgres and DuckDB as a reference and will keep expanding on top of them. Let us know if any function is of interest to you but missing in Kùzu!

[x] Support UDF

Use Graph

We currently do not support multiple databases or queries across different databases.

ubmarco commented 2 days ago

I am quite interested in support for indexing arbitrary properties. For testing, I loaded 90M objects, which resulted in a 12 GB database. Queries on primary key values take 2 ms, while queries on other properties take almost 700 ms.

May I ask whether this topic has been assigned a priority?

andyfengHKU commented 2 days ago

Hi @ubmarco, I'll bring this up with the team today. I think @ray6080, @semihsalihoglu-uw and @prrao87 are in a better position to comment on this feature.

prrao87 commented 1 day ago

Part of this is an open issue (range index): https://github.com/kuzudb/kuzu/issues/3776 We will have to discuss internally on how/when we can prioritize it among the other items in our list 😅

kuzudb / kuzu

Feature Complete Towards openCypher #1644
