kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

Feature Complete Towards openCypher #1644

Open andyfengHKU opened 1 year ago

andyfengHKU commented 1 year ago

This issue keeps track of features that are either missing or implemented with different semantics compared to Neo4j Cypher. The general goal is to make Kùzu feature complete with respect to Cypher.

Data Types

Kùzu uses the Postgres data type system as a reference.

LIST Data Type

Neo4j's LIST allows elements of arbitrary types. Kùzu, on the other hand, requires all elements of a LIST to have the same type. Kùzu enforces this constraint to enable compression and fast processing. To support arbitrary element types, we can add a UNION data type, which can hold values of different data types. See DuckDB's UNION.
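As a rough illustration of the idea (plain Python, not Kùzu code), the sketch below shows how a tagged union keeps a list homogeneous while still carrying values of different types. The names `UnionValue`, `ALLOWED_TAGS` and `make_list` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Any, List

# Hypothetical tag set for illustration; not Kùzu's actual type catalog.
ALLOWED_TAGS = {"INT64": int, "STRING": str, "DOUBLE": float}


@dataclass(frozen=True)
class UnionValue:
    """A value plus a tag saying which member type it holds."""
    tag: str
    value: Any

    def __post_init__(self):
        expected = ALLOWED_TAGS.get(self.tag)
        if expected is None:
            raise TypeError(f"unknown tag {self.tag!r}")
        if type(self.value) is not expected:
            raise TypeError(f"tag {self.tag} does not match value {self.value!r}")


def make_list(values: List[UnionValue]) -> List[UnionValue]:
    """Build a list whose single element type is UnionValue."""
    for v in values:
        if not isinstance(v, UnionValue):
            raise TypeError("every element must be a UnionValue")
    return list(values)


mixed = make_list([
    UnionValue("INT64", 1),
    UnionValue("STRING", "a"),
    UnionValue("DOUBLE", 2.5),
])
```

Every element has the same type (`UnionValue`), so the same-type rule still holds, while the tag records which underlying type each value carries.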

MAP Data Type

Kùzu currently does not support MAP. We should implement MAP as a LIST of STRUCT so that it can contain an arbitrary number of key-value pairs, where all keys must be of the same type and all values must be of the same type. To support key-value pairs with arbitrary types, we can again use UNION.
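A minimal Python sketch of the "MAP as a LIST of STRUCT" layout might look as follows. This is only an illustration of the data layout, not Kùzu's implementation; `Entry`, `make_map` and `map_get` are hypothetical names.

```python
from typing import Any, Iterable, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One STRUCT{key, value} element of the list."""
    key: Any
    value: Any


def make_map(pairs: Iterable[Tuple[Any, Any]], key_type: type, value_type: type) -> List[Entry]:
    """Build a list of entries, enforcing one key type and one value type."""
    entries = []
    for k, v in pairs:
        if type(k) is not key_type:
            raise TypeError(f"key {k!r} is not of type {key_type.__name__}")
        if type(v) is not value_type:
            raise TypeError(f"value {v!r} is not of type {value_type.__name__}")
        entries.append(Entry(k, v))
    return entries


def map_get(entries: List[Entry], key: Any, default: Any = None) -> Any:
    """Linear scan over the entry list; returns the first matching value."""
    for entry in entries:
        if entry.key == key:
            return entry.value
    return default


ages = make_map([("alice", 30), ("bob", 25)], str, int)
```

The list can grow to any number of pairs, but mixing key types or value types is rejected, which mirrors the constraint described above.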

Spatial Data Types

Kùzu does not have native support for spatial data types.

Data Definition

Although Neo4j is schema-less, it does support indexes and constraints.

Constraints: The following constraints are available in Neo4j but missing in Kùzu.

Index: Kùzu supports a primary index but does not support indexing arbitrary properties.

Query

Kùzu supports OPTIONAL MATCH, RETURN, WITH, UNWIND, WHERE, ORDER BY, SKIP, LIMIT and UNION in the same way as Neo4j.

Match

In graph theory, a trail means an edge cannot be visited more than once, and a walk means any vertex or edge can be visited repeatedly. Both semantics can be useful depending on the use case.

Neo4j adopts trail semantics within a single MATCH clause. To achieve walk semantics, Neo4j requires multiple MATCH clauses, e.g.

Trail
MATCH (a)-[e1]->(b)-[e2]->(c)
Walk
MATCH (a)-[e1]->(b)
MATCH (b)-[e2]->(c)

Kùzu, on the other hand, uses walk semantics by default, and users can achieve trail semantics with predicates, e.g.

Walk
MATCH (a)-[e1]->(b)-[e2]->(c)
Trail
MATCH (a)-[e1]->(b)-[e2]->(c) WHERE id(e1) <> id(e2)
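To make the difference concrete, here is a small Python sketch (not Kùzu code) that enumerates two-hop matches over a toy edge list under both semantics. The graph, the `(edge_id, src, dst)` tuple layout and the function names are assumptions made for illustration. The self-loop on `a` is what makes the two results differ.

```python
from itertools import product

# Toy graph: edges are (edge_id, src, dst). Edge 0 is a self-loop on "a".
EDGES = [(0, "a", "a"), (1, "a", "b")]


def two_hop_walks(edges):
    """Pairs (e1, e2) where e1 ends where e2 starts; edges may repeat."""
    return [
        (e1[0], e2[0])
        for e1, e2 in product(edges, repeat=2)
        if e1[2] == e2[1]
    ]


def two_hop_trails(edges):
    """Same pattern with the extra predicate id(e1) <> id(e2)."""
    return [(i, j) for i, j in two_hop_walks(edges) if i != j]


walks = two_hop_walks(EDGES)    # [(0, 0), (0, 1)]
trails = two_hop_trails(EDGES)  # [(0, 1)]
```

Under walk semantics the self-loop can be matched as both `e1` and `e2`. The trail predicate filters that match out.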

Kùzu has implemented the majority of features in the MATCH clause, except:

Return properties for recursive relationship

Kùzu currently only returns internal IDs for recursive relationships.

Predicate on recursive relationship

Kùzu currently does not support WHERE predicates on recursive relationships.

Bounded recursive relationship

Kùzu requires recursive relationships to be bounded, i.e. to have a lower bound and an upper bound (capped at 30), to avoid long-running queries.
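The following Python sketch (an illustration only, not Kùzu's recursive join implementation) shows why a bound matters. Under walk semantics a cyclic graph has paths of every length, so enumeration only terminates because of the upper bound. The function name, the adjacency-dict layout and the requirement that the lower bound is at least 1 are assumptions of this sketch; the cap of 30 is taken from the text above.

```python
from typing import Dict, List

MAX_UPPER_BOUND = 30  # cap mentioned in the issue


def bounded_paths(adj: Dict[str, List[str]], start: str, lower: int, upper: int) -> List[List[str]]:
    """Enumerate all walks from `start` with lower <= length <= upper."""
    if not (1 <= lower <= upper <= MAX_UPPER_BOUND):
        raise ValueError("need 1 <= lower <= upper <= 30")
    results = []
    frontier = [[start]]
    for length in range(1, upper + 1):
        # Extend every path in the frontier by one edge.
        frontier = [
            path + [nxt]
            for path in frontier
            for nxt in adj.get(path[-1], [])
        ]
        if length >= lower:
            results.extend(frontier)
    return results


# A two-node cycle: without an upper bound, enumeration would never stop.
ADJ = {"a": ["b"], "b": ["a"]}
paths = bounded_paths(ADJ, "a", 1, 3)
```

Here `paths` contains one walk for each length from 1 to 3. Raising the upper bound keeps producing longer walks around the cycle.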

Named path

Kùzu does not support named paths because we think most of the functionality of named paths can be substituted with named nodes and (recursive) relationships. Implementing named paths is therefore not a priority for us.

All shortest path

Kùzu hasn't implemented all shortest paths yet.

LOAD CSV

Kùzu uses COPY for bulk loading. Direct scan from CSV is not yet implemented.

Subquery

Kùzu supports non-correlated EXISTS { subquery }.

CALL procedure

A Neo4j procedure seems to be a similar concept to the SQLite PRAGMA statement, which is used to modify / query non-table data.

Syntax sugar

The following syntax sugar does not affect functionality but could be good to have.

Implementing the following syntax is not a priority.

Data Manipulation

Kùzu supports the CREATE, DELETE and SET clauses. However, our data manipulation works in a similar fashion to a relational database like SQLite, so we don't support:

Create a node or relationship with multiple labels: In our storage, each label is mapped to a physical table, so a node / relationship record must belong to exactly one physical table. For creating with multiple labels, e.g. CREATE (a:Person:Student), Kùzu doesn't have a way to create a single record that exists in both the Person and Student tables.

Reading after updating in a single statement: Kùzu does not yet support reading after updating in a single statement, e.g. MATCH (a) SET a.age = a.age + 1 RETURN a.age. One workaround is to issue multiple statements.

MERGE

MERGE is similar to UPSERT or INSERT ... ON CONFLICT in many relational databases. We are currently missing this feature.
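For readers less familiar with the analogy, the match-or-create behaviour can be sketched in plain Python over a dict keyed by primary key. This is only an illustration of the semantics, not a proposal for how Kùzu would implement it. `merge_node` and the `name` field are hypothetical, and the `on_create` / `on_match` parameters loosely mirror Cypher's ON CREATE SET and ON MATCH SET.

```python
from typing import Any, Dict


def merge_node(table: Dict[str, Dict[str, Any]], key: str,
               on_create: Dict[str, Any], on_match: Dict[str, Any]) -> Dict[str, Any]:
    """Match the row with this key, or create it if it does not exist."""
    row = table.get(key)
    if row is None:
        # No match: create the row and apply the ON CREATE properties.
        row = {"name": key, **on_create}
        table[key] = row
    else:
        # Match found: apply the ON MATCH properties instead of inserting.
        row.update(on_match)
    return dict(row)  # return a copy so callers see a snapshot


persons: Dict[str, Dict[str, Any]] = {}
first = merge_node(persons, "alice", {"seen_before": False}, {"seen_before": True})
second = merge_node(persons, "alice", {"seen_before": False}, {"seen_before": True})
```

The first call creates the row and the second call matches it, so the table ends up with a single `alice` record rather than two.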

Syntax sugar

The following syntax sugar does not affect functionality but could be good to have.

Functions

There is no well-defined set of functions that a database should support. By default, Kùzu takes the function sets of Postgres and DuckDB as a reference and will keep expanding on top of them. Let us know if any function is of interest to you but missing in Kùzu!

Use Graph

We currently do not support multiple databases or querying across different databases.

ubmarco commented 2 days ago

I am quite interested in support for indexing arbitrary properties. For testing, I loaded 90M objects and ended up with a DB size of 12 GB. Queries on primary key values take 2 ms, while queries on other properties need almost 700 ms.

May I ask whether a priority has been assigned to this topic?

andyfengHKU commented 2 days ago

Hi @ubmarco, I'll bring this up with the team today. I think @ray6080, @semihsalihoglu-uw and @prrao87 are in a better position to comment on this feature.

prrao87 commented 1 day ago

Part of this is an open issue (range index): https://github.com/kuzudb/kuzu/issues/3776 We will have to discuss internally how/when we can prioritize it among the other items on our list 😅