ubmarco opened this issue 1 month ago
Hi @ubmarco, yeah, this is indeed a known issue we haven't fixed yet. Thanks for pointing it out! We will get this fixed as soon as we can.
Meanwhile, let me try to gather some feedback on this feature here: does your use case rely on the multiplicity constraint? If so, is limiting the constraint to either one or many good enough? Thanks!
Hi @ray6080, thanks for the feedback.
For my use case, the existing constraints `ONE_ONE`, `ONE_MANY`, `MANY_ONE`, and `MANY_MANY` are enough for now.
But I can certainly think of situations in which more complex constraints make total sense, e.g. a user can only be registered in at most 2 cities, or a user might only want to live in a city whose population is less than 100k. This goes into application business logic and I'm not sure whether it even makes sense to define it in the DB. Basically it would mean: before altering a REL TABLE, make sure certain Cypher queries are still valid. Having this on the DB side means transactional safety, as the query is only committed if the constraint queries run after the insert return no elements.
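To make that concrete, such a constraint could be phrased as a violation query that must come back empty before a transaction commits. A minimal sketch, where the `RegisteredIn` relationship name and the limit of 2 are just illustrations of the example above:

```cypher
// Hypothetical violation query: returns the id of every User registered
// in more than 2 cities. The transaction would only commit if this is empty.
MATCH (u:User)-[:RegisteredIn]->(c:City)
WITH u, count(c) AS numCities
WHERE numCities > 2
RETURN u.id, numCities;
```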
Another point that I have in mind is `REL TABLE GROUP`s. My experiments show that the multiplicity can only be given once for the whole group. In the following example
```cypher
CREATE NODE TABLE User(id STRING, PRIMARY KEY (id));
CREATE NODE TABLE Dog(id STRING, PRIMARY KEY (id));
CREATE NODE TABLE City(id STRING, PRIMARY KEY (id));
CREATE REL TABLE GROUP Knows (FROM User TO City, FROM Dog TO City, MANY_MANY);
```
it means I cannot reuse a relationship label in case the multiplicity differs. It would be good if I could specify the multiplicity per relation, e.g.
```cypher
CREATE NODE TABLE User(id STRING, PRIMARY KEY (id));
CREATE NODE TABLE Dog(id STRING, PRIMARY KEY (id));
CREATE NODE TABLE City(id STRING, PRIMARY KEY (id));
CREATE REL TABLE GROUP Knows (FROM User TO City, MANY_MANY, FROM Dog TO City, MANY_ONE);
```
So a User may know many cities but a Dog can only know one. Not sure whether my syntax proposal makes sense to you :)
A bit unrelated, but here is a question about the usage of Cypher for DDL. The Kuzu DDL seems not to be part of the openCypher 9 specification. Is this a custom language addition to Cypher?
I only find the `DROP` statement in the section "Reserved for future use".
Just wondering whether there are standardized ideas around this topic. I see there is also `CONSTRAINT` in the section "Reserved for future use", so maybe a constraint definition concept outside of `REL TABLE [GROUP]` would make sense?
Hi @ubmarco, sorry for the late reply. It's great to gather your feedback on this, and thanks for your suggestions!
> it means I cannot reuse a relationship label in case the multiplicity differs. It would be good if I could specify the multiplicity per relation, e.g.
Yeah, this is indeed a limitation of rel table groups right now. What you suggest can definitely be an option to support, but I feel it's hard to keep the syntax succinct while allowing more flexible semantics. As pointed out in your other comment, maybe defining the `CONSTRAINT` outside of `CREATE REL TABLE GROUP` is a better option.
> A bit unrelated, but here is a question about the usage of Cypher for DDL. The Kuzu DDL seems not to be part of the openCypher 9 specification. Is this a custom language addition to Cypher?
Yes, DDL statements are not part of the openCypher standard; instead they are customized in Kuzu. Here is some brief explanation of the motivation behind this, if you're interested.
> I see there is also the `CONSTRAINT` in the section "Reserved for future use", so maybe a constraint definition concept outside of `REL TABLE [GROUP]` would make sense?
This is a good idea. Indeed, we had discussions within the team about reworking the multiplicity check as part of the more generic `CHECK` constraint found in Cypher and SQL, but we were not confident that many users really care about this kind of constraint, which led to a lower priority for this feature.
I would certainly use constraints and multiplicity checks.
For me, constraints can be split into 2 categories:
- Check at ingest, like the multiplicity constraints discussed above, enforced when data is written.
- Check after ingest. This is for quality control like "a child links to a parent, but the parent's age is smaller than the child's". These checks should run on the user's request, or maybe automatically when data changes, if that's fast.
I work in an environment in which data is basically schemaless, as I don't know its structure in advance. A node table just links to itself. Then, on the node, I have a `MAP(STRING, STRING)` to define properties for the nodes.
I could make very good use of a flexible constraint checker on fields and links of already imported data. If those checks ran at the database level, it would be super fast. I would want to type-check (nested) node properties, check the existence of fields, and write a list of Cypher statements to check whether the "network checks" are met. Each check must return the list of primary node keys that violate the constraint.
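As an illustration, such a field-existence check could look roughly like the query below. The `Item` table, the `props` map column, and the required `name` key are made-up names, and this assumes the `map_keys` and `list_contains` functions are available:

```cypher
// Hypothetical check on a schemaless node table whose attributes live in a
// MAP(STRING, STRING) column called props: return the primary key of every
// node that is missing a 'name' entry.
MATCH (n:Item)
WHERE NOT list_contains(map_keys(n.props), 'name')
RETURN n.id;
```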
In the RDF world, SHACL exists to do that and @prrao87 and Paco Nathan wrote a blog post about how to use it with Kuzu that I eagerly consumed.
Now I personally must say that RDF tooling is not as good as it should be, and with our amount of data, `pyshacl` runs for multiple minutes, which is a bad user experience. It also does not run at the database level and puts load on the user's machine.
To me, RDF feels academic, and users first have to understand all the concepts and terms around it. Cypher, on the other hand, is an easy starter.
Hi @ubmarco, thanks for providing more details on your use cases and thoughts. Yeah, supporting more flexible constraint checks is very appealing to me, and constraints inside the database should be more performant and graceful than application-level checks.
> - Check after ingest. This is for quality control like "a child links to a parent, but the parent's age is smaller than the child's". These checks should run on the user's request, or maybe automatically when data changes, if that's fast.
This is an interesting scenario; I'm not sure what the syntax would be, or maybe it should be combined with UDFs 🤔. But it actually looks like quite a fancy feature to me, which requires flexible and efficient `CHECK` constraint support if it's done automatically on every data change.
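For concreteness, the check quoted above can already be expressed as an on-demand Cypher query. A minimal sketch with made-up table, relationship, and property names:

```cypher
// Hypothetical quality-control query: return the primary key of every
// child node whose linked parent is younger than the child itself.
MATCH (child:Person)-[:HasParent]->(parent:Person)
WHERE parent.age < child.age
RETURN child.id;
```

An automatic variant would essentially have to re-run such a query (or an incremental form of it) as part of every transaction that touches the involved tables, which is where efficient `CHECK` support comes in.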
Let me bring this up with the team; meanwhile, we'll open an issue on this feature to gather more community feedback. We can start with basic ones, like non-null, positive-value, and in/out-degree checks, though it's hard to promise a timeline, as the team is small and everyone is currently busy with some important features right now. 😅
Kùzu version
v0.6.0
What operating system are you using?
Arch Linux x64
What happened?
I feel the relationship constraints are not working as documented in https://docs.kuzudb.com/cypher/data-definition/create-table/#relationship-multiplicities.
This Cypher executes successfully on an empty in-memory DB:
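A minimal script along these lines reproduces the behaviour described below; the table and key names are illustrative, not the exact original snippet:

```cypher
// Sketch: a ONE_ONE relationship that nevertheless accepts two LivesIn
// edges per User and per City.
CREATE NODE TABLE User(id STRING, PRIMARY KEY (id));
CREATE NODE TABLE City(id STRING, PRIMARY KEY (id));
CREATE REL TABLE LivesIn(FROM User TO City, ONE_ONE);
CREATE (:User {id: 'user1'});
CREATE (:User {id: 'user2'});
CREATE (:City {id: 'city1'});
CREATE (:City {id: 'city2'});
MATCH (u:User), (c:City) WHERE u.id = 'user1' AND c.id = 'city1' CREATE (u)-[:LivesIn]->(c);
MATCH (u:User), (c:City) WHERE u.id = 'user1' AND c.id = 'city2' CREATE (u)-[:LivesIn]->(c);
MATCH (u:User), (c:City) WHERE u.id = 'user2' AND c.id = 'city1' CREATE (u)-[:LivesIn]->(c);
MATCH (u:User)-[:LivesIn]->(c:City) RETURN u.id, c.id;
```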
This is the result of the last `MATCH` statement in the Explorer: 1 User lives in 2 Cities, while 1 City has 2 `LivesIn` Users. The point of stating `ONE_ONE` is that the relation must be 1 in both directions. Is this a bug or am I doing something wrong?

Are there known steps to reproduce?
1. Start Explorer with an empty DB
2. Run the queries

Kuzu should complain about the relations `user2` to `city1` and `user1` to `city2`, as they violate the constraint.