kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

[Feature]: Unify REL TABLE GROUP with REL TABLE #3627

Open andyfengHKU opened 5 months ago

andyfengHKU commented 5 months ago

Description

Back in the day, we added REL TABLE GROUP because a REL TABLE cannot have multiple FROM/TO pairs, while in practice users do want one (conceptual) REL TABLE that connects multiple NODE TABLEs.

The expected usage of REL TABLE GROUP is to be a drop-in replacement for REL TABLE except for COPY (see the note below). We have issues like #3608 and #2929 asking about different behaviours between REL TABLE GROUP and REL TABLE. We can (and should) fix those differences. As I plan to get started on them, I think we might just do what #2914 suggested, where we do not distinguish between REL TABLE GROUP and REL TABLE on the user's end. Instead, we keep REL TABLE GROUP as an internal concept that is generated automatically when we see multiple FROM/TO pairs while creating a REL TABLE.

[!NOTE] The problem with COPY is that we need to ask the user to provide a table name for each row; otherwise it's impossible to resolve the table deterministically.
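To make the "internal concept" concrete, here is a minimal Python sketch of how a REL TABLE declared with several FROM/TO pairs could be expanded into its backing tables. The helper name `expand_rel_table` is hypothetical, the `<name>_<from>_<to>` naming follows the `RelGroup_Foo_X_Y` pattern used later in this thread, and keeping a single-pair table under its own name is an assumption, not a description of Kuzu's catalog code.

```python
from typing import List, Tuple


def expand_rel_table(name: str, pairs: List[Tuple[str, str]]) -> List[str]:
    """Return the names of the internal rel tables backing `name`.

    Hypothetical helper: one child table per (FROM, TO) pair.
    """
    if not pairs:
        raise ValueError("a rel table needs at least one FROM/TO pair")
    if len(pairs) == 1:
        # Assumption: a single pair is stored as a plain rel table.
        return [name]
    # Several pairs: the name becomes a group over one child table per pair.
    return [f"{name}_{src}_{dst}" for src, dst in pairs]


children = expand_rel_table("X", [("A", "B"), ("B", "C")])
```

A row copied into `X` would then have to land in exactly one of `children`, which is why COPY needs a table name per row.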

semihsalihoglu-uw commented 5 months ago

As I noted before, I don't think there is a very good solution here until we have a solution for COPY FROM and CREATE statements into rel table groups. I do see the advantage of making Rel Table Group and REL TABLE uniform. However, this is not going to be easy. There are edge cases we probably don't want to think about now. For example, what happens when someone starts with a Rel Table with multiple from and to's, so they are actually constructing a rel group but then through alter schema's the number of from and to's reduces to 1. Do we automatically turn that rel group into a rel table internally? What do we show when people write call show_tables() return *? Do we return a mix of rel tables and rel table groups? Overall, as long as we don't have a uniform notion internally in the system, I think we will not be able to offer a consistent experience to users. There will be edge cases and inconsistencies that confuse people. So I think it is better to lose some usability but be very clear about two notions: a Rel Table and a Rel Table Group, which is syntactic sugar around multiple rel tables.

That said, I think we should support a few things to improve the experience:

  1. ALTER REL TABLE GROUP: this should allow people to add and remove FROM and TO pairs and to add and drop columns. The syntax for adding and dropping columns should mimic ALTER TABLE X ADD/DROP C, so this would be ALTER REL TABLE GROUP RelGroup_Foo ADD/DROP C. We can also support ALTER TABLE RelGroup_Foo ADD/DROP C as syntactic sugar and give a warning:

"RelGroup_Foo is a rel table group, so is a syntactic sugar to refer to multiple possible rel tables. So you have altered the following tables: RelGroup_Foo_X_Y, RelGroup_Foo_U_W."

We need a new syntax for adding and dropping new FROM/TO pairs. We could do the following but suggest new syntax if you have other ideas:

ALTER REL TABLE GROUP X ADD/DROP FROM A TO B

  2. Change Drop Table RelGroup_Foo to Drop Rel Table Group RelGroup_Foo: This will make the notion of Rel Group more explicit. It is also confusing to call a rel group a table. But I would like to suggest that we support DROP TABLE RelGroup_Foo as well but give a warning: "RelGroup_Foo is a rel table group, so is a syntactic sugar to refer to multiple possible rel tables. So you have dropped the following tables: RelGroup_Foo_X_Y, RelGroup_Foo_U_W."

  3. Wrong position of GROUP in ALTER/CREATE/DROP REL TABLE GROUP: This mistake has been reported: for example, users might write CREATE REL GROUP TABLE instead of CREATE REL TABLE GROUP. We should suggest the correct position with a "Did you mean CREATE REL TABLE GROUP?"

  4. Support COPY and CREATE when we can infer the types of the FROM and TO. For example, consider the following query:

COPY RelGroup_Foo FROM (MATCH (a:Person)-[e:Likes]->(b:Person) RETURN a, b, e.since)

Here we know that the first two variables a and b have label Person, so we can rewrite this as

COPY RelGroup_Foo_Person_Person FROM (MATCH (a:Person)-[e:Likes]->(b:Person) RETURN a, b, e.since)

and get this to work.

We might be able to do something similar for supporting CREATE statements. For example:

CREATE (c:Person {})-[e:RelGroup_Foo]->(b:Person {...})

This can be supported as this is equivalent to:

CREATE (c:Person {})-[e:RelGroup_Foo_Person_Person]->(b:Person {...})

In cases where the source and destination nodes can be of multiple node tables, we can give a warning saying:

RelGroup_Foo is a rel table group, so is a syntactic sugar to refer to multiple possible rel tables. COPY FROM and CREATE statements where the source and destination nodes can have more than one label are not currently supported.
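The inference rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `resolve_child_table`, `AmbiguousRelGroupError`, and the `children` mapping are hypothetical names, and the sketch assumes the binder already knows the set of possible labels for the source and destination nodes.

```python
from typing import Dict, Set, Tuple


class AmbiguousRelGroupError(Exception):
    """Raised when the target rel table cannot be inferred (hypothetical)."""


def resolve_child_table(
    group: str,
    children: Dict[Tuple[str, str], str],
    src_labels: Set[str],
    dst_labels: Set[str],
) -> str:
    """Pick the single rel table of `group` matching the node labels."""
    if len(src_labels) != 1 or len(dst_labels) != 1:
        # More than one candidate label on either side: refuse to guess.
        raise AmbiguousRelGroupError(
            f"{group} is a rel table group; COPY FROM and CREATE statements "
            "where the source and destination nodes can have more than one "
            "label are not currently supported."
        )
    pair = (next(iter(src_labels)), next(iter(dst_labels)))
    if pair not in children:
        raise KeyError(f"{group} has no FROM {pair[0]} TO {pair[1]} pair")
    return children[pair]


children = {
    ("Person", "Person"): "RelGroup_Foo_Person_Person",
    ("Person", "City"): "RelGroup_Foo_Person_City",  # made-up second pair
}
target = resolve_child_table("RelGroup_Foo", children, {"Person"}, {"Person"})
```

With both sides bound to `Person`, `target` is `RelGroup_Foo_Person_Person`; a destination that could be either `Person` or `City` raises instead of guessing.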

I think this will at least improve usability. But don't try to find a great solution here until we have a solution to

andyfengHKU commented 5 months ago

> what happens when someone starts with a Rel Table with multiple from and to's, so they are actually constructing a rel group but then through alter schema's the number of from and to's reduces to 1

This won't be a problem. Everything should just work as we would expect. Nothing prevents a REL TABLE GROUP from having a single REL TABLE.

> What do we show when people write call show_tables() return *? Do we return a mix of rel tables and rel table groups

We should keep the current way of printing. I was thinking of only getting rid of REL TABLE GROUP from the grammar, but we would still need to explain this notion in the documentation since we cannot copy into a REL TABLE GROUP.

> Here we know that the first two variables a and b have Person so we can rewrite this as

I don't think it's safe to assume that the user is trying to copy into RelGroup_Foo_Person_Person, especially when this guess is based on the table name.

Overall I agree with most parts and we will keep REL TABLE GROUP. Though I still plan to allow multiple FROM/TO pairs when creating a REL TABLE. Think of it as a rewrite from

CREATE REL TABLE X (FROM A TO B, FROM B TO C)

to

CREATE REL TABLE GROUP X (FROM A TO B, FROM B TO C)
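
As a rough illustration of that rewrite, here is a string-level Python sketch. It is not how the parser or binder would do it; `rewrite_create_rel_table` and both regular expressions are hypothetical, and the sketch only handles the simple statement shape shown above.

```python
import re

# CREATE REL TABLE <name> ( <body> ), but not CREATE REL TABLE GROUP ...
_CREATE_REL = re.compile(
    r"^\s*CREATE\s+REL\s+TABLE\s+(?!GROUP\b)(\w+)\s*\((.*)\)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
# One FROM <node table> TO <node table> pair inside the body.
_PAIR = re.compile(r"\bFROM\s+\w+\s+TO\s+\w+", re.IGNORECASE)


def rewrite_create_rel_table(stmt: str) -> str:
    """Rewrite a multi-pair CREATE REL TABLE into CREATE REL TABLE GROUP."""
    match = _CREATE_REL.match(stmt)
    if match is None:
        return stmt  # not a plain CREATE REL TABLE statement
    name, body = match.group(1), match.group(2)
    if len(_PAIR.findall(body)) <= 1:
        return stmt  # a single FROM/TO pair stays a plain rel table
    # Note: a trailing semicolon, if present, is dropped by this sketch.
    return f"CREATE REL TABLE GROUP {name} ({body})"


rewritten = rewrite_create_rel_table(
    "CREATE REL TABLE X (FROM A TO B, FROM B TO C)"
)
```

Statements with one pair, or ones that already say GROUP, pass through unchanged.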
andyfengHKU commented 1 month ago