tortoise / tortoise-orm

Familiar asyncio ORM for python, built with relations in mind
https://tortoise.github.io
Apache License 2.0
4.65k stars 387 forks source link

CTE Support #819

Open lsabi opened 3 years ago

lsabi commented 3 years ago

Is your feature request related to a problem? Please describe. CTE is a feature that is integrated into other ORMs such as sqlalchemy. It allows to represent hierarchies, which are common for things like categories.

https://tortoise-orm.readthedocs.io/en/latest/examples/basic.html?highlight=recursive#recursive-relations is not exactly what I'm looking for, as it doesn't seem to be usable with QuerySet.

Describe the solution you'd like Some simple way of using CTEs. It could be by directly specifying an option within the Meta of the model.

Describe alternatives you've considered Direct query, which goes against the idea of using an ORM

Additional context

EDIT

Sqlalchemy CTE: https://docs.sqlalchemy.org/en/14/core/selectable.html#sqlalchemy.sql.expression.cte

and some examples:

https://gist.github.com/cairabbit/d64fccf7cf2abe180e69c843706f46c7 https://stackoverflow.com/questions/34847869/sqlalchemy-using-a-cte-from-a-subquery-w-from-clause-specified-as-literal-te

long2ice commented 3 years ago

Could you give a link for that?

Masynchin commented 3 years ago

@long2ice any ideas? It would be great if CTE can be used with Tortoise (for new project I would prefer SQLA over TORM only for that reason)

long2ice commented 3 years ago

Is there something same in django? Or the raw sql like?

Masynchin commented 3 years ago

Is there something same in django? Or the raw sql like?

Found this extension, was suggested by author to Django Core, but not any updates since

lsabi commented 3 years ago

@Masynchin To be honest I don't like that much the syntax of the library you linked. I believe it would be cleaner and more readable some like

object.cte(operator, query1, query2).

For instance, taking the example from https://www.postgresql.org/docs/13/queries-with.html

WITH RECURSIVE search_graph(id, link, data, depth) AS (
    SELECT g.id, g.link, g.data, 1
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, g.data, sg.depth + 1
    FROM graph g, search_graph sg
    WHERE g.id = sg.link
)
SELECT * FROM search_graph;

we can see that the inner part of the WITH clause is made by two queries. Thus, re-using the code to generate these two, adding the UNION condition and then selecting every column from the new table or join it with another table, using the usual tortoise syntax, because we return a QuerySet.

This is my proposal, if I've understood correctly how the library is structured and if I'm correctly following the chosen patterns. @long2ice did I get it correctly? Does it make sense?

Masynchin commented 3 years ago

To be honest I don't like that much the syntax of the library you linked.

I think so too. I will write some raw SQL to solve my own problems (and will think how it can be in TORM) and then I can suggest my version too

Masynchin commented 3 years ago

I will write some raw SQL to solve my own problems

Here is query:

WITH RECURSIVE cte(id, parent_id, content, timestamp, path, level) AS (
    SELECT
        comment.id,
        comment.parent_id,
        comment.content,
        comment.timestamp,
        comment.id AS path,
        1 AS level
    FROM comment
    WHERE comment.parent_id IS NULL

    UNION

    SELECT
        comment.id,
        comment.parent_id,
        comment.content,
        comment.timestamp,
        cte.path || '/' || comment.id AS path,
        cte.level + 1
    FROM comment, cte
    WHERE comment.parent_id = cte.id
)

SELECT *
FROM cte
ORDER BY path;

Comment model:

class Comment(Model):
    id = fields.IntField(pk=True)
    content = fields.TextField()
    timestamp = fields.DatetimeField(auto_now_add=True)

    parent_id = fields.ForeignKeyField("models.Comment", null=True)

I tried to translate this query to TORM, here is my attempt:

base = (
    Comment.filter(parent_id__isnull=True)
    .annotate(path=F("id")
    .annotate(level=1)
)

child = (
    Comment.annotate(path=F("cte__path")+"/"+F("id"))
    .annotate(level=F("cte__level")+1)
)

cte_query = base.as_cte(child, recursive=True, union=True).order_by("path")
# "UNION ALL" can be set by union_all=True

I see sush problems:

lsabi commented 3 years ago

@Masynchin

how to extend returned arguments (like path or level)? My guess is that annotate is the appropriate way. Unless there's a newer way of achieving the same result.

how to point out that FROM must be from two tables? What do you mean? The idea is to create the code of the two queries and in between the UNION operator. This will become a single table.

The approach you sketched out is the exact idea of what I had in mind.

Masynchin commented 3 years ago

how to point out that FROM must be from two tables? What do you mean?

   SELECT
        comment.id,
        comment.parent_id,
        comment.content,
        comment.timestamp,
        cte.path || '/' || comment.id AS path,
        cte.level + 1
-> FROM comment, cte
   WHERE comment.parent_id = cte.id
lsabi commented 3 years ago

That's a good question. Generating a tree becomes a mess.

I can't find any sqlalchemy's cte query to use as "inspiration"...

Masynchin commented 3 years ago

I can't find any sqlalchemy's cte query to use as "inspiration"...

How about this example?

lsabi commented 3 years ago

It doesn't seem to traverse a structure like a tree or hierarchy (that's my idea of CTE). It selects from a table (using CTE) and then it joins another table. For what I can see, it's missing the part of the hierarchy

cte.path || '/' || comment.id AS path,
cte.level + 1

Correct me if I'm wrong.

Upd. Nevertheless, we can take inspiration from it (as a starting point)

Masynchin commented 3 years ago

It doesn't seem to traverse a structure like a tree or hierarchy (that's my idea of CTE). It selects from a table (using CTE) and then it joins another table. For what I can see, it's missing the part of the hierarchy

Correct me if I'm wrong.

Without this two lines, all comments from parent to deepest childs are selected, but not in order that I want. With this trick I only do tree-like order.

With order by path:

.. Top comment without replies
.. Top comment
.... Child of top
...... Grand child 2-first
...... Grand child 1-second
.... Child without replies
.... Child with end comment
...... End comment!

Without:

.. Top comment without replies
.. Top comment
.... Child of top
.... Child with end comment
.... Child without replies
...... Grand child 1-second
...... Grand child 2-first
...... End comment!

Each two dots represents 1 depth level

lsabi commented 3 years ago

Yes, but I meant that the linked example does not contain a recursive query. The main reason I would use a CTE is because of a recursive query that can perform a tree traversal.

I'm still looking for some ideas/inspirations on how to get it to work, but I have nothing at the moment...

Masynchin commented 3 years ago

What about this?:

head = (
    Comment.filter(parent_id__isnull=True)
    .annotate(path=F("id"))
    .annotate(level=RawSQL("1"))
)
children = (
    Comment.filter(parent_id=CteField("id"))
    .annotate(path=CteField("path")+"/"+F("id"))
    .annotate(level=CteField("level")+1)
)
cte = Cte(Comment, recursive=True).union(head, children)

comments = await cte.order_by("path")

CteField can be something like F (maybe it must inherit from F)

Masynchin commented 3 years ago

If I understand correctly, there is no way to make CTE builder in TORM without contribute to Pypika. There is open issue about it https://github.com/kayak/pypika/issues/624

Masynchin commented 3 years ago

I also replaced FROM comments, cte with FROM comment JOIN cte ON comment.parent_id = cte.id so we can use tables-joiner. We can fetch information about JOIN condition from Comment.filter(parent_id=CteField("id"))

lsabi commented 3 years ago

Does something like this work?

head = (
    Comment.filter(parent_id__isnull=True)
    .annotate(path=F("id"))
    .annotate(level=RawSQL("1"))
)
children = (
    Comment.filter(parent_id=head.id)
    .annotate(path=CteField("path")+"/"+F("id"))
    .annotate(level=CteField("level")+1)
)
cte = Cte(Comment, recursive=True).union(head, children)

comments = await cte.order_by("path")

Note the Comment.filter(parent_id=head.id) instead of Comment.filter(parent_id=CteField("id"))

Masynchin commented 3 years ago

Does something like this work?

Since head is type of QuerySet - no. Because of that I thinked about CteField, which can be used like F, and fetched (from _annotations) by Cte when building query

lsabi commented 3 years ago

But this would require to define in the source code a CteField operator...

@long2ice do you think this is feasible and it is not too complicated to implement? (I would try to do it)

Masynchin commented 3 years ago

@lsabi I just reread examples from your sources and from django extension - at now we have something similar to them (in term of high-level interface) but shorter and more explicitly, and I think it is a good sign. I also noticed that they both passing CTE object to provide field to select/join to in child query. I suggested before this way:

... fetched (from _annotations) by Cte when building query

but it can also be like:

cte = Cte(...)
child = ...annotate(some_field=CteField(cte, "field"))

There is some of my ideas, hope this will be helpful for our goal

lsabi commented 3 years ago

As soon as I have some spare time, I'll try to implement it