ponyorm / pony

Pony Object Relational Mapper
Apache License 2.0
3.66k stars 245 forks source link

Query not getting automatic DISTINCT #427

Open fluffy-critter opened 5 years ago

fluffy-critter commented 5 years ago

Short version

When I'm doing a complex JOIN-style query that spans multiple tables, the SELECT DISTINCT is not being applied when I think it should be.

Long version

Hi,

I'm trying to add a tagging system to my blog engine. I build my queries using a cascade of queries where I apply multiple filters using a pattern like (pseudocodeish, you can see the actual code at https://github.com/PlaidWeb/Publ/blob/master/publ/queries.py):

query = model.Entry().select()
if criterion_a:
    query = select(e for e in query where key_a == criterion_a)
if criterion_b:
    query = select( e for e in query where key_b == criterion_b)

and so on.

I'd like to be able to filter by more than one tag at a time, and so the actual filter I added was:

query = orm.select(e for e in query for t in e.tags if t.tag in tag_list)

(where tag_list is, obviously, a list of tags). The relevant parts of my schema are:

class Entry(db.Entity):
    tags = set("EntryTag")

class EntryTag(db.Entity):
    tag = Required(str)
    entry = Required(Entry)

Anyway, the queries only ever involve selecting entries out, but when I try selecting from two tags I get duplicates for entries which have more than one tag in the selection list. Additionally, the generated query is:

SELECT "e"."id", "e"."file_path", "e"."category", "e"."status", "e"."utc_date", "e"."local_date", "e"."display_date", "e"."slug_text", "e"."entry_type", "e"."redirect_url", "e"."title", "e"."sort_title", "e"."entry_template"
FROM "Entry" "e", "EntryTag" "t-1"
WHERE ("e"."status" = ? OR "e"."status" = ? AND "e"."utc_date" <= ?)
  AND "e"."category" = ?
  AND "t-1"."tag" IN (?, ?)
  AND "e"."id" = "t-1"."entry"

So, the query is doing a select from both Entry and EntryTag (rather than using a JOIN as I'd expect), and is not doing a DISTINCT as it should be.

I will investigate using an explicit left_join query type, but in the meantime the documentation implies that this should work and get the automatic DISTINCT query, since each of the individual queries is only selecting entries out.

So, questions I have here:

  1. Is it possible to force PonyORM to do a SELECT DISTINCT even if the query generator doesn't think it should?
  2. Is there a different pattern I should be using for this join?
fluffy-critter commented 5 years ago

Okay, I found that if I added a .distinct() after my .order_by() that fixes it on my end. But .distinct() doesn't appear as a method of a generated query in the docs; the closest I'm finding is https://docs.ponyorm.org/api_reference.html#distinct which makes it seem like it's only callable at the ORM package level.

It would also be helpful if https://docs.ponyorm.org/queries.html#automatic-distinct had a note e.g.

If for some reason the query isn't getting DISTINCT when you think it should be, add a .distinct() to it like

query = select(p for p in Person for c in p.cars if c.make in ("Toyota", "Honda")).distinct()