quarylabs / quary

Open-source BI for engineers
https://www.quary.dev
Apache License 2.0
2.13k stars 48 forks source link

Manage Indexes #400

Closed xrl closed 1 month ago

xrl commented 1 month ago

Please Describe The Problem To Be Solved

We have large tables and materialized views and want to create indexes so we can efficient JOIN. I know DBT offers a proprietary {{ ... }} syntax at the top of the models for managing indexes, what is the solution for Quary?

Optional: Suggest A Solution

Quary could probably just support the same syntax as DBT?

benfdking commented 1 month ago

Hi @xrl appreciate the feedback, we'll have a look into it!

xrl commented 1 month ago

I think indexes are important for performance. What potential workarounds do you have for large table JOINs? Should I be managing indexes outside of Quary?

benfdking commented 1 month ago

Are these indexes on materialised/tables views you're creating?

xrl commented 1 month ago

I believe you can only put indexes on tables and materialized views, but yes. I want to add indexes to objects managed by our Quary model sync

benfdking commented 1 month ago

Hey, we'll probably implement this but unlike dbt, we would put it in the yaml file. In the meantime, if you do have performance issues, you could apply indexes manually by running a quick, let me get back to you on a timeline of us having this done.

psql -f file.sql

after a quary build step where file.sql looks something like

CREATE INDEX IF NOT EXISTS index_name
ON table_name (column1, column2, ...);
...
benfdking commented 1 month ago

https://github.com/quarylabs/quary/pull/433 This is a first draft I did, what do you think?

benfdking commented 1 month ago

This has been fully built and deployed. Here is an example of it being used. Let me know if you get the chance to try it. I will document it fully tomorrow.

sources:
  - name: raw_orders
    path: jaffle_shop.orders
models:
  - name: stg_orders
    materialization: table
    database_config: 
      unlogged: true
      indexes: 
        - columns: 
            - order_id
            - status
        - unique: true
          columns: 
            - order_id

    tests:
      - type: multi_column_unique
        info:
          columns: order_id,status
    columns:
      - name: order_id
        tests:
          - type: unique
          - type: not_null
          - type: gte
            info:
              column: order_id
              value: 1

Here it is also documented https://www.quary.dev/docs/configuration

benfdking commented 1 month ago

Hey, I am going to close this for the time being. Please feel free to reopen it if it doesn't fit your needs.