snowplow / sql-runner

Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake
http://snowplowanalytics.com
Apache License 2.0
79 stars 14 forks source link

Allow steps to loop through array of variables #100

Open LauLaman opened 7 years ago

LauLaman commented 7 years ago

At @Werkspot we are active in multiple countries. Each country has exactly the same database structure. We load each country in a separate schema in Redshift

I would like to create a playbook where i can run steps in a loop with different variables.

I was thinking about something like this: (i know i can put multiple Queries in one file but want to make clear what i was thinking about)

loop:
  variables:
    - country: nl
      schema: raw_data_nl
    - country: it
      schema: raw_data_it
steps:
# remove old tmp table (if it exists) so we are sure we are starting fresh
  - name: Remove tmp table
    loop: false
    queries:
      - name: remove tmp table
        file: 01.remove_tmp.sql
  - name: create tmp table
    loop: false
    queries:
      - name: create table
        file: 02.01.first_subscription.sql
        loop: false
  - name: Manipulate data per country
    loop: true # since loop contains variables there is no need to also put the `template` option here, the loop should overwrite this to true
    queries:
      - name: Create table A
        file: 03.01.create_table_a.sql
      - name: Create table B
        file: 03.02.create_table_b.sql
  - name: combine table A & B per country
    loop: true
    queries:
      - name: combine table A & B
        file: 04.combine_tables_and_insert_into_tmp.sql
  - name: Replace old data with new data
    loop: false
    queries:
      - name: Delete old table
        file: 05.01.delete_old_table.sql
      - name: Rename tmp
        file: 05.02.rename_tmp.sql
alexanderdean commented 6 years ago

I really like this idea. I think we just have to figure out the syntax a bit.

dilyand commented 5 years ago

@LauLaman That might already be possible with some more advanced use of the Go text template processor.

You would define the range of values in your playbook:

:variables:
  :countries: ["it", "nl"]

and then the .sql file can 'loop' through them:

{{ range .countries }}
CREATE TABLE raw_data_{{ . }}.new_table AS (
  SELECT col1, col2, col3
  FROM table
);
{{ end }}