sdf-labs / sdf-cli

This is the main repository for SDF documentation found at docs.sdf.com, as well as public schemas, benchmarks, and examples

dbt integration/bootstrap missing target #21

Open geoHeil opened 6 hours ago

geoHeil commented 6 hours ago

Describe the bug

Trying to explore from/with dbt:

sdf dbt init
   Finished dbt in 0.329 secs

error: SDF1013: YAML error: missing field `target`

This fails, even though I have a profiles.yml file in my current directory that looks like this:

flags:
  send_anonymous_usage_stats: False

my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      schema: "{{ env_var('WAREHOUSE_SCHEMA', 'bar_dev') }}"
      path: ./analytics_database_dev.duckdb
      threads: 2
    prod:
      type: duckdb
      schema: "{{ env_var('WAREHOUSE_SCHEMA', 'bar') }}"
      path: /path/to/prod/analytics_database_prod.duckdb
      threads: 4

To Reproduce

Steps to reproduce the behavior:

  1. create a dummy dbt project with duckdb
  2. have it set up as outlined above
  3. run dbt compile
  4. run sdf dbt init
  5. See error

Expected behavior

Initializing SDF from a dbt project that uses duckdb (sdf dbt init) should work.


eliasdefaria commented 5 hours ago

Hey @geoHeil! Thanks for giving this a try. This is likely caused by the flags block in your YAML: our JSON schema for this file does not currently accept it.

flags:
  send_anonymous_usage_stats: False

To unblock yourself, you can try removing the flags block from the YAML. We'll make sure to fix this in a future release.
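For anyone who would rather not edit the file by hand, the workaround could be scripted roughly as follows. This is only a sketch: `strip_top_level_block` is a hypothetical helper, not part of SDF or dbt, and it assumes the block to drop is a plain top-level mapping whose children are indented. It filters lines as text instead of parsing YAML, so Jinja expressions such as `env_var(...)` are left untouched.

```python
def strip_top_level_block(yaml_text: str, key: str = "flags") -> str:
    """Drop a top-level `key:` entry and its indented children from YAML text."""
    kept = []
    skipping = False
    for line in yaml_text.splitlines():
        is_top_level = bool(line) and not line[0].isspace()
        if is_top_level:
            # Every new top-level entry decides afresh whether we skip:
            # only the block named `key` (and what is nested under it) is dropped.
            skipping = line.startswith(key + ":")
        if not skipping:
            kept.append(line)
    return "\n".join(kept).lstrip("\n") + "\n"


# Shortened version of the profiles.yml from the report, for illustration only.
SAMPLE = """\
flags:
  send_anonymous_usage_stats: False

my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
"""

if __name__ == "__main__":
    print(strip_top_level_block(SAMPLE), end="")
```

Writing the result to a copy of profiles.yml, and leaving the original in place for dbt, keeps the change easy to revert once the schema is fixed.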

Also worth mentioning, we don't support DuckDB today, so the best way to get started with SDF on top of a dbt project is by using the jaffle-shop-classic example project.

We have a guide for setting this project up here: https://docs.sdf.com/integrations/dbt/integrating

geoHeil commented 1 hour ago

According to https://github.com/search?q=repo%3Asdf-labs%2Fsdf-cli%20duckdb&type=code you emulate duckdb, but in your comment you mentioned that you do not support it. What does this mean?

Indeed:

error: SDF1013: JSON error: unknown variant `duckdb`, expected one of `redshift`, `snowflake`, `postgres`, `bigquery`, `trino`

eliasdefaria commented 1 hour ago

@geoHeil The SDF binary has Apache DataFusion baked in. We resolve dialects like Snowflake, Trino, and BigQuery down to the DataFusion logical plan for execution and analysis. DataFusion is also an in-memory analytical database that is even faster than DuckDB on certain benchmarks.

In this way, we combine the transformation layer with an execution engine (kind of like dbt + DuckDB). However, we don't support the DuckDB dialect of SQL natively.

That being said, the jaffle shop duckdb project contains some relatively simple SQL that we can actually execute natively. Hence the migration guide takes you through compiling and running that project locally with SDF, using our default trino dialect of SQL (the duckdb jaffle shop SQL does not need to be modified in order to run as the trino dialect).

I realize this is all a bit convoluted. I'll be sure to update the docs to make this distinction more explicit. Thanks for your feedback!
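As a rough pre-check before running sdf dbt init, the `type` of each profile output can be compared against the variants named in the SDF1013 error earlier in this thread. The sketch below is an illustration only: `unsupported_outputs` and the hand-written `PROFILE_OUTPUTS` mapping are hypothetical, and the accepted list is copied from that error message, so it may differ in other releases.

```python
# Adapter types copied from the SDF1013 error message in this thread.
# Assumption: this list reflects the release used here and may change.
SUPPORTED_TYPES = {"redshift", "snowflake", "postgres", "bigquery", "trino"}


def unsupported_outputs(outputs: dict) -> dict:
    """Return the {output_name: type} pairs whose type is not in SUPPORTED_TYPES."""
    return {
        name: adapter
        for name, adapter in outputs.items()
        if adapter.lower() not in SUPPORTED_TYPES
    }


# Output names and types taken from the profiles.yml in the report.
PROFILE_OUTPUTS = {"dev": "duckdb", "prod": "duckdb"}

if __name__ == "__main__":
    for name, adapter in unsupported_outputs(PROFILE_OUTPUTS).items():
        print(f"output '{name}' uses type '{adapter}', which the error message does not list")
```

For the profile in the report, both outputs are flagged, which matches the `unknown variant` error.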