sdf-labs / tests

Standard testing library for SDF data tests
Apache License 2.0
7 stars 1 forks source link

Test Suite for sdf-tests #8

Open dbrtly opened 2 weeks ago

dbrtly commented 2 weeks ago

Summarize the feature in question.

Professionally test the sdf-tests jinja macros.

The sdf tool already excels at validating sql queries are correct, but I don't know how to wire it up.

What team(s) will benefit from this feature request?

Anybody who needs to test the jinja macros. Developer confidence in sdf-labs.

Is this an enhancement to an existing feature? If so, briefly describe the update.

Proposed tasks:

like this example:

https://github.com/mitsuhiko/minijinja/blob/670b67b967f273eab608ef1697e181dd5cfdc787/minijinja/tests/inputs/hello.txt

Describe alternatives you've considered if applicable.

Yee-haw!!

Additional context Add any other context or screenshots about the feature request here.

eliasdefaria commented 1 week ago

@wolfram-s I messsaged this to @dbrtly in the Slack, but wanted to echo and start the thread here:

You are spot-on, this is an underinvested part of our ecosystem. We have loads of internal tests which run when the SDF Rust binary is compiled, and these run workspaces with jinja materialization as well. However, we don't have unit tests on the jinja itself. Mostly because the long term vision for materialization is certainly not to use Jinja at all :) That doesn't mean in the meantime we shouldn't introduce a jinja unit testing framework. I love your pitch on minijinja and datafusion

wolfram-s commented 1 week ago

@dbrtly Realizing your ideas are overdue! Thnaks for bringing them to the forefront! We could maybe do this in order:

(0) Reorganizing macros/tests per file might make it easier to find/coordinate things. Was there any deeper reasoning to split them up like that?

(1) Leveraging Minjinja's (snapshot/golden file) test framework is a great idea. We could adopt the minjinja test runner to run all those macros. Given a snowflake macro (e.g. in a macros/snowflake directory).

{% macro format_date(date_column) %}
 TO_DATE({{ date_column }}, 'YYYY-MM-DD')
{% endmacro %}

A a simple unit test (adopting the minjinja test fwk) could be:

<context>
---
{{format_date(abc)}}
---
 TO_DATE(abc, 'YYYY-MM-DD')

(3) Running all macros for all dialects is an interesting extension. Suppose we organize all macros per dialect as suggested above.

We could then generate a (dialect indepdendent) SQL statement like so (below should work for most dialects, except maybe Bigquery, but you know better:-)

SELECT
    date_string,
    {{ format_date('date_string') }} AS formatted_date
FROM (
    SELECT * FROM VALUES
    ('2023-10-01'),
    ('2023-12-25'),
    ('2024-01-01')
) AS dates_table(date_string)
---
<intended output>

We can conpile/run tests like this using run the existing sqllogictest fwk (see e.g. https://github.com/risinglightdb/sqllogictest-rs ). In fact, internally we already use sqllogictest for testing our execution semantics across platforms. But we never made our adoption of the framework public. Maybe time to reconsider:-)

Does that reflect what you had in mind?

dbrtly commented 1 week ago

Yes this sounds like fun.

Maybe we could start by adding a cargo project to this repo and migrating the jinja to that. I can help with this part.

And then we might be ready on the decision on the sqllogictest logic. TIL about sqllogictest.

Most of these tests seem so fundamental that they could work on any provider. Would be nice to write the test once and use a tool to migrate the code to target dialect as needed. Would scale better to a large number of providers.