The idea is that each file could define a set of project files and OLAP connectors to target, and then a list of tests to run and expected results. We might then have many of these files, such as resolvers/testdata/metrics_unnest.yaml, resolvers/testdata/metrics_measure_filters.yaml, etc.
Some of the goals here are:
- Not rely on the shared test projects in testruntime (which has the problem that making a tweak for one test causes a lot of unrelated tests in other places to fail)
- Run the tests against multiple OLAP connectors, using testcontainers where possible (Druid will run only if a test cluster is defined)

I'm sure this can be made much cleaner, but here's a draft YAML spec for a test file.
files:
  model.sql: SELECT ...
  metrics_view.yaml:
    type: metrics_view
    dimensions:
      - ...
    measures:
      - ...
connectors:
  - duckdb
  - clickhouse
tests:
  test_name:
    resolver: metrics
    properties:
      query:
        metrics_view: mv
        ...
    result:
      rows:
        - ...
      csv: >
        xxx,xxx,xxx
  other_test_name:
    variables:
      rill.metrics.approximate_comparisons: true
    claims:
      attributes:
        - ...
      rules:
        - ...
    resolver: metrics_sql
    properties:
      sql: SELECT ...
    result:
      rows:
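As a rough illustration of how such a file could be consumed, here is a minimal Go sketch that unmarshals this format; the type and field names are hypothetical and not part of the proposal:

import (
    "os"

    "gopkg.in/yaml.v3"
)

// Hypothetical structs mirroring the draft spec above.
type TestFile struct {
    Files      map[string]string   `yaml:"files"`      // project file path -> file contents
    Connectors []string            `yaml:"connectors"` // OLAP connectors to run the tests against
    Tests      map[string]TestCase `yaml:"tests"`      // test name -> test definition
}

type TestCase struct {
    Resolver   string            `yaml:"resolver"`
    Properties map[string]any    `yaml:"properties"`
    Variables  map[string]string `yaml:"variables"`
    Claims     map[string]any    `yaml:"claims"`
    Result     TestResult        `yaml:"result"`
}

type TestResult struct {
    Rows []map[string]any `yaml:"rows"`
    CSV  string           `yaml:"csv"`
}

// loadTestFile reads and parses one test file from disk.
func loadTestFile(path string) (*TestFile, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    tf := &TestFile{}
    if err := yaml.Unmarshal(data, tf); err != nil {
        return nil, err
    }
    return tf, nil
}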
Aside from the structure, the YAML spec above is not that easy compared with something as simple as what was proposed in the example:
require skip_reload
statement ok
ATTACH ':memory:' AS db1
statement ok
CREATE TABLE db1.integers(i INTEGER);
statement ok
INSERT INTO db1.integers VALUES (1), (2), (3), (NULL);
# now export the db
statement ok
EXPORT DATABASE db1 TO '__TEST_DIR__/export_test' (FORMAT CSV)
Writing SQL with YAML indentation will be awkward.
It should also be possible to optionally test the export output instead of the query output.
It should also be easy to run with the debugger (it's a frequent operation).
I should point out that the example above only looks so neat because it contains only SQL tests,
and once more parameters are added, it will look as complicated as a Go unit test.
DuckDB needs to test a lot of SQL syntax and semantics, and that's why those .test files were created. But if you need to test something more structured that requires JSON or YAML configuration, that quickly goes sideways and starts to look like redundant work compared to using only Go unit tests.
So the decision should be to keep the text fixtures as small in structure as possible, with no plan to extend them, knowing that a simple structure alone will cover a lot of test cases.
I should repeat that this is not simple, and I sense an expectation that it will need to be extended.
I propose anything that can be parsed line by line; permissions, for example, can be added as an additional line:
statement ok
read-perm write-perm
CREATE TABLE db1.integers(i INTEGER);
statement ok
read-perm
INSERT INTO db1.integers VALUES (1), (2), (3), (NULL);
Other parameters should be excluded, and any resolver/project separation should be done via folders and file names, or skipped and handled in specialized native Go tests.
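For illustration, a parser for such a line-based format could stay very small; this sketch assumes every block is exactly three lines (marker, permissions, SQL), which is my simplification rather than part of the proposal:

import "strings"

// Statement is a hypothetical representation of one block in the flat format.
type Statement struct {
    Permissions []string
    SQL         string
}

// parseStatements assumes every block is exactly three lines:
// "statement ok", a whitespace-separated permissions line, and a SQL line.
func parseStatements(content string) []Statement {
    var out []Statement
    lines := strings.Split(content, "\n")
    for i := 0; i+2 < len(lines); i++ {
        if strings.TrimSpace(lines[i]) != "statement ok" {
            continue // skip blank lines and comments
        }
        out = append(out, Statement{
            Permissions: strings.Fields(lines[i+1]),
            SQL:         strings.TrimSpace(lines[i+2]),
        })
        i += 2
    }
    return out
}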
We are not looking to test SQL here, we are looking to test our resolvers, which a) rely on project files being parsed, b) need to be tested against multiple OLAPs (DuckDB, ClickHouse, Druid), c) have nested query parameters.
I think the complexity here may make a flat text file tricky, but if you think it's possible, then please share a concrete proposal. If you think Go code is better, please suggest a syntax that incorporates all the considerations mentioned in the messages above.
So we have a directory structure right now like this:
testruntime
  adbids_druid (druid)
  adbids (duckdb)
The problem here is that each OLAP has its own directory and configuration, which means that if we create a resolver YAML inside one of them, we need to create a copy of it for the other OLAP, like:
testruntime
  adbids_druid
    apis
      check_sql1.yaml
      check_sql2.yaml
  adbids
    apis
      check_sql1.yaml
      check_sql2.yaml
To reuse our current framework, we would need to create identical test projects for each OLAP that differ only in their OLAP configuration.
Option 2: Generate or copy the apis YAML files when a test is run (since an apis YAML doesn't reference an OLAP type).
So in pseudo-code there would be a resolvers_test.go with something like:
func TestResolvers(t *testing.T) {
    // The '_resolver' suffix indicates projects created specifically for resolver tests.
    for _, dir := range dirs("testruntime/testdata/*_resolver") {
        fmt.Println("running fixtures for engine " + dir)
        runAllFixtures(t, dir)
    }
}

func runAllFixtures(t *testing.T, dir string) {
    cleanAPIFolder(dir)
    for _, fixture := range files("testruntime/testdata/resolver_fixtures/*.test") {
        createAPI(fixture, dir) // read a fixture and generate a file in `apis` for the given instance project
    }
    instance := createInstance(dir)
    runAPITests(t, instance)
}
Where the fixture for a SQL resolver would be named sql_<name>.test with content like:
statement ok
read-perm write-perm
param1 bid param2 t1
SELECT param1 from param2;
----
bid
1
2
Or with YAML syntax:
statement: ok
permissions:
  - read-perm
  - write-perm
params:
  param1: bid
  param2: t1
sql: SELECT param1 from param2;
output: |
  bid
  1
  2
Considering we need to generate apis files for arbitrary resolvers, it's clear now that YAML will be easier to use.
Where runAPITests will do something like:
for _, fn := range files("testruntime/testdata/resolver_fixtures/*.test") {
    t.Run(fmt.Sprintf("engine=%s fixture=%s", instance, fn), func(t *testing.T) {
        f := readFixture(fn)
        api := instance.APIForName(f.name)
        result := api.Resolve(convertToResolveArguments(f))
        require.Equal(t, output(f), toString(result))
    })
}
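For completeness, here is a sketch of what the createAPI step from the pseudo-code might do; the simplified signature and the exact YAML shape are assumptions for illustration, not an existing helper:

import (
    "os"
    "path/filepath"

    "gopkg.in/yaml.v3"
)

// createAPI renders one fixture into an API definition and writes it as
// apis/<name>.yaml inside the test project, so the instance picks it up
// when the project is parsed and reconciled.
func createAPI(name, metricsSQL, projectDir string) error {
    def := map[string]any{
        "kind":        "api",
        "metrics_sql": metricsSQL,
    }
    data, err := yaml.Marshal(def)
    if err != nil {
        return err
    }
    dir := filepath.Join(projectDir, "apis")
    if err := os.MkdirAll(dir, 0o755); err != nil {
        return err
    }
    return os.WriteFile(filepath.Join(dir, name+".yaml"), data, 0o644)
}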
If we need to debug a particular fixture we can write an ad-hoc testing function like:
func TestDebugFixture1(t *testing.T) {
    cleanAPIFolder(enginedir)
    createAPI(fixturePath, enginedir)
    instance := createInstance(enginedir)
    f := readFixture(fixturePath)
    api := instance.APIForName(f.name)
    result := api.Resolve(convertToResolveArguments(f))
    require.Equal(t, output(f), toString(result))
}
Please take a deeper look at the first two comments on this issue. Notably, the goal is to not rely on the shared projects in testruntime, but instead provide the project files directly in the test file for a group of tests. There are multiple reasons for this:
One example is the metrics resolver, which takes a Query (link). Unlike APIs and metrics SQL, this object can be quite nested.

So it sounds like we merge testruntime/testdata/adbids/**/*.yaml into a single test.yaml (which additionally contains tests and connectors too). It's one level more complex, because we need to parse the whole project differently and write another initialization flow for an instance, but I get your idea.
Considering the draft YAML spec above:
For each connector we need to create an instance and give it (instead of the repo connector) a test-yaml connector that will read the project from the YAML.
inst := &drivers.Instance{
    Environment:      "test",
    OLAPConnector:    "duckdb",
    RepoConnector:    "repo",
    CatalogConnector: "catalog",
    Connectors: []*runtimev1.Connector{
        {
            Type:   "file",
            Name:   "repo",
            Config: map[string]string{"dsn": projectPath},
        },
        {
            Type:   "duckdb",
            Name:   "duckdb",
            Config: map[string]string{"dsn": ":memory:"},
        },
        {
            Type: "sqlite",
            Name: "catalog",
            // Setting a test-specific name ensures a unique connection when "cache=shared" is enabled.
            // "cache=shared" is needed to prevent threading problems.
            Config: map[string]string{"dsn": fmt.Sprintf("file:%s?mode=memory&cache=shared", t.Name())},
        },
    },
    EmbedCatalog: true,
}
TestResolvers_test.go then transforms into something like:
for _, suite := range files("testdata/*.suite") {
    for _, c := range readConnectors(suite) {
        // createRepoConnector creates a 'test-yaml' connector that provides the project content from the `files` field.
        i := createInstance(c, createRepoConnector(suite))
        for _, test := range readTests(suite) {
            t.Run(test.name, func(t *testing.T) {
                runTest(i, test)
            })
        }
    }
}
Frankly, some points still trouble me.
> Makes it easy to develop tests because the project files and test definition are next to each other
Considering this, I wonder why we designed the project configuration with separate files in the first place; maybe we should redesign the project configuration into a single file? Right now we are planning to create a different project configuration format for tests than for production, which can lead to additional complexity.
> Makes it easy to add new test cases without breaking project files relied upon by tests in other files
This relates to the previous point. But I would add that the problem of suddenly broken tests is already solved by creating another project in testdata. You will encounter exactly the same problem with a single YAML file once it grows larger: you'll want to tweak a connector slightly for a new test and suddenly some other tests in the file are broken.
Besides, copy-pasting a file is as easy as copying a folder.
> Makes it easy to develop tests because the project files and test definition are next to each other
Yes, but why were the projects separated from the tests and placed in testdata in the first place? We could place them together in the same folder.
> For each connector we need to create an instance and give it (instead of the repo connector) a test-yaml connector that will read the project from the YAML.
Not necessarily – you can just use t.TempDir to create a temporary directory and write the test files into it. That's what the parser tests do: https://github.com/rilldata/rill/blob/96bef3505aa43d040840ed57dfaea1a5c868beec/runtime/compilers/rillv1/parser_test.go#L1903
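For reference, a minimal sketch of that approach; the helper name putFiles is illustrative, not the actual helper used in the parser tests:

import (
    "os"
    "path/filepath"
    "testing"
)

// putFiles writes inline-declared project files (path -> contents) into a
// temporary directory that is cleaned up automatically when the test ends.
func putFiles(t *testing.T, files map[string]string) string {
    t.Helper()
    root := t.TempDir()
    for path, contents := range files {
        full := filepath.Join(root, path)
        if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
            t.Fatal(err)
        }
        if err := os.WriteFile(full, []byte(contents), 0o644); err != nil {
            t.Fatal(err)
        }
    }
    return root
}

The repo connector's dsn can then point at the returned directory, exactly like in the drivers.Instance snippet above.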
> I wonder why we designed the project configuration with separate files in the first place; maybe we should redesign the project configuration into a single file?
That would be nice and has been requested previously. A single file would be especially nice for small projects (like test projects!). But it's out of scope for this work.
> Right now we are planning to create a different project configuration format for tests than for production
By taking the same approach as the parser tests, where files are declared inline but written out to a temp directory, this will not be a problem (because the actual parser will still be parsing multiple files on disk).
> You will encounter exactly the same problem with a single YAML file once it grows larger: you'll want to tweak a connector slightly for a new test and suddenly some other tests in the file are broken. Besides, copy-pasting a file is as easy as copying a folder.
So the goal here is exactly to be able to just copy-paste one file and change it. It means you can add a new test by adding a single file. Copy/pasting a folder in a separate location from the test means future maintainers need to open and diff many files to understand the difference. That's what would be nice to avoid.
> Yes, but why were the projects separated from the tests and placed in testdata in the first place? We could place them together in the same folder.
Yeah the goal here is to correct that mistake by moving test data and tests closer to each other.
FYI, the datasource will be a separate file anyway.
An example of the project in a single yaml:
project:
  sources:
    ad_bids_mini_source.yaml:
      connector: local_file
      path: data/AdBids.csv.gz
  models:
    ad_bids_mini.sql: |
      select
        id,
        timestamp,
        publisher,
        domain,
        volume,
        impressions,
        clicks
      from ad_bids_mini_source
  dashboards:
    ad_bids_mini_metrics_with_policy.yaml:
      model: ad_bids_mini
      display_name: Ad bids
      description:
      timeseries: timestamp
      smallest_time_grain: ""
      dimensions:
        - label: Publisher
          name: publisher
          expression: upper(publisher)
          description: ""
        - label: Domain
          property: domain
          description: ""
      measures:
        - label: "Number of bids"
          name: bid's number
          expression: count(*)
        - label: "Total volume"
          name: total volume
          expression: sum(volume)
        - label: "Total impressions"
          name: total impressions
          expression: sum(impressions)
        - label: "Total clicks"
          name: total click"s
          expression: sum(clicks)
      security:
        access: true
        row_filter: "domain = '{{ .user.domain }}'"
        exclude:
          - if: "'{{ .user.domain }}' != 'msn.com'"
            names:
              - total volume
  apis:
    mv_sql_policy_api.yaml:
      kind: api
      metrics_sql: |
        select
          publisher,
          domain,
          "total impressions",
          "total volume"
        FROM
          ad_bids_mini_metrics_with_policy
connectors:
  - duckdb
  - clickhouse
tests:
  test_name:
    resolver: metrics
    properties:
      query:
        metrics_view: mv
        ...
    result:
      rows:
        - ...
      csv: >
        xxx,xxx,xxx
  other_test_name:
    variables:
      rill.metrics.approximate_comparisons: true
    claims:
      attributes:
        - ...
      rules:
        - ...
    resolver: metrics_sql
    properties:
      sql: SELECT ...
    result:
      rows:
> An example of the project in a single yaml:
Looks pretty good to me.
> project: sources: ad_bids_mini_source.yaml:
Could this be flattened to something like this?
project:
  sources/ad_bids_mini_source.yaml:
The idea here is to set up a test file abstraction for runtime/resolvers, where test queries for any resolver, security claims, project files, and backend can be written in a nice syntax.

The idea is to draw inspiration from sqllogictest (also see this example from DuckDB's test files). For example, this could be YAML test files in runtime/resolvers/testdata, which are loaded, parsed and executed.

It would also be nice to support a way to automatically update the expected test output, e.g. using an --update flag as described in this blog post (see the sketch below).

Lastly, we should start migrating our existing metrics tests to the new files.
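For illustration, a minimal sketch of such an --update flag following the common golden-file pattern; the flag registration and the checkGolden helper are illustrative, not an existing API:

import (
    "flag"
    "os"
    "testing"

    "github.com/stretchr/testify/require"
)

// Run `go test ./... -update` to rewrite the expected outputs in place.
var update = flag.Bool("update", false, "update expected test output files")

// checkGolden compares got against the contents of the golden file, or
// rewrites the golden file when the -update flag is set.
func checkGolden(t *testing.T, goldenPath, got string) {
    t.Helper()
    if *update {
        require.NoError(t, os.WriteFile(goldenPath, []byte(got), 0o644))
        return
    }
    want, err := os.ReadFile(goldenPath)
    require.NoError(t, err)
    require.Equal(t, string(want), got)
}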