Closed IndexSeek closed 1 week ago
A few thoughts/questions I have on this:
Should we wait to move this logic to the SQLGlotCompiler?
Should we break apart the first sample test? It made it past the first assertion but failed on df = t.sample(0.1, method="block").execute()
due to this error: SAMPLE clause on views only supports row wise sampling without seed. I marked that to be skipped in the interim, but thinking we would want to cover the first scenario at least or test it without a view.
CC: @cromano8 as we were discussing this behavior.
Definitely going to check this out tomorrow/wednesday
I am marking this as a draft for the time being; I haven't been able to dig back in yet.
I'm currently working on setting things up to properly test locally against this backend rather than relying on the cloud-ci-runs. This should make future PRs a little smoother for me as well.
@IndexSeek I can help here, there's only two kinds of failures and I don't think there's anything else to do here:
Snowflake sample tests are passing:
cloud in π falcon in β¦/ibis on ξ feat/snowflake-sample is π¦ v9.0.0 via π v3.12.2 via βοΈ impure (ibis-3.12.2-env) took 20s
β― pytest 'ibis/backends/tests/test_generic.py' -k sample -m snowflake -vv
======================================================================================== test session starts =========================================================================================
platform linux -- Python 3.12.2, pytest-8.2.0, pluggy-1.5.0 -- /nix/store/cdmvsa912j1xj2d6kfsv0b63ffnylkvv-python3-3.12.2-env/bin/python3.12
cachedir: .pytest_cache
hypothesis profile 'dev' -> deadline=None, max_examples=50, suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase(PosixPath('/home/cloud/ibis/.hypothesis/examples'))
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
Using --randomly-seed=1209062198
rootdir: /home/cloud/ibis
configfile: pyproject.toml
plugins: hypothesis-6.100.2, benchmark-4.0.0, pytest_httpserver-1.0.10, clarity-1.0.1, randomly-3.15.0, repeat-0.9.3, timeout-2.3.1, xdist-3.6.1, cov-5.0.0, anyio-4.3.0, mock-3.14.0, snapshot-0.9.0
collected 3660 items / 3656 deselected / 4 selected
ibis/backends/tests/test_generic.py::test_sample[snowflake-block] XFAIL (SAMPLE clause on views only supports row wise sampling without seed.) [ 25%]
ibis/backends/tests/test_generic.py::test_sample_memtable[snowflake] PASSED [ 50%]
ibis/backends/tests/test_generic.py::test_sample[snowflake-row] PASSED [ 75%]
ibis/backends/tests/test_generic.py::test_sample_with_seed[snowflake] PASSED [100%]
That's awesome; thank you so much for helping me get this across the finish line!
- XPASS: these are tests that fail when they start succeeding. Have xfails in strict mode so that when tests that were previously failing start succeeding we know.
Ahh, that makes sense! I was digging around to see if I had forgotten to remove it/add it from/to an iterable somewhere.
This PR aims to add support for SAMPLE/TABLESAMPLE for Snowflake to build on the work done in #7377.
As the SAMPLE/TABLESAMPLE clause seems to be written the same way between Snowflake and DuckDB in SQL, I was able reuse code from: https://github.com/ibis-project/ibis/blob/b48f451de77bd9af832cea10d7ddf8c44097c45a/ibis/backends/duckdb/compiler.py#L118-L128
Snowflake Docs: SAMPLE/TABLESAMPLE
Code Snippet I used to validate it was working:
Results: