mycelial / snowflake-rs

Snowflake API libraries for Rust
Apache License 2.0
23 stars 18 forks source link

Larger queries hanging #29

Open GBagley opened 5 months ago

GBagley commented 5 months ago

Using the example from the snowflake-api README, I'm finding that relatively small queries are working fine, but when I run a larger query (larger in terms of resultset size) it seems to hang and never return with the results. I can see from the snowflake console that the query finishes successfully, relatively quickly (couple seconds), so shouldn't be a timeout issue.

Any ideas? Is there any known limitation to the 'undocumented' API like this? The same query is working fine through other tools like DataGrip (which I think is going through JDBC?)

Also had the same problem with another rust crate: snowflake-connector-rs, which I think is using the same undocumented API.

andrusha commented 4 months ago

Could you somehow make a reproducible example? Maybe some more details on expected result size (lines, bytes), what cloud you're deployed on, etc.

A set of queries will suffice, eg:

CREATE OR REPLACE TABLE ...;
INSERT INTO ...; -- with generator
SELECT FROM ...;

You can also try running the query with trace level set RUST_LOG=trace cargo run --example run_sql, which will give more information on what's happening under the hood.

milevin commented 3 months ago

Not sure if this is a related issue. The fork that I am using has problems with long-running queries. Here is an example query that takes 2 minutes in my warehouse. You can add/remove 0's to make it slower/faster as needed.

WITH RECURSIVE Compute_CTE AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM Compute_CTE
    WHERE n < 10000000
)
SELECT SUM(n) 
FROM Compute_CTE
andrusha commented 3 months ago

@milevin I took a look and it appears that when Snowflake detects that query will run much longer than an http timeout it'll return a handle instead, which you can poll to get the result. This isn't implemented yet and it should've been erroring out as that type of response the library shouldn't be able to parse correctly.

I was looking to at least add proper parsing for that kind of reply, but supporting for async (on the snowflake side) queries is not that trivial.

If you want to take a shot I think https://github.com/mycelial/snowflake-rs/pull/38 is a good place to start, after you've gotten the handle it needs to be polled until the result is available, which will come in the same format as other results.

milevin commented 3 months ago

@andrusha, great. We will look into this. cc: @xuliangs

xuliangs commented 3 months ago

Not sure if this is a related issue. The fork that I am using has problems with long-running queries. Here is an example query that takes 2 minutes in my warehouse. You can add/remove 0's to make it slower/faster as needed.

WITH RECURSIVE Compute_CTE AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM Compute_CTE
    WHERE n < 10000000
)
SELECT SUM(n) 
FROM Compute_CTE

Thanks @andrusha I worked a PR beforehand that handles this query. Maybe we can get this in first?