Open tharwan opened 9 months ago
Hi @tharwan this might be possible ... could you share example SQL statements you'd like support for, both creating the stage and loading from it?
Loading from a stage looks something like this:
```sql
COPY INTO my_table
FROM @my_stage/my_file
FILE_FORMAT = (FIELD_DELIMITER = ';')
```
Creating a stage is not so relevant in our case but looks like this:
```sql
CREATE STAGE my_ext_stage
URL='azure://myaccount.blob.core.windows.net/load/files/'
STORAGE_INTEGRATION = myint;
```
maybe related: https://github.com/tobymao/sqlglot/issues/2463
Thanks! From the above I gather you want support for CSV files.
I think creating the stage would be needed for fakesnow to know where to find the files to load.
Another option could be to just look locally for the file and ignore the @my_stage part.
e.g.
```sql
COPY INTO my_table
FROM @my_stage/my_file
FILE_FORMAT = (FIELD_DELIMITER = ';')
```
would translate to
```sql
COPY my_table
FROM my_file
(FORMAT CSV, DELIMITER ';');
```
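The "ignore the stage" translation above could be sketched like this. This is a regex-based toy for the exact example statement, purely illustrative; a real implementation would go through sqlglot's AST, and the function name is invented:

```python
import re

def copy_into_to_duckdb(sql: str) -> str:
    """Rewrite the Snowflake COPY INTO example above into the duckdb-style
    COPY shown, dropping the @my_stage prefix. Illustrative only: handles
    just this one statement shape, with a single-character delimiter."""
    m = re.search(
        r"COPY INTO (\S+)\s+FROM @\S+/(\S+)\s+"
        r"FILE_FORMAT = \(FIELD_DELIMITER = '(.)'\)",
        sql,
    )
    if m is None:
        raise ValueError("unsupported statement")
    table, file, delim = m.groups()
    return f"COPY {table} FROM {file} (FORMAT CSV, DELIMITER '{delim}');"
```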
We recently found ourselves wanting this at $job. I may look into a relatively simple initial implementation somewhat soon. My personal plan/preference would be for fakesnow to simply track stage creations internally inside information_schema.stages, and something else for integrations (I don't see a table for this, but we're using integrations personally). It would then translate COPY statements against the service in question into boto3 calls + insert statements (which ought to work for all the supported cloud providers, IIRC).
It'd then be the job of the user to set up moto however they prefer, to avoid making calls to the actual service. Perhaps fakesnow could provide a default fixture that avoids making calls by default, but I'd suspect we, at least, would be forced to override it anyway.
Like I said, I'll probably be looking into the feasibility of this strategy soon, but I expect sqlglot to be the short-term limiting factor in terms of minimally parsing the integration/stage/COPY statement syntaxes.
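The stage-tracking idea could be sketched roughly as below, using stdlib sqlite3 as a stand-in for the duckdb database fakesnow runs on. The column layout and function name here are invented for illustration, not Snowflake's actual INFORMATION_SCHEMA.STAGES columns:

```python
import re
import sqlite3

# Stand-in for fakesnow's backing database (duckdb in reality).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stages (stage_name TEXT, stage_url TEXT, storage_integration TEXT)"
)

def record_create_stage(sql: str) -> None:
    """Intercept a CREATE STAGE statement and record it so that a later
    COPY INTO can resolve the @stage reference. Regex parsing is a toy;
    a real implementation would parse via sqlglot."""
    m = re.search(
        r"CREATE STAGE (\S+)\s+URL='([^']+)'\s+STORAGE_INTEGRATION = (\w+)",
        sql,
    )
    if m is None:
        raise ValueError("unsupported CREATE STAGE syntax")
    conn.execute("INSERT INTO stages VALUES (?, ?, ?)", m.groups())

record_create_stage(
    "CREATE STAGE my_ext_stage\n"
    "URL='azure://myaccount.blob.core.windows.net/load/files/'\n"
    "STORAGE_INTEGRATION = myint;"
)
```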
If the use case were just testing, then for the first iteration of this I thought we could just support a local file via a file:// URL. It would avoid the need to set up actual S3/Azure storage, or moto to mock S3. Would that work?
My personal use case requires a storage integration/S3, but presumably implementing that will be easier if there is already general support for a staged file:// URL.
Yes, good point - if we transform a COPY INTO into an INSERT INTO .. SELECT in duckdb, then we could support both a file:// and an s3:// URL via the same mechanism relatively easily, using duckdb's support for S3 via the httpfs extension.
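That transform could be sketched as follows, assuming the stage URL has already been recorded somewhere. The stages mapping and helper name are invented for illustration; the generated statement uses duckdb's read_csv function and its delim parameter:

```python
# Would come from a tracked information_schema.stages (see discussion above);
# hard-coded here for illustration.
stages = {"my_stage": "s3://my-bucket/load"}

def copy_into_to_insert(table: str, stage_ref: str, delimiter: str) -> str:
    """Resolve an @stage/path reference to its recorded URL and emit a
    duckdb INSERT INTO .. SELECT over read_csv. Works identically for
    file:// and s3:// stage URLs, since duckdb's httpfs extension handles
    the latter."""
    stage, _, path = stage_ref.lstrip("@").partition("/")
    url = f"{stages[stage]}/{path}"
    return (
        f"INSERT INTO {table} "
        f"SELECT * FROM read_csv('{url}', delim='{delimiter}');"
    )
```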
If you're keen to tackle this a PR would be welcome!
Hi,
Would it be possible to support loading of data from a stage? (https://docs.snowflake.com/en/user-guide/data-load-considerations-load)
I am not sure from the docs whether this would even be within the scope of this project.
However, it would help us a lot to test our complete ETL workflow.