snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0
568 stars 458 forks source link

SNOW-1208964: Fix source file path resolution for local storage client #1896

Closed sfc-gh-psanford closed 4 months ago

sfc-gh-psanford commented 4 months ago

Description

When running GET queries in a local deployment, if a prefix path is provided in the GET query, the connector doesn't correctly generate the source file location. See SNOW-1208964 for a detailed example.

Testing

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1208964

  2. Fill out the following pre-review checklist:

    • [ ] I am adding a new automated test(s) to verify correctness of my new code
    • [ ] I am adding new logging messages
    • [ ] I am adding a new telemetry message
    • [ ] I am modifying authorization mechanisms
    • [ ] I am adding new credentials
    • [ ] I am modifying OCSP code
    • [ ] I am adding a new dependency
  3. Please describe how your code solves the related issue.

When creating a destination filepath for PUT, the connector uses only the basename of the file. This works even with subdirectories because in the PUT case, stage_info["location"] contains the subdirectory path.

For GET, the connector is using the same logic to construct the source filepath, but the prefix is not included in stage_info["location"]. This PR instead constructs the source filename based on real_src_file_name in the GET case.

I also changed shutil.copyfile to shutil.move in finish_download - this was leaving <filename>.part files in the destination directory.

github-actions[bot] commented 4 months ago

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

sfc-gh-yuwang commented 4 months ago

Hi, Can you also post your test code here?

sfc-gh-psanford commented 4 months ago

Here's a repro script:

import tempfile

import snowflake.connector
import os

home_path = os.getenv("HOME")
data_path = os.path.join(home_path, "<path to repo>/ExecPlatform/Database/data/orders_10*.csv")
print(f"Using data path {data_path}")

conn = snowflake.connector.connect(<local regression test instance connection details>)
try:
    conn.cursor().execute("create database local_transfer_test")
    conn.cursor().execute("use local_transfer_test")
    conn.cursor().execute("use schema public")
    conn.cursor().execute("create stage local_stage")
    conn.cursor().execute(f"PUT file://{data_path} @local_stage/subdir")
    res = conn.cursor().execute("ls @local_stage").get_result_batches()
    for batch in res:
        for row in batch.create_iter():
            print(row)
    tmp_path = tempfile.mkdtemp()
    print(f"Tmp path: {tmp_path}")
    conn.cursor().execute(f"GET @local_stage/subdir file://{tmp_path}")
finally:
    conn.cursor().execute("drop database local_transfer_test")
sfc-gh-psanford commented 4 months ago

I have read the CLA Document and I hereby sign the CLA