dcape42 opened this issue 2 years ago
Hi @dcape42 - unfortunately I am not able to reproduce it. Here's the log:
❯ pyenv virtualenv spark-test
Looking in links: /var/folders/hr/rhtthkrs6kj15q_t8chz4_480000gn/T/tmph5voemw6
Requirement already satisfied: setuptools in /Users/vijay/.pyenv/versions/3.9.4/envs/spark-test/lib/python3.9/site-packages (49.2.1)
Requirement already satisfied: pip in /Users/vijay/.pyenv/versions/3.9.4/envs/spark-test/lib/python3.9/site-packages (20.2.3)
work/soda/datatalks took 4s
❯ pyenv activate spark-test
work/soda/datatalks via 🐍 v3.9.4 (spark-test)
work/soda/datatalks via 🐍 v3.9.4 (spark-test)
❯ pip install soda-spark
Collecting soda-spark
Downloading soda_spark-0.3.1-py3-none-any.whl (10 kB)
Collecting pyspark~=3.0
Downloading pyspark-3.2.1.tar.gz (281.4 MB)
|████████████████████████████████| 281.4 MB 16.3 MB/s
Collecting soda-sql-spark~=2.1
Downloading soda_sql_spark-2.1.9-py3-none-any.whl (4.6 kB)
Collecting py4j==0.10.9.3
Downloading py4j-0.10.9.3-py2.py3-none-any.whl (198 kB)
|████████████████████████████████| 198 kB 30.5 MB/s
Collecting soda-sql-core==2.1.9
Downloading soda_sql_core-2.1.9-py3-none-any.whl (88 kB)
|████████████████████████████████| 88 kB 6.5 MB/s
Collecting thrift<1.0,>=0.13.0
Downloading thrift-0.16.0.tar.gz (59 kB)
|████████████████████████████████| 59 kB 13.3 MB/s
Processing /Users/vijay/Library/Caches/pip/wheels/0b/8c/09/c6357b8b50c7ed462aca5b70958031fb121e94297b00048b36/PyHive-0.6.5-py3-none-any.whl
Collecting pyodbc<5.0,>=4.0
Downloading pyodbc-4.0.32.tar.gz (280 kB)
|████████████████████████████████| 280 kB 37.8 MB/s
Collecting sasl<1.0,>=0.3.1
Using cached sasl-0.3.1.tar.gz (44 kB)
Collecting thrift-sasl<1.0,>=0.4.3
Using cached thrift_sasl-0.4.3-py2.py3-none-any.whl (8.3 kB)
Collecting click<9.0,>=8.0
Downloading click-8.1.2-py3-none-any.whl (96 kB)
|████████████████████████████████| 96 kB 9.4 MB/s
Collecting requests<3.0,>=2.23.0
Using cached requests-2.27.1-py2.py3-none-any.whl (63 kB)
Collecting markupsafe==2.0.1
Using cached MarkupSafe-2.0.1.tar.gz (18 kB)
Collecting Jinja2<3.0,>=2.11.3
Using cached Jinja2-2.11.3-py2.py3-none-any.whl (125 kB)
Collecting Deprecated<1.3,>=1.2.13
Using cached Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Collecting opentelemetry-exporter-otlp-proto-http<1.7,>=1.6.2
Using cached opentelemetry_exporter_otlp_proto_http-1.6.2-py3-none-any.whl (13 kB)
Collecting pyyaml<6.0,>=5.4.1
Using cached PyYAML-5.4.1.tar.gz (175 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Collecting six>=1.7.2
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Processing /Users/vijay/Library/Caches/pip/wheels/2f/a0/d3/4030d9f80e6b3be787f19fc911b8e7aa462986a40ab1e4bb94/future-0.18.2-py3-none-any.whl
Collecting python-dateutil
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Processing /Users/vijay/Library/Caches/pip/wheels/46/87/60/7fb7069c8135d081326017206a8481795191246b680e2ff67a/pure_sasl-0.6.2-py3-none-any.whl
Collecting charset-normalizer~=2.0.0; python_version >= "3"
Using cached charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Collecting idna<4,>=2.5; python_version >= "3"
Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
Using cached urllib3-1.26.9-py2.py3-none-any.whl (138 kB)
Collecting certifi>=2017.4.17
Using cached certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Collecting wrapt<2,>=1.10
Using cached wrapt-1.14.0.tar.gz (50 kB)
Collecting opentelemetry-proto==1.6.2
Using cached opentelemetry_proto-1.6.2-py3-none-any.whl (63 kB)
Collecting opentelemetry-api~=1.3
Using cached opentelemetry_api-1.10.0-py3-none-any.whl (47 kB)
Collecting opentelemetry-sdk~=1.3
Using cached opentelemetry_sdk-1.10.0-py3-none-any.whl (70 kB)
Collecting backoff~=1.10.0
Using cached backoff-1.10.0-py2.py3-none-any.whl (31 kB)
Collecting googleapis-common-protos~=1.52
Using cached googleapis_common_protos-1.56.0-py2.py3-none-any.whl (241 kB)
Collecting protobuf>=3.13.0
Downloading protobuf-3.20.0-py2.py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 4.0 MB/s
Requirement already satisfied: setuptools>=16.0 in /Users/vijay/.pyenv/versions/3.9.4/envs/spark-test/lib/python3.9/site-packages (from opentelemetry-api~=1.3->opentelemetry-exporter-otlp-proto-http<1.7,>=1.6.2->soda-sql-core==2.1.9->soda-sql-spark~=2.1->soda-spark) (49.2.1)
Collecting typing-extensions>=3.7.4
Using cached typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Collecting opentelemetry-semantic-conventions==0.29b0
Using cached opentelemetry_semantic_conventions-0.29b0-py3-none-any.whl (25 kB)
Using legacy 'setup.py install' for pyspark, since package 'wheel' is not installed.
Using legacy 'setup.py install' for thrift, since package 'wheel' is not installed.
Using legacy 'setup.py install' for pyodbc, since package 'wheel' is not installed.
Using legacy 'setup.py install' for sasl, since package 'wheel' is not installed.
Using legacy 'setup.py install' for markupsafe, since package 'wheel' is not installed.
Using legacy 'setup.py install' for wrapt, since package 'wheel' is not installed.
Building wheels for collected packages: pyyaml
Building wheel for pyyaml (PEP 517) ... done
Created wheel for pyyaml: filename=PyYAML-5.4.1-cp39-cp39-macosx_11_0_x86_64.whl size=156280 sha256=943962f196bef2623dec0234a3c631407b233ab36813eb7e3691aa31aed9d8ce
Stored in directory: /Users/vijay/Library/Caches/pip/wheels/b7/a5/c4/504d913c2a55bb09c607541578ec5f844d1ff33467abe93ba5
Successfully built pyyaml
Installing collected packages: py4j, pyspark, click, charset-normalizer, idna, urllib3, certifi, requests, markupsafe, Jinja2, wrapt, Deprecated, protobuf, opentelemetry-proto, opentelemetry-api, typing-extensions, opentelemetry-semantic-conventions, opentelemetry-sdk, backoff, googleapis-common-protos, opentelemetry-exporter-otlp-proto-http, pyyaml, soda-sql-core, six, thrift, future, python-dateutil, PyHive, pyodbc, sasl, pure-sasl, thrift-sasl, soda-sql-spark, soda-spark
Running setup.py install for pyspark ... done
Running setup.py install for markupsafe ... done
Running setup.py install for wrapt ... done
Running setup.py install for thrift ... done
Running setup.py install for pyodbc ... done
Running setup.py install for sasl ... done
Successfully installed Deprecated-1.2.13 Jinja2-2.11.3 PyHive-0.6.5 backoff-1.10.0 certifi-2021.10.8 charset-normalizer-2.0.12 click-8.1.2 future-0.18.2 googleapis-common-protos-1.56.0 idna-3.3 markupsafe-2.0.1 opentelemetry-api-1.10.0 opentelemetry-exporter-otlp-proto-http-1.6.2 opentelemetry-proto-1.6.2 opentelemetry-sdk-1.10.0 opentelemetry-semantic-conventions-0.29b0 protobuf-3.20.0 pure-sasl-0.6.2 py4j-0.10.9.3 pyodbc-4.0.32 pyspark-3.2.1 python-dateutil-2.8.2 pyyaml-5.4.1 requests-2.27.1 sasl-0.3.1 six-1.16.0 soda-spark-0.3.1 soda-sql-core-2.1.9 soda-sql-spark-2.1.9 thrift-0.16.0 thrift-sasl-0.4.3 typing-extensions-4.1.1 urllib3-1.26.9 wrapt-1.14.0
WARNING: You are using pip version 20.2.3; however, version 22.0.4 is available.
You should consider upgrading via the '/Users/vijay/.pyenv/versions/3.9.4/envs/spark-test/bin/python3.9 -m pip install --upgrade pip' command.
work/soda/datatalks via 🐍 v3.9.4 (spark-test) took 44s
work/soda/datatalks via 🐍 v3.9.4 (spark-test)
❯ python --version
Python 3.9.4
work/soda/datatalks via 🐍 v3.9.4 (spark-test)
❯ soda create --help
Usage: soda create [OPTIONS] WAREHOUSE_TYPE
Creates a new warehouse.yml file and prepares credentials in your
~/.soda/env_vars.yml Nothing will be overwritten or removed, only added if
it does not exist yet.
WAREHOUSE_TYPE is one of {postgres, snowflake, redshift, bigquery, athena}
Options:
-f, --file TEXT The destination filename for the warehouse
configuration details. This can be a relative path.
-d, --database TEXT The database name to use for the connection
-u, --username TEXT The username to use for the connection, through
env_var(...)
-p, --password TEXT The password to use for the connection, through
env_var(...)
-w, --warehouse TEXT The warehouse name
--help Show this message and exit.
work/soda/datatalks via 🐍 v3.9.4 (spark-test)
❯ soda create spark
| Soda CLI version 2.1.9
| Creating warehouse YAML file warehouse.yml ...
| Adding env vars for spark to /Users/vijay/.soda/env_vars.yml
| Review warehouse.yml by running command
| cat warehouse.yml
| Review section spark in ~/.soda/env_vars.yml by running command
| cat ~/.soda/env_vars.yml
| Then run the soda analyze command
| Starting new HTTPS connection (1): collect.soda.io:443
| https://collect.soda.io:443 "POST /v1/traces HTTP/1.1" 200 0
work/soda/datatalks via 🐍 v3.9.4 (spark-test)
Can you please try installing only soda-spark, and not both soda-sql-spark and soda-spark?
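As a quick sanity check for that suggestion, the set of installed soda-* distributions in the active environment can be listed with the standard-library importlib.metadata. This is just a diagnostic sketch (the helper name is made up here, not part of the Soda CLI):

```python
from importlib import metadata

def soda_distributions():
    # Collect (name, version) for every installed distribution whose
    # name starts with "soda" - e.g. soda-spark, soda-sql-spark, soda-sql-core.
    return sorted(
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if (dist.metadata["Name"] or "").lower().startswith("soda")
    )

print(soda_distributions())
```

If both soda-spark and a directly installed soda-sql-spark show up, uninstalling everything and reinstalling only soda-spark (which pulls soda-sql-spark in as a dependency, per the log above) is the suggested path.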
SOLVED: The current macOS version had an incompatibility with the system-wide ODBC lib.
(.venv) sveera@SVEER1ML1 soda-test % python test.py
Traceback (most recent call last):
File "test.py", line 1, in
Resolution:
Describe the bug
On a new installation, I am able to run the soda command, but I am unable to run any subcommands that access my data warehouse. I get an error saying
Module sodasql.dialects.spark_dialect not found
even though it is installed. This is a shared project, and I am attempting to use the warehouse and scan YAML files created by my colleague and shared in a git repo.
To Reproduce
Steps to reproduce the behavior:
1. Populate ~/.soda/config.yml and ~/.soda/env_vars.yml with the values I need.
2. Run soda --help to confirm Soda is installed - success.
3. Run soda scan warehouse_riot_data.yml tables/accounts_alias.yml (or any YAML files) - this fails.
Expected result: Soda should run the scan command.
Actual result: Error
Context
This happens with any warehouse.yml file and any scan.yml file. It also happens when trying to run soda analyze with any warehouse.yml file. The contents of the YAML file don't matter, because the failure occurs when trying to create a parser object. I can confirm the Python files are present under my site-packages directory with what appears to be the correct relative path.
But the warehouse_yml_parser.py file is found and the spark_dialect.py file is not found (as shown by the module-not-found error).
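One minimal way to probe the failing import without running soda at all is importlib.util.find_spec, which locates a module on sys.path without executing it. This is only a diagnostic sketch (the helper is made up here), but it isolates whether the problem is the import itself or something soda does afterwards:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be located on sys.path without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is itself missing.
        return False

# The module from the error message can be probed the same way:
print(module_available("sodasql.dialects.spark_dialect"))
```

Running this in both the working (colleague's) and failing environment would show whether the two interpreters are resolving different site-packages directories.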
I can also confirm the packages are installed at the correct versions.
My colleague has run the same installation steps, and we are showing the same packages installed. He can run soda commands but I can't.
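To go one step further than "the packages are installed at the correct versions", the files a distribution actually shipped (from its RECORD metadata) can be inspected with importlib.metadata.files. Again a diagnostic sketch; the distribution name is taken from the pip log above:

```python
from importlib import metadata

def dist_files(dist_name: str):
    # List the files recorded for an installed distribution, or [] if
    # the distribution is not installed or recorded no files.
    try:
        files = metadata.files(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return [str(f) for f in files]

# e.g. check whether soda-sql-spark actually shipped spark_dialect.py:
hits = [f for f in dist_files("soda-sql-spark") if f.endswith("spark_dialect.py")]
print(hits)
```

An empty result on the failing machine (but not the working one) would point at a broken or partial install rather than a path problem.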
Additional info: If I also install soda-sql-hive, I am able to run soda create -f test.yml hive but not soda create -f test2.yml spark. So it seems there is something specific to the Spark libraries.
OS: macOS Big Sur 11.6.5
Python Version: 3.8.12 (also tried with 3.7.8 and 3.7.10)
Soda SQL Version: 2.1.8
Warehouse Type: Spark