sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0

Can't run "soda analyze" or "soda scan" on new installation - module not found error #338

Open dcape42 opened 2 years ago

dcape42 commented 2 years ago

Describe the bug
On a new installation, I am able to run the soda command, but I am unable to run any subcommand that accesses my data warehouse. I get an error saying Module sodasql.dialects.spark_dialect not found even though it is installed.

This is a shared project and I am attempting to use the warehouse and scan YAML files created by my colleague and shared in a git repo.

To Reproduce
Steps to reproduce the behavior:

  1. In a fresh Python 3.8.12 environment, install soda packages:
    pip install soda-sql-spark
    pip install soda-spark
  2. Manually create ~/.soda/config.yml and ~/.soda/env_vars.yml with the values I need.
  3. Get warehouse.yml and scan.yml files from my team's git repository.
  4. Install the ODBC driver for my data warehouse and put that location in my warehouse.yml.
  5. Run soda --help to confirm Soda is installed - success.
  6. Run soda scan warehouse_riot_data.yml tables/accounts_alias.yml (or any YAML files) - this fails.

Expected result: Soda should run the scan command.

Actual result: Error

$ soda analyze ./warehouse_riot_dw.yml 
  | 2.1.8
  | Analyzing ./warehouse_riot_dw.yml ...
  | Module sodasql.dialects.spark_dialect not found. Are you sure you installed appropriate warehouse package?
  | Exception: 'NoneType' object is not callable
Traceback (most recent call last):
  File "/Users/dcapewell/.pyenv/versions/3.8.12/envs/test38/lib/python3.8/site-packages/sodasql/cli/cli.py", line 250, in analyze
    warehouse_yml_parser = WarehouseYmlParser(warehouse_yml_dict, warehouse_yml_file)
  File "/Users/dcapewell/.pyenv/versions/3.8.12/envs/test38/lib/python3.8/site-packages/sodasql/scan/warehouse_yml_parser.py", line 69, in __init__
    self.warehouse_yml.dialect = Dialect.create(self)
  File "/Users/dcapewell/.pyenv/versions/3.8.12/envs/test38/lib/python3.8/site-packages/sodasql/scan/dialect.py", line 109, in create
    return _warehouse_class(parser)
TypeError: 'NoneType' object is not callable
  | If you think this is a bug in Soda SQL, please open an issue at https://github.com/sodadata/soda-sql/issues/new/choose
  | Starting new HTTPS connection (1): collect.soda.io:443
  | https://collect.soda.io:443 "POST /v1/traces HTTP/1.1" 200 0

Context
This happens with any warehouse.yml file and any scan.yml file, and also when running soda analyze with any warehouse.yml file. The contents of the YAML file don't matter, because the failure occurs while creating the parser object.
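For context on why the log shows both messages: here is a hypothetical sketch (not Soda's actual source) of how a swallowed ImportError in the dialect lookup can leave None behind, which then raises the TypeError seen in the traceback:

```python
# Hypothetical sketch (not Soda's actual code) of how a swallowed
# ImportError turns into "TypeError: 'NoneType' object is not callable".
import importlib

_DIALECT_MODULES = {"spark": "sodasql.dialects.spark_dialect"}


def create_dialect(warehouse_type):
    """Mimics a Dialect.create-style lookup: returns a class, or None
    when the dialect module fails to import."""
    try:
        module = importlib.import_module(_DIALECT_MODULES[warehouse_type])
        return getattr(module, "SparkDialect", None)
    except ImportError:
        # The real cause (e.g. `import pyodbc` failing deeper down)
        # is swallowed here; only a generic message is printed.
        print(f"Module {_DIALECT_MODULES[warehouse_type]} not found.")
        return None


dialect_class = create_dialect("spark")
try:
    dialect_class()  # calling None reproduces the TypeError from the log
except TypeError as exc:
    print(exc)
```

The key point is that the "module not found" message and the TypeError are one failure, not two: the lookup swallows the import error and hands back None.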

I can confirm the python files are present under my site-packages directory with what appears to be the correct relative path.

site-packages/sodasql/scan/warehouse_yml_parser.py
site-packages/sodasql/dialects/spark_dialect.py

Yet the warehouse_yml_parser.py file is found while the spark_dialect.py file is not (as shown by the module-not-found error).
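Importing the failing module directly surfaces the underlying error that the one-line "module not found" message hides. A minimal diagnostic sketch:

```python
import importlib
import traceback


def try_import(module_name: str) -> bool:
    """Attempt an import and print the full underlying traceback.

    A "Module ... not found" message can hide the real cause (for
    example, a failing `import pyodbc` inside spark_dialect.py);
    importing the module directly in a plain Python session shows it.
    """
    try:
        importlib.import_module(module_name)
        return True
    except Exception:
        traceback.print_exc()
        return False


# e.g. try_import("sodasql.dialects.spark_dialect")
```

Running this against sodasql.dialects.spark_dialect in the broken environment should print the actual exception rather than the generic message.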

I can also confirm the packages are installed at the correct versions.

$ pip list | grep soda
soda-spark                             0.3.1
soda-sql-core                          2.1.8
soda-sql-spark                         2.1.8

My colleague has run the same installation steps, and we are showing the same packages installed. He can run soda commands but I can't.

Additional info: if I also install soda-sql-hive, I am able to run soda create -f test.yml hive but not soda create -f test2.yml spark. So it seems there is something specific to the Spark libraries.

OS: macOS Big Sur 11.6.5
Python Version: 3.8.12 (also tried 3.7.8 and 3.7.10)
Soda SQL Version: 2.1.8
Warehouse Type: Spark

vijaykiran commented 2 years ago

Hi @dcape42 - unfortunately I am not able to reproduce it; here's the log:

➜ pyenv virtualenv spark-test
Looking in links: /var/folders/hr/rhtthkrs6kj15q_t8chz4_480000gn/T/tmph5voemw6
Requirement already satisfied: setuptools in /Users/vijay/.pyenv/versions/3.9.4/envs/spark-test/lib/python3.9/site-packages (49.2.1)
Requirement already satisfied: pip in /Users/vijay/.pyenv/versions/3.9.4/envs/spark-test/lib/python3.9/site-packages (20.2.3)

work/soda/datatalks took 4s
➜ pyenv activate spark-test

work/soda/datatalks via 🐍 v3.9.4 (spark-test)

work/soda/datatalks via 🐍 v3.9.4 (spark-test)
➜ pip install soda-spark
Collecting soda-spark
  Downloading soda_spark-0.3.1-py3-none-any.whl (10 kB)
Collecting pyspark~=3.0
  Downloading pyspark-3.2.1.tar.gz (281.4 MB)
     |████████████████████████████████| 281.4 MB 16.3 MB/s
Collecting soda-sql-spark~=2.1
  Downloading soda_sql_spark-2.1.9-py3-none-any.whl (4.6 kB)
Collecting py4j==0.10.9.3
  Downloading py4j-0.10.9.3-py2.py3-none-any.whl (198 kB)
     |████████████████████████████████| 198 kB 30.5 MB/s
Collecting soda-sql-core==2.1.9
  Downloading soda_sql_core-2.1.9-py3-none-any.whl (88 kB)
     |████████████████████████████████| 88 kB 6.5 MB/s
Collecting thrift<1.0,>=0.13.0
  Downloading thrift-0.16.0.tar.gz (59 kB)
     |████████████████████████████████| 59 kB 13.3 MB/s
Processing /Users/vijay/Library/Caches/pip/wheels/0b/8c/09/c6357b8b50c7ed462aca5b70958031fb121e94297b00048b36/PyHive-0.6.5-py3-none-any.whl
Collecting pyodbc<5.0,>=4.0
  Downloading pyodbc-4.0.32.tar.gz (280 kB)
     |████████████████████████████████| 280 kB 37.8 MB/s
Collecting sasl<1.0,>=0.3.1
  Using cached sasl-0.3.1.tar.gz (44 kB)
Collecting thrift-sasl<1.0,>=0.4.3
  Using cached thrift_sasl-0.4.3-py2.py3-none-any.whl (8.3 kB)
Collecting click<9.0,>=8.0
  Downloading click-8.1.2-py3-none-any.whl (96 kB)
     |████████████████████████████████| 96 kB 9.4 MB/s
Collecting requests<3.0,>=2.23.0
  Using cached requests-2.27.1-py2.py3-none-any.whl (63 kB)
Collecting markupsafe==2.0.1
  Using cached MarkupSafe-2.0.1.tar.gz (18 kB)
Collecting Jinja2<3.0,>=2.11.3
  Using cached Jinja2-2.11.3-py2.py3-none-any.whl (125 kB)
Collecting Deprecated<1.3,>=1.2.13
  Using cached Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Collecting opentelemetry-exporter-otlp-proto-http<1.7,>=1.6.2
  Using cached opentelemetry_exporter_otlp_proto_http-1.6.2-py3-none-any.whl (13 kB)
Collecting pyyaml<6.0,>=5.4.1
  Using cached PyYAML-5.4.1.tar.gz (175 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting six>=1.7.2
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Processing /Users/vijay/Library/Caches/pip/wheels/2f/a0/d3/4030d9f80e6b3be787f19fc911b8e7aa462986a40ab1e4bb94/future-0.18.2-py3-none-any.whl
Collecting python-dateutil
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Processing /Users/vijay/Library/Caches/pip/wheels/46/87/60/7fb7069c8135d081326017206a8481795191246b680e2ff67a/pure_sasl-0.6.2-py3-none-any.whl
Collecting charset-normalizer~=2.0.0; python_version >= "3"
  Using cached charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Collecting idna<4,>=2.5; python_version >= "3"
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.9-py2.py3-none-any.whl (138 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Collecting wrapt<2,>=1.10
  Using cached wrapt-1.14.0.tar.gz (50 kB)
Collecting opentelemetry-proto==1.6.2
  Using cached opentelemetry_proto-1.6.2-py3-none-any.whl (63 kB)
Collecting opentelemetry-api~=1.3
  Using cached opentelemetry_api-1.10.0-py3-none-any.whl (47 kB)
Collecting opentelemetry-sdk~=1.3
  Using cached opentelemetry_sdk-1.10.0-py3-none-any.whl (70 kB)
Collecting backoff~=1.10.0
  Using cached backoff-1.10.0-py2.py3-none-any.whl (31 kB)
Collecting googleapis-common-protos~=1.52
  Using cached googleapis_common_protos-1.56.0-py2.py3-none-any.whl (241 kB)
Collecting protobuf>=3.13.0
  Downloading protobuf-3.20.0-py2.py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 4.0 MB/s
Requirement already satisfied: setuptools>=16.0 in /Users/vijay/.pyenv/versions/3.9.4/envs/spark-test/lib/python3.9/site-packages (from opentelemetry-api~=1.3->opentelemetry-exporter-otlp-proto-http<1.7,>=1.6.2->soda-sql-core==2.1.9->soda-sql-spark~=2.1->soda-spark) (49.2.1)
Collecting typing-extensions>=3.7.4
  Using cached typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Collecting opentelemetry-semantic-conventions==0.29b0
  Using cached opentelemetry_semantic_conventions-0.29b0-py3-none-any.whl (25 kB)
Using legacy 'setup.py install' for pyspark, since package 'wheel' is not installed.
Using legacy 'setup.py install' for thrift, since package 'wheel' is not installed.
Using legacy 'setup.py install' for pyodbc, since package 'wheel' is not installed.
Using legacy 'setup.py install' for sasl, since package 'wheel' is not installed.
Using legacy 'setup.py install' for markupsafe, since package 'wheel' is not installed.
Using legacy 'setup.py install' for wrapt, since package 'wheel' is not installed.
Building wheels for collected packages: pyyaml
  Building wheel for pyyaml (PEP 517) ... done
  Created wheel for pyyaml: filename=PyYAML-5.4.1-cp39-cp39-macosx_11_0_x86_64.whl size=156280 sha256=943962f196bef2623dec0234a3c631407b233ab36813eb7e3691aa31aed9d8ce
  Stored in directory: /Users/vijay/Library/Caches/pip/wheels/b7/a5/c4/504d913c2a55bb09c607541578ec5f844d1ff33467abe93ba5
Successfully built pyyaml
Installing collected packages: py4j, pyspark, click, charset-normalizer, idna, urllib3, certifi, requests, markupsafe, Jinja2, wrapt, Deprecated, protobuf, opentelemetry-proto, opentelemetry-api, typing-extensions, opentelemetry-semantic-conventions, opentelemetry-sdk, backoff, googleapis-common-protos, opentelemetry-exporter-otlp-proto-http, pyyaml, soda-sql-core, six, thrift, future, python-dateutil, PyHive, pyodbc, sasl, pure-sasl, thrift-sasl, soda-sql-spark, soda-spark
    Running setup.py install for pyspark ... done
    Running setup.py install for markupsafe ... done
    Running setup.py install for wrapt ... done
    Running setup.py install for thrift ... done
    Running setup.py install for pyodbc ... done
    Running setup.py install for sasl ... done
Successfully installed Deprecated-1.2.13 Jinja2-2.11.3 PyHive-0.6.5 backoff-1.10.0 certifi-2021.10.8 charset-normalizer-2.0.12 click-8.1.2 future-0.18.2 googleapis-common-protos-1.56.0 idna-3.3 markupsafe-2.0.1 opentelemetry-api-1.10.0 opentelemetry-exporter-otlp-proto-http-1.6.2 opentelemetry-proto-1.6.2 opentelemetry-sdk-1.10.0 opentelemetry-semantic-conventions-0.29b0 protobuf-3.20.0 pure-sasl-0.6.2 py4j-0.10.9.3 pyodbc-4.0.32 pyspark-3.2.1 python-dateutil-2.8.2 pyyaml-5.4.1 requests-2.27.1 sasl-0.3.1 six-1.16.0 soda-spark-0.3.1 soda-sql-core-2.1.9 soda-sql-spark-2.1.9 thrift-0.16.0 thrift-sasl-0.4.3 typing-extensions-4.1.1 urllib3-1.26.9 wrapt-1.14.0
WARNING: You are using pip version 20.2.3; however, version 22.0.4 is available.
You should consider upgrading via the '/Users/vijay/.pyenv/versions/3.9.4/envs/spark-test/bin/python3.9 -m pip install --upgrade pip' command.
work/soda/datatalks via 🐍 v3.9.4 (spark-test) took 44s

work/soda/datatalks via 🐍 v3.9.4 (spark-test)
❯ python --version
Python 3.9.4

work/soda/datatalks via 🐍 v3.9.4 (spark-test)
❯ soda create --help
Usage: soda create [OPTIONS] WAREHOUSE_TYPE

  Creates a new warehouse.yml file and prepares credentials in your
  ~/.soda/env_vars.yml Nothing will be overwritten or removed, only added if
  it does not exist yet.

  WAREHOUSE_TYPE is one of {postgres, snowflake, redshift, bigquery, athena}

Options:
  -f, --file TEXT       The destination filename for the warehouse
                        configuration details. This can be a relative path.
  -d, --database TEXT   The database name to use for the connection
  -u, --username TEXT   The username to use for the connection, through
                        env_var(...)
  -p, --password TEXT   The password to use for the connection, through
                        env_var(...)
  -w, --warehouse TEXT  The warehouse name
  --help                Show this message and exit.

work/soda/datatalks via 🐍 v3.9.4 (spark-test)
➜ soda create spark
  | Soda CLI version 2.1.9
  | Creating warehouse YAML file warehouse.yml ...
  | Adding env vars for spark to /Users/vijay/.soda/env_vars.yml
  | Review warehouse.yml by running command
  |   cat warehouse.yml
  | Review section spark in ~/.soda/env_vars.yml by running command
  |   cat ~/.soda/env_vars.yml
  | Then run the soda analyze command
  | Starting new HTTPS connection (1): collect.soda.io:443
  | https://collect.soda.io:443 "POST /v1/traces HTTP/1.1" 200 0
work/soda/datatalks via 🐍 v3.9.4 (spark-test)

Can you please try installing only soda-spark, and not both soda-sql-spark and soda-spark?

sivaveera commented 2 years ago

SOLVED: The current macOS version had an incompatibility with the system-wide ODBC library.

Test: created a Python test file test.py containing import sodasql.dialects.spark_dialect

(.venv) sveera@SVEER1ML1 soda-test % python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    import sodasql.dialects.spark_dialect
  File "/Users/sveera/soda-test/.venv/lib/python3.7/site-packages/sodasql/dialects/spark_dialect.py", line 13, in <module>
    import pyodbc
ImportError: dlopen(/Users/sveera/soda-test/.venv/lib/python3.7/site-packages/pyodbc.cpython-37m-darwin.so, 2): Library not loaded: /usr/local/opt/unixodbc/lib/libodbc.2.dylib
  Referenced from: /Users/sveera/soda-test/.venv/lib/python3.7/site-packages/pyodbc.cpython-37m-darwin.so
  Reason: image not found
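One way to check the dylib from the traceback directly, before reinstalling anything. This is a sketch: the path is the one from the ImportError above, and on Apple Silicon Homebrew installs under /opt/homebrew rather than /usr/local, so adjust as needed.

```python
import ctypes
import os


def odbc_lib_loadable(path):
    """Return True if the ODBC shared library exists and can be dlopen'd."""
    if not os.path.exists(path):
        return False
    try:
        ctypes.CDLL(path)  # raises OSError if the image cannot be loaded
        return True
    except OSError:
        return False


# Path taken from the ImportError above (Intel-Homebrew layout).
print(odbc_lib_loadable("/usr/local/opt/unixodbc/lib/libodbc.2.dylib"))
```

If this prints False, pyodbc cannot load either, and installing unixODBC (as below) is the fix.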

Resolution:

  1. brew install unixodbc
  2. soda create spark - this should create warehouse.yml
  3. cat warehouse.yml