sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0

Split soda-sql in separate modules #226

Closed tombaeyens closed 3 years ago

tombaeyens commented 3 years ago

Folder structure to be discussed

Libraries use dashes in the names, no underscores. Folder names as indicated below all use underscores.

+ src
    + sodasql (why was this again?)
        + setup.py
    + sodasql_cloud
        + setup.py
        + sodasql_cloud
        + tests
    + sodasql_scan
        + setup.py
        + sodasql_scan
        + tests
    + sodasql_cli
        + setup.py
        + sodasql_cli
        + tests
    + sodasql_airflow
    + connectors
        + sodasql_postgres
            + setup.py
            + sodasql_postgres
            + tests
        + sodasql_snowflake
        + sodasql_athena
        + sodasql_redshift
        + sodasql_bigquery
        + sodasql_spark

Also, before starting I would like to understand how we'll deal with versioning the individual libs and how this impacts the release process.

TODO: investigate whether the dialects can be extracted. Passing data from the core lib to the dialects is no problem, but the dialects should not invoke methods on core code, since that would cause circular dependencies. To be investigated before starting this.
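To make the one-way dependency concrete, here is a minimal sketch, assuming the core lib owns a small dialect interface and the connector packages only have to match it. The names Dialect, PostgresDialect, qualify_table_name and build_row_count_query are hypothetical and not taken from the soda-sql code base.

```python
from typing import Protocol


class Dialect(Protocol):
    """Hypothetical interface owned by the core lib; connectors only need to match it."""

    def qualify_table_name(self, table_name: str) -> str: ...


class PostgresDialect:
    """Hypothetical connector-side class; note that it imports nothing from core."""

    def qualify_table_name(self, table_name: str) -> str:
        # Pure string handling on data handed in by the caller.
        return f'"{table_name}"'


def build_row_count_query(dialect: Dialect, table_name: str) -> str:
    """Core-side function: passes plain data to the dialect and uses the result."""
    return f"SELECT COUNT(*) FROM {dialect.qualify_table_name(table_name)}"
```

Because the dialect only receives plain values and returns plain values, the connector package never needs to import the core package, so no cycle can form.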

dirkgroenen commented 3 years ago
  • sodasql (why was this again?)

Because you can then refer to it as a meta package in, for example, your 5 min tutorial. It would basically install a default subset of modules, e.g. the postgres connector.

You can go without the package but then your 5 min tut would state something like:

Install pip install soda-sql-base soda-sql-postgres-connector soda-sql-snowflake-connector ...

vs

Install pip install soda-sql, which installs the base package along with for example the postgres connector.

vijaykiran commented 3 years ago

I think the DX should be more like: pip install soda-sql[all] to install everything, and pip install soda-sql[postgres] or pip install soda-sql[snowflake] etc. AFAIK we can do this using setuptools extras.
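A rough sketch of how such extras could be declared, assuming one separately published distribution per warehouse. The connector distribution names below are placeholders rather than actual package names, and the helper build_extras_require is hypothetical.

```python
# Hypothetical extras for a core "soda-sql" distribution.
# All connector distribution names are assumptions, not published names.
WAREHOUSE_EXTRAS = {
    "postgres": ["soda-sql-postgres"],
    "snowflake": ["soda-sql-snowflake"],
    "athena": ["soda-sql-athena"],
    "redshift": ["soda-sql-redshift"],
    "bigquery": ["soda-sql-bigquery"],
    "spark": ["soda-sql-spark"],
}


def build_extras_require(extras):
    """Return a copy of `extras` plus an "all" extra listing every requirement once."""
    result = {name: list(reqs) for name, reqs in extras.items()}
    result["all"] = sorted({req for reqs in extras.values() for req in reqs})
    return result


EXTRAS_REQUIRE = build_extras_require(WAREHOUSE_EXTRAS)

# In a real setup.py this dict would be passed along the lines of:
#   setuptools.setup(name="soda-sql", extras_require=EXTRAS_REQUIRE, ...)
```

With something like this in place, a plain pip install soda-sql would pull in only the core package, and each extra would add the matching connector.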

milanaleksic commented 3 years ago

My 2 cents: I am asking myself who would ever need to download support for all the warehouses? I think that's a degenerate case that we should maybe not optimise for by doing it with pip install soda-sql. I agree with Vijay on this: perhaps better to leave soda-sql as the core dependency only and ask people to fetch the warehouses they wish to use.

dirkgroenen commented 3 years ago

As discussed during Engineering Meeting:

tombaeyens commented 3 years ago

A prospect today mentioned we should consider binary packaging. Did we evaluate that option? If we support that, it seemed to give him more confidence that our tool would not clash with other tools like the AWS CLI.

vijaykiran commented 3 years ago

In my opinion, binary packaging would mean we have to bundle all the modules together. I'm not sure that's a worthy goal to pursue, because there will be a significant number of dependencies that many users might not need or use (e.g., people using redshift would also get all the other deps). OTOH, since they will be isolated, it is only going to increase the payload size rather than conflict with the python environment.

For generating an installer, we can use http://www.pyinstaller.org

Splitting code into modules shouldn't be causing any issues with this goal though.

tombaeyens commented 3 years ago

Meeting notes: we discussed and will initially go for splitting up the modules only and not pursue the binary packaging route that was mentioned.

tombaeyens commented 3 years ago

@vijaykiran I'm closing sodadata/soda-sql#314 (install fails on Python 3.9.2) as a duplicate of this issue. If that is a wrong assumption, please reopen it.

vijaykiran commented 3 years ago

Done!