ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
4.36k stars 540 forks source link

feat(polars): initial support for polars backend #4566

Closed ritchie46 closed 1 year ago

ritchie46 commented 1 year ago

Alright, there is still much to do, but I bounced at this point because I cannot run the tests and therefore not really introspect the types/code.

When I try to run pytest, I get this error:

(venv) (base) ritchie46:~/Downloads/ibis/ibis/backends/polars/tests:  (master)$ pytest conftest.py 
ImportError while loading conftest '/home/ritchie46/Downloads/ibis/ibis/backends/conftest.py'.
../../../__init__.py:6: in <module>
    import ibis.backends.pandas
../../pandas/__init__.py:14: in <module>
    from ibis.backends.base import BaseBackend
../../base/__init__.py:645: in <module>
    sorted(_get_backend_names().difference(("duckdb", "sqlite", "pyspark")))
../../base/__init__.py:638: in _get_backend_names
    entrypoints = importlib.metadata.entry_points()["ibis.backends"]
E   KeyError: 'ibis.backends'

Any idea?

cpcloud commented 1 year ago

@ritchie46 Sweet! Looking 👀 now

github-actions[bot] commented 1 year ago

Test Results

       39 files         39 suites   1h 23m 4s :stopwatch: 10 758 tests   8 356 :heavy_check_mark: 2 402 :zzz: 0 :x: 39 497 runs  30 215 :heavy_check_mark: 9 282 :zzz: 0 :x:

Results for commit c94ae9e4.

:recycle: This comment has been updated with latest results.

codecov[bot] commented 1 year ago

Codecov Report

Merging #4566 (397eaca) into master (ca12040) will decrease coverage by 0.31%. The diff coverage is 94.28%.

:exclamation: Current head 397eaca differs from pull request most recent head 0a4baa5. Consider uploading reports for the commit 0a4baa5 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4566      +/-   ##
==========================================
- Coverage   92.37%   92.06%   -0.32%     
==========================================
  Files         189      192       +3     
  Lines       20661    21338     +677     
  Branches     2855     2944      +89     
==========================================
+ Hits        19086    19645     +559     
- Misses       1181     1282     +101     
- Partials      394      411      +17     
Impacted Files Coverage Δ
ibis/expr/operations/generic.py 100.00% <ø> (ø)
ibis/expr/rules.py 87.74% <50.00%> (-0.51%) :arrow_down:
ibis/backends/polars/__init__.py 86.40% <86.40%> (ø)
ibis/backends/polars/datatypes.py 89.47% <89.47%> (ø)
ibis/backends/polars/compiler.py 96.42% <96.42%> (ø)
ibis/backends/datafusion/__init__.py 77.77% <100.00%> (ø)
ibis/backends/pyarrow/datatypes.py 81.81% <100.00%> (ø)
ibis/expr/operations/numeric.py 100.00% <100.00%> (ø)
ibis/expr/operations/strings.py 100.00% <100.00%> (ø)
ibis/backends/snowflake/datatypes.py 36.36% <0.00%> (-53.04%) :arrow_down:
... and 4 more
cpcloud commented 1 year ago

@ritchie46 Great starting point.

I'm working on getting the nix builds to pass.

In the meantime I've pushed up a change to the ibis-backends.yml workflow to run the polars backend tests.

I'll start reviewing the patch here in a bit.

cpcloud commented 1 year ago

Regarding entry points: You'll need to pip install -e . for that to work (or pip install . but that'll have to be run after every change so I'd recommend against that).

cpcloud commented 1 year ago

Working on the nix builds, but waiting for the release build to finish. lto = "fat" takes a bit of time 😀

ritchie46 commented 1 year ago

Working on the nix builds, but waiting for the release build to finish. lto = "fat" takes a bit of time grinning

You are compiling yourself? Yeap, that takes sometime. :see_no_evil:

ritchie46 commented 1 year ago

@cpcloud Any way I can easily generate the test data?

cpcloud commented 1 year ago

@cpcloud Any way I can easily generate the test data?

Ah, right. I don't think there is. In theory you could take all or most of what's in the DataFusion conftest.py file. Let me know if that doesn't work.

ritchie46 commented 1 year ago

Yeah, I miss the ibis/ci/ibis-testing-data/functional_alltypes.csv'.

ritchie46 commented 1 year ago

@cpcloud I have pushed some more, FYI so that you don't run into merge clashes.

I am stuck on a ibis.expr.datatypes.core.String is not a valid datatype.

client = <ibis.backends.polars.Backend object at 0x7fabc1cfa520>

    @pytest.fixture(scope='session')
    def alltypes(client):
>       return client.table("functional_alltypes")

tests/conftest.py:69: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
__init__.py:83: in table
    schema = polars_schema_to_ibis(lf.schema)
client.py:54: in polars_schema_to_ibis
    return ibis.schema({name: _polars_to_ibis[dtype] for (name, dtype) in schema.items()})
../../expr/api.py:334: in schema
    return sch.schema(pairs)
../../../venv/lib/python3.9/site-packages/multipledispatch/dispatcher.py:278: in __call__
    return func(*args, **kwargs)
../../expr/schema.py:405: in schema_from_mapping
    return Schema.from_dict(d)
../../expr/schema.py:207: in from_dict
    return cls(names, types)
../../common/grounds.py:25: in __call__
    return cls.__create__(*args, **kwargs)
../../common/grounds.py:111: in __create__
    kwargs = cls.__signature__.validate(*args, **kwargs)
../../common/validators.py:117: in validate
    this[name] = param.validate(value, this=this)
../../common/validators.py:94: in validate
    return self._validator(arg, this=this)
../../../venv/lib/python3.9/site-packages/toolz/functoolz.py:304: in __call__
    return self._partial(*args, **kwargs)
../../common/validators.py:231: in container_of
    return type(inner(item, **kwargs) for item in arg)
../../common/validators.py:231: in <genexpr>
    return type(inner(item, **kwargs) for item in arg)
../../../venv/lib/python3.9/site-packages/toolz/functoolz.py:304: in __call__
    return self._partial(*args, **kwargs)
../../expr/schema.py:47: in datatype
    return dt.dtype(arg)
../../../venv/lib/python3.9/site-packages/multipledispatch/dispatcher.py:278: in __call__
    return func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

value = <class 'ibis.expr.datatypes.core.String'>, kwargs = {}

    @dtype.register(object)
    def default(value, **kwargs) -> DataType:
>       raise IbisTypeError(f'Value {value!r} is not a valid datatype')
E       ibis.common.exceptions.IbisTypeError: Value <class 'ibis.expr.datatypes.core.String'> is not a valid datatype
kszucs commented 1 year ago

We need to pass around datatype instances instead of datatype classes. Let me fix that, going to push in a bit.

kszucs commented 1 year ago

Things should be a bit more fluent now. The ibis backend test suite can be run for polars using

pytest -sv -m polars ibis/backends/tests/test_generic.py

The test errors should clearly indicate the missing operation implementations.

ritchie46 commented 1 year ago

Things should be a bit more fluent now. The ibis backend test suite can be run for polars using

pytest -sv -m polars ibis/backends/tests/test_generic.py

The test errors should clearly indicate the missing operation implementations.

Thanks! Will give it another shot.

kszucs commented 1 year ago

If you don't mind I squashed the work so far into a single commit, otherwise the rebase is rather painful.

ritchie46 commented 1 year ago

If you don't mind I squashed the work so far into a single commit, otherwise the rebase is rather painful.

Not at all! I am not familiar enough with the ibis internals to make the best steps here, so do a s you please.

Anything missing upstream?

kszucs commented 1 year ago

@cpcloud it should be ready. I had to pin the latest polars version which broke the nix builds, could you please fix them?

cpcloud commented 1 year ago

@kszucs Yep, looking now.

ritchie46 commented 1 year ago

Thanks for all the help on this!