ibis-project / ibis-ml

IbisML is a library for building scalable ML pipelines using Ibis.
https://ibis-project.github.io/ibis-ml/
Apache License 2.0
96 stars 13 forks source link

build: make NumPy, pandas, and Arrow deps optional #152

Closed jitingxu1 closed 2 months ago

jitingxu1 commented 2 months ago

For Ibis default installation, it removed numpy and pandas as required dependencies: https://github.com/ibis-project/ibis/pull/9564

I did not have experience with package dependency managment, I have two options

  1. add pandas and numpy as required dependency in IbisML
  2. add a backend (duckdb) for ibis installation, anyhow, we need to install one backend (duckdb) to run IbisML

I have no idea which one is better, or there is other better ones,

I took option 2, please take a look @lostmygithubaccount @deepyaman

If it is not urgent, we could wait for Deepyaman back.

----update---

Final decision is to make numpy, pandas, pyarrow imported in the function scope

resolve #151

codecov-commenter commented 2 months ago

Codecov Report

Attention: Patch coverage is 82.14286% with 5 lines in your changes missing coverage. Please review.

Project coverage is 85.24%. Comparing base (2f2aaaf) to head (6c7f9f1). Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
ibis_ml/core.py 80.00% 3 Missing :warning:
ibis_ml/__init__.py 0.00% 1 Missing :warning:
tests/test_optional_dependencies.py 90.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #152 +/- ## ========================================== + Coverage 84.94% 85.24% +0.29% ========================================== Files 26 27 +1 Lines 1920 1938 +18 ========================================== + Hits 1631 1652 +21 + Misses 289 286 -3 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

deepyaman commented 2 months ago

We should not force the user to install the DuckDB backend, when IbisML can work without it.

Option 1 is a better answer, but it still forces the user to install NumPy and pandas; do they need these dependencies unless they are using pandas or NumPy inputs or scikit-learn?

The correct answer is probably to move the imports into the scope where they're used (see how it's done for something like Polars).

jitingxu1 commented 2 months ago

local import numpy pandas pyarrow

lostmygithubaccount commented 2 months ago

should we be raising something if the relevant libraries aren't installed? or is the import error a user would get fine? idk what the standard practice would be

deepyaman commented 2 months ago

should we be raising something if the relevant libraries aren't installed? or is the import error a user would get fine? idk what the standard practice would be

Import error is fine IMO for now. I'm not sure how far we're getting realistically without a backend.

jitingxu1 commented 2 months ago

should we be raising something if the relevant libraries aren't installed? or is the import error a user would get fine? idk what the standard practice would be

Added tests: