ibis-project / ibis-ml

IbisML is a library for building scalable ML pipelines using Ibis.
https://ibis-project.github.io/ibis-ml/
Apache License 2.0

Quickstart seems broken #133

Open koaning opened 4 weeks ago

koaning commented 4 weeks ago

I am trying to get the tutorial running locally, but I seem to hit an issue with the first code cell.

import ibis

con = ibis.connect("duckdb://nycflights13.ddb")
con.create_table(
    "flights", ibis.examples.nycflights13_flights.fetch().to_pyarrow(), overwrite=True
)
con.create_table(
    "weather", ibis.examples.nycflights13_weather.fetch().to_pyarrow(), overwrite=True
)

When I run it I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 5
      1 import ibis
      3 con = ibis.connect("duckdb://nycflights13.ddb")
      4 con.create_table(
----> 5     "flights", ibis.examples.nycflights13_flights.fetch().to_pyarrow(), overwrite=True
      6 )
      7 con.create_table(
      8     "weather", ibis.examples.nycflights13_weather.fetch().to_pyarrow(), overwrite=True
      9 )

File ~/Development/probabl/venv/lib/python3.11/site-packages/ibis/examples/__init__.py:45, in Example.fetch(self, table_name, backend)
     41     table_name = name
     43 board = _get_board()
---> 45 (path,) = board.pin_download(name)
     47 if backend.name in _DIRECT_BACKENDS:
     48     # Read directly into these backends. This helps reduce memory
     49     # usage, making the larger example datasets easier to work with.
     50     if path.endswith(".parquet"):

File ~/Development/probabl/venv/lib/python3.11/site-packages/pins/boards.py:394, in BaseBoard.pin_download(self, name, version, hash)
    376 def pin_download(self, name, version=None, hash=None) -> Sequence[str]:
    377     """Download the files contained in a pin.
    378 
    379     This method only downloads the files in a pin. In order to read and load
   (...)
    391 
    392     """
--> 394     meta = self.pin_fetch(name, version)
    396     if hash is not None:
    397         raise NotImplementedError("TODO: validate hash")

File ~/Development/probabl/venv/lib/python3.11/site-packages/pins/boards.py:188, in BaseBoard.pin_fetch(self, name, version)
    187 def pin_fetch(self, name: str, version: Optional[str] = None) -> Meta:
--> 188     meta = self.pin_meta(name, version)
    190     # TODO: sanity check caching (since R pins does a cache touch here)
    191     # path = self.construct_path([self.board, name, version])
    192     # self.fs.get(...)
   (...)
    195     #       need to ensure user can have a readable cache
    196     #       so they could pin_fetch and then examine the result, a la pin_download
    197     return meta

File ~/Development/probabl/venv/lib/python3.11/site-packages/pins/boards.py:151, in BaseBoard.pin_meta(self, name, version)
    148     selected_version = guess_version(version)
    149 else:
    150     # otherwise, get the last pin version
--> 151     versions = self.pin_versions(name, as_df=False)
    153     if not len(versions):
    154         raise NotImplementedError("TODO: sanity check when no versions")

File ~/Development/probabl/venv/lib/python3.11/site-packages/pins/boards.py:106, in BaseBoard.pin_versions(self, name, as_df)
    104 all_versions = []
    105 for full_path in versions_raw:
--> 106     version = self.keep_final_path_component(full_path)
    107     all_versions.append(guess_version(version))
    109 # sort them, with latest last

File ~/Development/probabl/venv/lib/python3.11/site-packages/pins/boards.py:635, in BaseBoard.keep_final_path_component(self, path)
    634 def keep_final_path_component(self, path):
--> 635     return path.split("/")[-1]

AttributeError: 'dict' object has no attribute 'split'
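The last frame shows pins calling `.split("/")` on each entry of a version listing, but at least one entry is a `dict` rather than a path string. One plausible explanation (an assumption; nothing in this thread confirms it) is a mismatch between pins and the installed filesystem layer, since fsspec's `ls(path, detail=True)` returns dicts that store the path under a `"name"` key. The hypothetical helper `final_path_component` below only illustrates that type mismatch by accepting either shape; it is not a patch for pins.

```python
from typing import Any, Mapping, Union


def final_path_component(entry: Union[str, Mapping[str, Any]]) -> str:
    """Return the last '/'-separated component of a listing entry.

    Hypothetical helper: accepts either a plain path string (what pins
    expects) or an fsspec-style detail dict holding the path under "name".
    """
    if isinstance(entry, Mapping):
        # fsspec detail entries keep the full path in the "name" field
        entry = entry["name"]
    if not isinstance(entry, str):
        raise TypeError(
            f"expected str or mapping with 'name', got {type(entry).__name__}"
        )
    # Same splitting rule as pins' keep_final_path_component
    return entry.split("/")[-1]
```

Calling it with `{"name": "board/flights/v1", "type": "directory"}` and with `"board/flights/v1"` yields the same `"v1"`, whereas the pins code in the traceback only handles the string form.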
koaning commented 4 weeks ago

These are my versions.

ibis-framework==9.3.0
ibis-ml==0.1.2
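Since the traceback runs through pins and its filesystem layer rather than through Ibis itself, a version report for a bug like this is more useful if it also covers those packages. A minimal sketch using only the standard library's `importlib.metadata`; the `RELEVANT` list is an assumption drawn from the packages discussed in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names that appear relevant to this issue (assumed list,
# based on the packages named in this thread).
RELEVANT = ["ibis-framework", "ibis-ml", "pins", "fsspec", "duckdb", "pyarrow"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(RELEVANT).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

The output uses the same `name==version` form as the list above, so it can be pasted directly into an issue.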
koaning commented 4 weeks ago

If possible, I'd recommend hosting a CSV on GitHub that one can just pull down locally. There seem to be many libraries between downloading the data and actually getting the tutorial working: I have had to install an extra dependency and update my SSL certificate, and I still cannot get the data I need to get started.
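The "download a plain file once, then work locally" approach suggested here can be sketched with the standard library alone. This assumes a CSV is available at some stable URL; the thread names no such URL, so any URL passed in is your own, and `fetch_once` is a hypothetical helper rather than part of Ibis or IbisML.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_once(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    dest = Path(dest)
    if dest.exists():
        return dest  # reuse the cached copy; no network access needed
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)  # only expose the file once the download has finished
    return dest
```

The cached file could then be registered with DuckDB through Ibis, for example with `con.read_csv(path, table_name="flights")`; check the Ibis DuckDB backend documentation for the exact signature in your installed version.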

deepyaman commented 4 weeks ago

Hey @koaning. Thanks for giving IbisML a try!

Based on your error message, it looks like you're running into an issue fetching the example data Ibis provides. Did you use the install command from the tutorial: pip install 'ibis-framework[duckdb,examples]' ibis-ml scikit-learn?
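A quick way to confirm that the install command above actually provided everything the first tutorial cells import is to check which modules resolve. A minimal sketch: the module list is an assumption (`pins` is taken to come from the `examples` extra, based on the install output in this thread, and `sklearn` is scikit-learn's import name).

```python
import importlib.util

# Import names the tutorial appears to rely on (assumed list).
REQUIRED_MODULES = ["ibis", "ibis_ml", "duckdb", "pins", "sklearn"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

If `pins` is reported missing, re-running the install command with the `examples` extra should pull it in.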

I just tried this on my end in a fresh Python 3.12 Conda environment, and I wasn't able to reproduce your issue. This is what was installed for me:

Installing collected packages: pytz, appdirs, zipp, xxhash, urllib3, tzdata, typing-extensions, toolz, threadpoolctl, sqlglot, six, pyyaml, pygments, pyasn1, pyarrow-hotfix, protobuf, parsy, oauthlib, numpy, multidict, mdurl, MarkupSafe, joblib, importlib-resources, idna, humanize, google-crc32c, fsspec, frozenlist, duckdb, decorator, charset-normalizer, certifi, cachetools, attrs, atpublic, aiohappyeyeballs, yarl, scipy, rsa, requests, python-dateutil, pyasn1-modules, pyarrow, proto-plus, markdown-it-py, jinja2, importlib-metadata, googleapis-common-protos, google-resumable-media, aiosignal, scikit-learn, rich, requests-oauthlib, pandas, ibis-framework, google-auth, aiohttp, pins, ibis-ml, google-auth-oauthlib, google-api-core, google-cloud-core, google-cloud-storage, gcsfs
Successfully installed MarkupSafe-2.1.5 aiohappyeyeballs-2.3.7 aiohttp-3.10.4 aiosignal-1.3.1 appdirs-1.4.4 atpublic-5.0 attrs-24.2.0 cachetools-5.5.0 certifi-2024.7.4 charset-normalizer-3.3.2 decorator-5.1.1 duckdb-1.0.0 frozenlist-1.4.1 fsspec-2024.6.1 gcsfs-2024.6.1 google-api-core-2.19.1 google-auth-2.34.0 google-auth-oauthlib-1.2.1 google-cloud-core-2.4.1 google-cloud-storage-2.18.2 google-crc32c-1.5.0 google-resumable-media-2.7.2 googleapis-common-protos-1.63.2 humanize-4.10.0 ibis-framework-9.3.0 ibis-ml-0.1.2 idna-3.7 importlib-metadata-8.2.0 importlib-resources-6.4.3 jinja2-3.1.4 joblib-1.4.2 markdown-it-py-3.0.0 mdurl-0.1.2 multidict-6.0.5 numpy-2.1.0 oauthlib-3.2.2 pandas-2.2.2 parsy-2.1 pins-0.8.6 proto-plus-1.24.0 protobuf-5.27.3 pyarrow-17.0.0 pyarrow-hotfix-0.6 pyasn1-0.6.0 pyasn1-modules-0.4.0 pygments-2.18.0 python-dateutil-2.9.0.post0 pytz-2024.1 pyyaml-6.0.2 requests-2.32.3 requests-oauthlib-2.0.0 rich-13.7.1 rsa-4.9 scikit-learn-1.5.1 scipy-1.14.0 six-1.16.0 sqlglot-25.9.0 threadpoolctl-3.5.0 toolz-0.12.1 typing-extensions-4.12.2 tzdata-2024.1 urllib3-2.2.2 xxhash-3.5.0 yarl-1.9.4 zipp-3.20.0

(The most relevant ones are probably duckdb-1.0.0, ibis-framework-9.3.0, ibis-ml-0.1.2, numpy-2.1.0, pandas-2.2.2, pins-0.8.6, pyarrow-17.0.0, pyarrow-hotfix-0.6, and scikit-learn-1.5.1.)

That said, I did notice that the requirements for https://ibis-project.github.io/ibis-ml/#create-your-first-recipe could be more explicit: that section just says pip install ibis-ml, so I had to install a number of these requirements afterward.