ibis-project / ibis-ml

IbisML is a library for building scalable ML pipelines using Ibis.
https://ibis-project.github.io/ibis-ml/
Apache License 2.0

docs: build demo workflows #49

Closed jitingxu1 closed 3 months ago

jitingxu1 commented 6 months ago

~We are currently targeting the NVTabular demo on RecSys2020 Challenge as a demo workflow.~ Update: Because the RecSys2020 demo data is unavailable (and sharing it would violate Twitter's terms), we will start with the R nycflights13 dataset, which has been added to Ibis examples to support this.

Major tasks

  1. Demo Dataset
    • nycflights13
  2. Feature engineering
    • Use Ibis and IbisML for data preprocessing
  3. Model training
    • XGBoost
    • sklearn
    • PyTorch
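
To make the feature engineering task concrete, here is a rough plain-Python sketch of one common preprocessing step, target encoding, written in the fit-then-apply style that such pipelines rely on. It is not IbisML's API or implementation: the function names `fit_target_encoding` and `apply_target_encoding`, the toy carrier codes, and the "arrived late" labels are all made up for illustration.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets):
    """Learn the mean target per category from training data only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Fallback value for categories that never appear in training.
    global_mean = sum(targets) / len(targets)
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return means, global_mean


def apply_target_encoding(categories, means, global_mean):
    """Replace each category with its learned mean; unseen ones get the global mean."""
    return [means.get(cat, global_mean) for cat in categories]


# Hypothetical toy data: carrier codes and a binary "arrived late" label.
train_carrier = ["AA", "AA", "UA", "UA", "UA", "DL"]
train_late = [1, 0, 1, 1, 0, 0]
test_carrier = ["AA", "DL", "B6"]  # "B6" never appears in training

means, global_mean = fit_target_encoding(train_carrier, train_late)
encoded_test = apply_target_encoding(test_carrier, means, global_mean)
print(encoded_test)
```

The point of the split is that the statistics are learned from the training rows only and then reused unchanged on the test rows, so no label information leaks from the test set into the features.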

deepyaman commented 5 months ago

Initial version merged in #60. Some remaining TODOs:

deepyaman commented 3 months ago

  • [ ] Make sure unique key is actually unique
  • [ ] Do we want to do something with weather data? We join it, and then throw away all the columns...
  • [x] Include TargetEncoding step
  • [x] Clearly document the handoff to other modeling frameworks. XGBClassifier was tested and works, but it should be added to a notebook; also make sure a PyTorch demo is included, perhaps something like skorch with an MLP. Maybe add a model choice to the demo notebook.
  • [x] Once Ibis 9.0 is released, update the demo notebook to pull directly from ibis.examples.nycflights13_*.fetch() instead of a local DuckDB database
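
For the first TODO, one way to check that a key is actually unique is to count how often each key tuple occurs and report any that appear more than once. The sketch below uses plain Python on a few made-up rows. The helper `find_duplicate_keys` and the choice of `carrier`, `flight`, and `time_hour` as the candidate key are assumptions for illustration, not what the demo notebook necessarily uses.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return {key_tuple: count} for every key that occurs more than once."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows; the first two share the same candidate key.
flights = [
    {"carrier": "AA", "flight": 1, "time_hour": "2013-01-01 05:00"},
    {"carrier": "AA", "flight": 1, "time_hour": "2013-01-01 05:00"},
    {"carrier": "UA", "flight": 2, "time_hour": "2013-01-01 06:00"},
]

duplicates = find_duplicate_keys(flights, ["carrier", "flight", "time_hour"])
print(duplicates)
```

An empty result means the candidate key is unique over the rows checked; any entries show which key values would need to be deduplicated or extended with another column.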

The remaining TODOs don't seem like a pressing priority; will close this as completed.