wdoppenberg / polars-candle

A text embedding extension for the Polars DataFrame library.

polars-candle

A polars extension for running candle ML models on polars DataFrames.

Example

Pull any compatible model from Hugging Face, such as Snowflake's snowflake-arctic-embed-xs, and embed text using a simple API.

import polars as pl
import polars_candle  # noqa: F401  (imported only to make the `.candle` namespace available)

df = pl.DataFrame({"s": ["This is a sentence", "This is another sentence"]})

embed_kwargs = {
    "model_repo": "Snowflake/snowflake-arctic-embed-xs",
    "pooling": "mean", 
}

df = df.with_columns(
    pl.col("s").candle.embed_text(**embed_kwargs).alias("s_embedding")
)
print(df)
# ┌──────────────────────────┬───────────────────────────────────┐
# │ s                        ┆ s_embedding                       │
# │ ---                      ┆ ---                               │
# │ str                      ┆ array[f32, 384]                   │
# ╞══════════════════════════╪═══════════════════════════════════╡
# │ This is a sentence       ┆ [-0.056457, 0.559411, … -0.20403… │
# │ This is another sentence ┆ [-0.117206, 0.336827, … 0.174078… │
# └──────────────────────────┴───────────────────────────────────┘
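
Once the embeddings are in a column, a common next step is comparing them. The following is a minimal sketch in plain Python, assuming the vectors have already been pulled out of the DataFrame as lists of floats (for instance with `df["s_embedding"].to_list()`). The 4-dimensional vectors are made-up stand-ins for the real 384-dimensional output, and `cosine_similarity` is a hypothetical helper, not part of polars-candle.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for two rows of the `s_embedding` column.
emb_a = [0.1, 0.3, -0.2, 0.4]
emb_b = [0.2, 0.1, -0.1, 0.5]

score = cosine_similarity(emb_a, emb_b)
print(f"cosine similarity: {score:.3f}")
```

Values close to 1 indicate that the two sentences point in a similar direction in embedding space; for large DataFrames a vectorized approach would likely be preferable to this pure-Python loop.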

Currently, BERT, JinaBERT, and DistilBERT models are supported. More models will be added in the future. See my other repository, wdoppenberg/glowrs, to learn more about the underlying sentence embedding implementation.
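
The `pooling` argument in the example above is set to "mean". As an illustration of what mean pooling conventionally does, the sketch below averages per-token vectors into one sentence vector while skipping padding positions. It is written in plain Python under the assumption that the model yields one vector per token plus an attention mask; `mean_pool` is a hypothetical helper, and the actual implementation inside polars-candle or glowrs may differ.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three made-up token vectors; the last one is padding and is masked out.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

Masking matters because padded positions would otherwise pull the average toward values that carry no information about the sentence.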

Installation

Make sure you have polars installed. If not, install it using pip install polars. Then install polars-candle using:

pip install polars-candle

Note: The macOS ARM wheels of this library come with Metal support out of the box. For CUDA, see the instructions below on building from source.

If you want to install the latest version from the repository, you can use:

pip install git+https://github.com/wdoppenberg/polars-candle.git

Note: Installing from the repository compiles the library, so the Rust toolchain must be installed on your system. See the official Rust website for installation instructions.

You can set build features using maturin:

maturin develop --release -F <feature>

Where <feature> can be one of the following:

Roadmap

Credits

Note

This is a work in progress and the API might change in the future. Feel free to open an issue if you have any suggestions or improvements.