pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

PlSmallStr conversions breaking after updating from 0.41 #19643

Open Lingepumpe opened 1 week ago

Lingepumpe commented 1 week ago

Reproducible example

// Requires explicit conversion to PlSmallStr
let series = Series::new("timestamp", &timedeltas); // worked with 0.41
let series = Series::new("timestamp".into(), &timedeltas); // works with 0.44.2

// Works with &str
df.drop("index").unwrap(); // works with both 0.41 + 0.44.2
df.column("index").unwrap(); // works with both 0.41 + 0.44.2

// Needs conversion of &[&str] to &[String]
df.drop_nulls(Some(&["timestamp", "index"])); // worked with 0.41
df.drop_nulls(Some(&["timestamp".to_string(), "index".to_string()])); // works with 0.44.2

// Does not need conversion of &[&str] to &[String]
df.join(&df, ["timestamp"], ["timestamp"], JoinArgs::new(JoinType::Full)); // works with 0.41 + 0.44.2

Log output

No response

Issue description

Updating from polars 0.41 resulted in compiler errors around the conversion of &str into PlSmallStr. I checked the release notes but found no information about breaking changes or how best to migrate. After fixing the errors, the resulting API usage seems inconsistent; see the reproducible example above.

Expected behavior

The API should be consistent in its use of &str, String, PlSmallStr.
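
The inconsistency in the reproducible example can be pictured with a std-only sketch. Everything here is hypothetical: `SmallStr` is a toy stand-in for PlSmallStr, and the four functions only imitate parameter styles that would produce the observed behavior. Whether the real polars signatures have these shapes is an assumption and has not been checked against the source.

```rust
// Toy stand-in for a small-string type; NOT polars' real PlSmallStr.
#[derive(Debug, Clone, PartialEq)]
struct SmallStr(String);

impl From<&str> for SmallStr {
    fn from(s: &str) -> Self {
        SmallStr(s.to_owned())
    }
}

impl From<&String> for SmallStr {
    fn from(s: &String) -> Self {
        SmallStr(s.clone())
    }
}

// Style 1 (hypothetical): concrete parameter type, so callers must write `"x".into()`.
fn takes_concrete(name: SmallStr) -> SmallStr {
    name
}

// Style 2 (hypothetical): generic over `Into`, so a bare `&str` is accepted.
fn takes_into(name: impl Into<SmallStr>) -> SmallStr {
    name.into()
}

// Style 3 (hypothetical): the bound sits on `&S`. `&String` converts,
// but `&&str` has no conversion here, so `&[&str]` is rejected at compile time.
fn takes_ref_into_slice<S>(names: &[S]) -> Vec<SmallStr>
where
    for<'a> &'a S: Into<SmallStr>,
{
    names.iter().map(|s| s.into()).collect()
}

// Style 4 (hypothetical): anything viewable as `&str`, so both
// `&[&str]` and `&[String]` are accepted.
fn takes_as_ref_slice<S: AsRef<str>>(names: &[S]) -> Vec<SmallStr> {
    names
        .iter()
        .map(|s| {
            let name: &str = s.as_ref();
            SmallStr::from(name)
        })
        .collect()
}
```

In this model, style 2 for single names and style 4 for slices would let every call in the example compile with plain string literals, which is one possible reading of "consistent".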

Installed versions

name = "polars"
version = "0.43.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0e248cf2f0069277f8fe80d413cfb9240c7dd1cfa382b5674c1b4afa57222747"
features = ["dtype-duration", "lazy", "polars-ops"]

Lingepumpe commented 1 week ago

Confirmed to also be an issue with 0.44.2. Interestingly, 0.44.x pulled in a pyo3 0.21 dependency unless I explicitly added default-features = false:

polars = { version = ">=0.44.2", features = ["dtype-duration", "lazy", "polars-ops"], default-features = false }

I am not sure why this dependency is enabled, as I did not intend to activate any of the Python features within polars. This blocked my update to 0.44.x, because my project depends on pyo3 0.22.

Also, 0.44 adds a new abstraction layer between DataFrame and Series, called Column. I did not find an elegant way of accessing a series by column name directly (e.g. to call something like .mean() on it). Did I miss something in the docs? I ended up implementing a small helper function to do the job:

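use polars::prelude::*;

// Returns the named column as a Series, or None if the column does not
// exist or `Column::as_series` yields None for it.
fn df_series<'a>(df: &'a DataFrame, name: &str) -> Option<&'a Series> {
    df.column(name).ok()?.as_series()
}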
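
To show why a helper like this returns an Option on two counts (missing column, or a column that holds no borrowable Series), here is a std-only toy model. The `Series`, `Column`, and `DataFrame` types below are simplified stand-ins invented for illustration, not polars' real definitions. The `Scalar` variant is an assumption about the kind of non-Series column that can exist.

```rust
// Toy stand-ins; NOT polars' real Series/Column/DataFrame types.
#[derive(Debug, Clone, PartialEq)]
struct Series {
    name: String,
    values: Vec<f64>,
}

impl Series {
    // Arithmetic mean, or None for an empty series.
    fn mean(&self) -> Option<f64> {
        if self.values.is_empty() {
            None
        } else {
            Some(self.values.iter().sum::<f64>() / self.values.len() as f64)
        }
    }
}

#[allow(dead_code)]
enum Column {
    // Column backed by fully materialized data.
    Series(Series),
    // Hypothetical variant: one value repeated `len` times, stored compactly.
    Scalar { name: String, value: f64, len: usize },
}

impl Column {
    fn name(&self) -> &str {
        match self {
            Column::Series(s) => s.name.as_str(),
            Column::Scalar { name, .. } => name.as_str(),
        }
    }

    // Borrowing succeeds only when a Series is already stored.
    fn as_series(&self) -> Option<&Series> {
        match self {
            Column::Series(s) => Some(s),
            Column::Scalar { .. } => None,
        }
    }
}

struct DataFrame {
    columns: Vec<Column>,
}

impl DataFrame {
    // Look a column up by name; Err if no column has that name.
    fn column(&self, name: &str) -> Result<&Column, String> {
        self.columns
            .iter()
            .find(|c| c.name() == name)
            .ok_or_else(|| format!("column not found: {}", name))
    }
}

// Same shape as the helper above, written against the toy types.
fn df_series<'a>(df: &'a DataFrame, name: &str) -> Option<&'a Series> {
    df.column(name).ok()?.as_series()
}
```

With these toy types, `df_series(&df, "index").and_then(Series::mean)` gives the mean when "index" is Series-backed and None otherwise. For real code it may be worth checking the Column docs for a method that always yields a Series by materializing the data (possibly `as_materialized_series`; verify this for your version).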

In the DataFrame docs at https://docs.rs/polars/latest/polars/frame/struct.DataFrame.html, most mentions of Series should now be replaced with Column.

ritchie46 commented 1 week ago

Yes, this was breaking. It was also a new breaking release. I agree it should've been in the breaking changes list.