koaning / calmcode-feedback

A repo to collect issues with calmcode.io

Polars course - Code samples from chapter 7 "Over Expressions" incomplete and will not run #149

Closed matthias-busch closed 2 years ago

matthias-busch commented 2 years ago

Hi guys,

Lovely little course about Polars. I really enjoyed it, and it gave me a nice overview of this new library. So thank you for your work! <3

There is, however, a line missing from the code example under the video for chapter 7 of the course (https://calmcode.io/polars/over-expresssions.html), right in the sessionize function.

Basically, the line where the session column gets calculated is missing, so new_session_mark is computed and then dropped without ever being used.

What it says:

def sessionize(dataf, threshold=20 * 60 * 1000):
    return (dataf
             .sort(["char", "timestamp"])
             .with_columns([
                 (pl.col("timestamp").diff().cast(pl.Int64) > threshold).fill_null(True).alias("ts_diff"),
                 (pl.col("char").diff() != 0).fill_null(True).alias("char_diff"),
             ])
             .with_columns([
                 (pl.col("ts_diff") | pl.col("char_diff")).alias("new_session_mark")
             ])

             .drop(["char_diff", "ts_diff", "new_session_mark"]))

What it should say to run and function correctly:

def sessionize(dataf, threshold=1_000_000):
    return (dataf
             .sort(["char", "timestamp"])
             .with_columns([
                 (pl.col("timestamp").diff().cast(pl.Int64) > threshold).fill_null(True).alias("ts_diff"),
                 (pl.col("char").diff() != 0).fill_null(True).alias("char_diff"),
             ])
             .with_columns([
                 (pl.col("ts_diff") | pl.col("char_diff")).alias("new_session_mark")
             ])
             .with_columns([
                 pl.col("new_session_mark").cumsum().alias("session")
             ])
             .drop(['char_diff', 'ts_diff', 'new_session_mark']))
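
For anyone wondering why the cumulative sum is the piece that matters: each row flagged as the start of a new session adds 1 to a running total, so every row in the same session ends up with the same number. Below is a minimal pure-Python sketch of that idea, without Polars. The helper name `session_ids` and the sample rows are made up for illustration and are not from the course.

```python
from itertools import accumulate


def session_ids(rows, threshold=1_000_000):
    """Assign a session number to each (char, timestamp) row.

    Hypothetical helper mirroring the logic of sessionize:
    sort, mark session starts, then take a running sum of the marks.
    """
    rows = sorted(rows)  # same ordering as .sort(["char", "timestamp"])
    marks = []
    prev = None
    for char, ts in rows:
        if prev is None:
            # The first row has no predecessor; mirrors fill_null(True).
            new_session = True
        else:
            char_diff = char != prev[0]
            ts_diff = (ts - prev[1]) > threshold
            new_session = char_diff or ts_diff
        marks.append(new_session)
        prev = (char, ts)
    # Running sum of the boolean marks gives the session number per row.
    sessions = list(accumulate(int(m) for m in marks))
    return rows, sessions


# Made-up sample data: two characters, one long gap for character 1.
sample = [(1, 0), (1, 500_000), (1, 2_000_000), (2, 100), (2, 200)]
sorted_rows, sessions = session_ids(sample)
for row, session in zip(sorted_rows, sessions):
    print(row, session)
```

Without the running sum, only the boolean marks exist, and those get dropped at the end, which is why the version on the site never produces a session column. Note that newer Polars releases spell the expression `cum_sum` rather than `cumsum`.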

koaning commented 2 years ago

Well spotted! Just made a quick PR. Should be deployed within 5 mins.