librasteve / raku-Dan-Polars

Raku Polars binding
Artistic License 2.0

splice, concat, join, hstack and vstack #10

Open librasteve opened 1 year ago

librasteve commented 1 year ago

Dan::Polars

Dan

This issue weighs the benefit of implementing the Polars combining methods as splice & concat in Dan::Polars, and thus masking Dan's splice & concat.

Series

these are both "in place"

DataFrame

Dan has ...

dfa.concat: dfc, join => 'inner';
#`[
      letter  number
 0    a       1
 1    b       2
 0⋅1  c       3
 1⋅1  d       4
#]
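
One way to read the output above: the rows of both frames are stacked, only the columns common to both are kept (join => 'inner'), and clashing row labels are relabelled. A minimal Python sketch of that reading follows. The (label, row) layout, the extra `animal` column and the "⋅1" suffix rule are assumptions inferred from the printed output, not Dan's actual implementation.

```python
def concat_rows_inner(a, b, suffix="⋅1"):
    """Stack two tables vertically, keeping only the shared columns.

    Each table is a list of (label, row_dict) pairs. This layout and the
    suffix rule for clashing labels are assumptions for illustration.
    """
    cols_a = list(a[0][1]) if a else []
    cols_b = set(b[0][1]) if b else set()
    shared = [c for c in cols_a if c in cols_b]  # inner join on column names

    out, seen = [], set()
    for label, row in a + b:
        label = str(label)
        while label in seen:  # relabel a clash, e.g. "0" becomes "0⋅1"
            label += suffix
        seen.add(label)
        out.append((label, {c: row[c] for c in shared}))
    return out


dfa = [(0, {"letter": "a", "number": 1}), (1, {"letter": "b", "number": 2})]
# The "animal" column is hypothetical; it is dropped by the inner join.
dfc = [(0, {"letter": "c", "number": 3, "animal": "cat"}),
       (1, {"letter": "d", "number": 4, "animal": "dog"})]

result = concat_rows_inner(dfa, dfc)
for label, row in result:
    print(label, row)
```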

Polars has...

Options

  1. take the Polars approach, deprecate splice & concat, replace with append, hstack, vstack, join
  2. take the Dan approach, implement splice & concat as wrapper

Conclusion

In light of the implementation work, the best common solution appears to be a third path that replaces this method zoo with .concat and .join. This is detailed below.
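
To illustrate the idea, here is a minimal Python sketch of how a single concat entry point with an axis argument can stand in for both vstack and hstack. The column-dict table layout and the helper names are hypothetical and do not reflect the internals of Dan or Polars.

```python
def height(df):
    """Number of rows in a table stored as {column_name: list_of_values}."""
    return len(next(iter(df.values()), []))


def concat(df1, df2, axis=0):
    """axis=0 stacks rows (the vstack case); axis=1 adds columns (the hstack case)."""
    if axis == 0:
        if list(df1) != list(df2):
            raise ValueError("axis=0 requires identical column names")
        return {name: df1[name] + df2[name] for name in df1}
    if axis == 1:
        if set(df1) & set(df2):
            raise ValueError("axis=1 requires distinct column names")
        if height(df1) != height(df2):
            raise ValueError("axis=1 requires equal row counts")
        return {**df1, **df2}
    raise ValueError("axis must be 0 or 1")


df1 = {"letter": ["a", "b"], "number": [1, 2]}
df2 = {"letter": ["c", "d"], "number": [3, 4]}
df3 = {"animal": ["cat", "dog"]}

stacked = concat(df1, df2)          # rows of df2 follow rows of df1
widened = concat(df1, df3, axis=1)  # columns of df3 follow columns of df1
```

With this shape, hstack and vstack become special cases of one method, and only join needs to remain separate.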

librasteve commented 11 months ago

These tables compare the API methods:

Table 1: Combining functions for DataFrames (Pandas and Polars)

| Function | Description | Pandas | Polars | Dan |
|---|---|---|---|---|
| vstack | Stack vertically | `pd.concat([df1, df2], axis=0)` | `df1.vstack(df2)` or `pl.concat([df1, df2])` | `df1.concat(df2)` |
| hstack | Stack horizontally | `pd.concat([df1, df2], axis=1)` | `df1.hstack(df2)` or `pl.concat([df1, df2], how="horizontal")` | `df1.concat(df2, :axis(1))` |
| concat | Concatenate along an axis | `pd.concat([df1, df2], axis=0/1)` | `pl.concat([df1, df2], how="vertical"/"horizontal")` | `df1.concat(df2, axis=>0/1)` |
| join | Join on a column | `df1.join(df2, on="col")` or `pd.merge(df1, df2, how="inner", on="col")` | `df1.join(df2, how="inner", on="col")` | `df1.join(df2, how=>'inner', on=>'col')` |

Table 2: Combining functions for Series (Pandas and Polars)

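| Function | Description | Pandas | Polars | Dan |
|---|---|---|---|---|
| concat | Append one Series to another | `pd.concat([series1, series2])` | `pl.concat([series1, series2])` | `series1.concat( series2 )` |
| append | Append one Series to another | `series1.append(series2)` (removed in pandas 2.0) | `series1.append(series2)` | `series1.concat( series2 )` |
| join | Join Series on index | n/a (Series has no `join` method) | n/a (there is no `pl.join` function) | n/a |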

Sources:

librasteve commented 10 months ago

The solution is:

Table 1: Combining functions for DataFrames

| Function | Description | Dan |
|---|---|---|
| concat | Concatenate along an axis | `df1.concat(df2, axis=>0/1)` |
| join | Join on a column | `df1.join(df2, how=>'inner', on=>'col')` |

Table 2: Combining functions for Series

| Function | Description | Dan |
|---|---|---|
| concat | Append one Series to another | `series1.concat( series2 )` |

librasteve commented 10 months ago

Notes:

  1. use concat in place of hstack, vstack
     a. concat - diagonal is not provided
     b. concat - multiple is not provided
  2. merge (Python) becomes join
     a. join - right is not provided (you need to swap arguments)
     b. join - [semi, asof] are not (yet) provided
  3. Dan and Dan::Pandas concat to be refactored out to concat and join
  4. Dan splice to be replaced with some combination of 'concat', 'with_columns' and 'drop'
  5. Dan set ignore-index as default 1 (?)
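
Note 2a can be made concrete with a small Python sketch: a right join of a with b keeps the same rows as a left join of b with a, which is why swapping the arguments is enough. The list-of-dicts layout and the `join` helper below are hypothetical and written for illustration only. The column order of the swapped result differs from that of a true right join.

```python
def join(left, right, on, how="inner"):
    """Join two tables stored as lists of row dicts on the column `on`.

    Supports how="inner" and how="left" only, mirroring the note that a
    right join is not provided. Assumes non-key column names do not clash.
    """
    if how not in ("inner", "left"):
        raise ValueError("how must be 'inner' or 'left'")
    right_cols = [c for c in (right[0] if right else {}) if c != on]
    out = []
    for lrow in left:
        matches = [r for r in right if r[on] == lrow[on]]
        if matches:
            for r in matches:
                out.append({**lrow, **{c: r[c] for c in right_cols}})
        elif how == "left":
            # Unmatched left row: pad the right-hand columns with None.
            out.append({**lrow, **{c: None for c in right_cols}})
    return out


a = [{"col": 1, "x": "a1"}, {"col": 2, "x": "a2"}]
b = [{"col": 2, "y": "b2"}, {"col": 3, "y": "b3"}]

# "a right-join b" expressed as "b left-join a": every row of b is kept.
right_join = join(b, a, on="col", how="left")
inner = join(a, b, on="col")
```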