tidypyverse / tidypandas

A grammar of data manipulation for pandas inspired by tidyverse
https://tidypyverse.github.io/tidypandas/
MIT License

[feature] Implement `tidyselect` #27

Open talegari opened 2 years ago

talegari commented 2 years ago

Column names should also support functions along with strings:

df.select(['a', 'b', 'c'])  # regular
df.select(['a', 'b', starts_with('c'), ends_with('d'), contains('some_regex')])

tidyselect should power all the methods that take column_names as the input.
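A minimal sketch of how the proposed selectors could resolve to column names, assuming each selector is a function that takes the list of column names and returns the subset it matches. `starts_with`, `ends_with`, `contains` and `resolve_columns` here are hypothetical illustrations, not the current tidypandas API, and string column names are assumed. Following the example in the issue, `contains` takes a regular expression; note that in R's tidyselect, `contains()` matches a literal string and `matches()` is the regex helper.

```python
import re

import pandas as pd


# Hypothetical selector helpers: each returns a function that maps the list
# of (string) column names to the names it selects, in column order.
def starts_with(prefix):
    return lambda cols: [c for c in cols if c.startswith(prefix)]


def ends_with(suffix):
    return lambda cols: [c for c in cols if c.endswith(suffix)]


def contains(pattern):
    # Assumption: `pattern` is a regular expression searched anywhere in the name.
    regex = re.compile(pattern)
    return lambda cols: [c for c in cols if regex.search(c)]


def resolve_columns(df, selection):
    """Expand a mix of column-name strings and selectors into a list of
    unique column names, keeping the order in which they are first seen."""
    if isinstance(selection, str) or callable(selection):
        selection = [selection]
    all_cols = list(df.columns)
    resolved = []
    for item in selection:
        if callable(item):
            picked = item(all_cols)
        elif item in all_cols:
            picked = [item]
        else:
            raise KeyError(f"column not found: {item!r}")
        for col in picked:
            if col not in resolved:
                resolved.append(col)
    return resolved


df = pd.DataFrame({name: [0] for name in ["a", "b", "c1", "c2", "x_d", "misc", "z"]})
cols = resolve_columns(
    df, ["a", "b", starts_with("c"), ends_with("d"), contains("is")]
)
selected = df[cols]  # what a select() method could return
```

Because every method that takes column names could call one resolver like this before doing its own work, the selector logic would live in a single place rather than being reimplemented per method.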

grahitr commented 1 year ago

Also, replace_na should be changed to accept a list of columns and/or a column selector as a key of the dictionary being passed.

Also, is `value` the appropriate argument name in replace_na?

talegari commented 1 year ago

IMHO, value should be a single value for simplicity. If we allow a list of values to be passed, then we implicitly assume we already know which columns are going to be filled, right?

My suggestion: allow df.replace_na({ends_with("width"): 0}) but not df.replace_na({ends_with("width"): [0, 1]}), because in the latter case we usually do not know how many columns get selected.
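A minimal sketch of the single-value rule suggested above, written as a standalone `replace_na(df, mapping)` function (hypothetical; not the tidypandas method signature). One wrinkle with accepting a list of columns as a dictionary key: Python lists are unhashable and cannot be dict keys, so this sketch assumes a tuple of names instead. Selector keys work as-is because functions are hashable.

```python
import pandas as pd
from pandas.api.types import is_scalar


# Hypothetical selector, as in the tidyselect proposal.
def ends_with(suffix):
    return lambda cols: [c for c in cols if c.endswith(suffix)]


def replace_na(df, mapping):
    """Fill missing values per key of `mapping`.

    Keys may be a column name, a tuple of column names, or a selector
    function; each value must be a single scalar fill value.
    """
    out = df.copy()  # leave the input frame untouched
    all_cols = list(df.columns)
    for key, value in mapping.items():
        if not is_scalar(value):
            # Reject lists: a selector may match any number of columns,
            # so per-column values could not be aligned reliably.
            raise TypeError(f"expected a single fill value for {key!r}, got {value!r}")
        if callable(key):
            targets = key(all_cols)
        elif isinstance(key, tuple):  # lists are unhashable, so tuples stand in
            targets = list(key)
        else:
            targets = [key]
        for col in targets:
            out[col] = out[col].fillna(value)
    return out


df = pd.DataFrame(
    {
        "sepal_width": [1.0, None],
        "petal_width": [None, 2.0],
        "species": [None, "b"],
    }
)
filled = replace_na(df, {ends_with("width"): 0, "species": "unknown"})
```

With this rule, `{ends_with("width"): [0, 1]}` raises a `TypeError` instead of silently depending on how many columns happen to match.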

grahitr commented 1 year ago

tidyselect should also be extended to the following methods:

- count
- add_count
- nest_by
- expand
- complete
- summarise (in the `by` argument)
- mutate (in the `by` argument)
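A minimal sketch of how `by` could accept selectors in methods such as `count` and `summarise`, assuming one shared resolver that each method calls before grouping. `resolve_by`, `starts_with` and this `count` are hypothetical stand-ins built on plain pandas `groupby`, not tidypandas internals; the `n` column name is likewise an assumption borrowed from dplyr's `count()`.

```python
import pandas as pd


# Hypothetical selector, as in the tidyselect proposal.
def starts_with(prefix):
    return lambda cols: [c for c in cols if c.startswith(prefix)]


def resolve_by(df, by):
    """Turn a `by` argument (name, selector, or a list mixing both) into column names."""
    if isinstance(by, str) or callable(by):
        by = [by]
    keys = []
    for item in by:
        picked = item(list(df.columns)) if callable(item) else [item]
        for col in picked:
            if col not in keys:  # drop duplicates, keep first-seen order
                keys.append(col)
    return keys


def count(df, by):
    # Hypothetical count(): one row per group, with the group size in column "n".
    keys = resolve_by(df, by)
    return df.groupby(keys).size().reset_index(name="n")


df = pd.DataFrame(
    {"grp_a": ["x", "x", "y"], "grp_b": [1, 1, 2], "val": [10, 20, 30]}
)
counts = count(df, starts_with("grp"))
# summarise-style use of the same resolver for `by`:
totals = df.groupby(resolve_by(df, starts_with("grp")), as_index=False)["val"].sum()
```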

sardnar commented 10 months ago

Is there an easy way to filter using tidyselect? Something like:

df.filter(starts_with("x") < 10)
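`starts_with("x") < 10` is ambiguous once the selector matches several columns: should a row pass if all of them are below 10, or if any of them is? R's dplyr handles this with `if_all()` and `if_any()`, which take a tidyselect selection plus a predicate. A minimal pandas sketch of that idea, assuming hypothetical `if_all`, `if_any` and `starts_with` helpers that each return a boolean row mask rather than plugging into a `filter` method:

```python
import pandas as pd


# Hypothetical selector, as in the tidyselect proposal.
def starts_with(prefix):
    return lambda cols: [c for c in cols if c.startswith(prefix)]


def if_all(selector, predicate):
    # Row mask: True where `predicate` holds for every selected column.
    def mask(df):
        cols = selector(list(df.columns))
        return predicate(df[cols]).all(axis=1)
    return mask


def if_any(selector, predicate):
    # Row mask: True where `predicate` holds for at least one selected column.
    def mask(df):
        cols = selector(list(df.columns))
        return predicate(df[cols]).any(axis=1)
    return mask


df = pd.DataFrame({"x1": [1, 5, 20], "x2": [3, 12, 4], "y": [100, 100, 100]})
all_below = df[if_all(starts_with("x"), lambda d: d < 10)(df)]  # x1 and x2 both < 10
any_above = df[if_any(starts_with("x"), lambda d: d > 10)(df)]  # x1 or x2 > 10
```

In plain pandas, the "all below 10" case can also be written as `df[df.filter(regex="^x").lt(10).all(axis=1)]`, since pandas' own `DataFrame.filter` selects columns by label rather than filtering rows.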