pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License

Proposed alternative for `join_apply` in deprecation notice does not replicate its behavior #1399

Status: Closed (lbeltrame closed this issue 2 months ago)

lbeltrame commented 2 months ago

Brief Description

Currently, the docs for join_apply state that it is deprecated and advise using transform_column instead. However, join_apply works row-wise, whereas transform_column operates on a single column, and its multi-column form applies the same function to each column individually.

At this point, it is unclear what would be the correct replacement for join_apply. A reference to an alternative approach, or a snippet would make the documentation clearer.

For the record, I filed this under "Documentation fix" as it's not a code problem, but a documentation problem.

A possible alternative may be to suggest the actual code join_apply had:

df = df.copy().join(df.apply(fn, axis=1).rename(new_column_name))
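As a minimal sketch of how the one-liner above could be used, the snippet below wraps it in a helper. The name `row_apply_join` and the sample data are hypothetical and chosen only for illustration; the body is the expression quoted above, using plain pandas.

```python
import pandas as pd


def row_apply_join(df, fn, new_column_name):
    """Hypothetical helper: apply fn to each row and join the result as a new column."""
    # axis=1 passes each row to fn as a Series indexed by column name.
    row_results = df.apply(fn, axis=1).rename(new_column_name)
    # join aligns on the index and returns a new DataFrame; df is left unchanged.
    return df.copy().join(row_results)


df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
result = row_apply_join(df, lambda row: row["a"] + row["b"], "a_plus_b")
print(result)
```

Because the function receives the whole row, it can combine values from any number of columns in a single call.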

Relevant Context

samukweku commented 2 months ago

@lbeltrame thanks for the feedback. Do you mind sharing a reproducible snippet where transform_column does not match the expected output of join_apply? This will allow us to work with concrete data and figure out the proper way to document this.

lbeltrame commented 2 months ago

Correct me if I'm wrong, but the function passed to transform_column receives either single values (elementwise=True) or the whole column as a Series (elementwise=False). join_apply instead passes the entire row to the function, so the function has access to a Series holding the value of every column in that row. transform_column can't replicate this behavior: at best, when used as transform_columns, it applies the function independently to each target column.

tl;dr: join_apply is a row-wise operation. transform_column works on a single column, and transform_columns is a column-wise operation. You can't replicate join_apply with transform_column because they're fundamentally different operations.
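The distinction can be illustrated with plain pandas. This is a sketch that assumes the pandas calls below are fair stand-ins for the three modes described in this comment; it does not call pyjanitor itself, and the sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Column-wise, elementwise: the function sees one scalar from column "a" at a time.
doubled = df["a"].apply(lambda x: x * 2)

# Column-wise, whole column: the function sees column "a" as a single Series.
centered = df["a"] - df["a"].mean()

# Row-wise: the function sees one row at a time, as a Series indexed by column name,
# so it can read both "a" and "b" in the same call.
product = df.apply(lambda row: row["a"] * row["b"], axis=1)

print(doubled.tolist())
print(centered.tolist())
print(product.tolist())
```

Only the last form gives the function access to several columns of the same row, which is why a per-column transform cannot reproduce it.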

samukweku commented 2 months ago

@lbeltrame I get what you mean now. Yeah, there is no alternative for join_apply at the moment. Admittedly, there are more performant ways to execute the examples in join_apply. The deprecation was a wrong call on my part. The intent was to replace transform_columns and join_apply with mutate, which would be more flexible and applicable row-wise or column-wise, but that is not ready yet (I haven't had the time to revisit it and create a PR). If you are up to it, can you create a PR that reverses the deprecation warning on join_apply?
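On the point about more performant alternatives: when the row-wise function is simple arithmetic over columns, a vectorized pandas expression typically gives the same result without calling a Python function once per row. The sketch below uses made-up data and a hypothetical column name, and is not taken from the join_apply docs.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})

# Row-wise version: one Python function call per row.
row_wise = df.join(df.apply(lambda row: row["a"] + row["b"], axis=1).rename("a_plus_b"))

# Vectorized version: a single operation over whole columns.
vectorized = df.assign(a_plus_b=df["a"] + df["b"])

print(row_wise.equals(vectorized))
```

The row-wise form is still needed when the function cannot be expressed as column operations, for example when it branches on several values of the same row.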

lbeltrame commented 2 months ago

Indeed. Done.