pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License

[ENH] adorn functions #993

Open thatlittleboy opened 2 years ago

thatlittleboy commented 2 years ago

Brief Description

There are a few adorn_* functions from R's janitor that are not yet ported over to pyjanitor. Janitor docs here.

I'm specifically looking at:

- adorn_totals
- adorn_percentages
- adorn_pct_formatting
- adorn_ns

I imagine these might be particularly useful for those doing data reporting. These should go into the functions module.

Example API

In pyjanitor, I don't think having four separate functions works (how would we enforce that adorn_ns comes after adorn_percentages? And where would we get the counts required for adorn_ns? etc.).

Perhaps we could just do an adorn_totals and an adorn_percentages (which encapsulates the behaviour of adorn_pct_formatting and adorn_ns as well, controlled via function parameters).

adorn_totals

This function should mirror the R function almost 1-1.

>>> df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")}); df
     a  b
0  6.0  x
1  NaN  y
2  2.5  z
>>> df.adorn_totals(
...     subset=None,  # or list of index/col names; preferably can take in ranges like `slice("col_a","col_d")` also since `.loc` supports it
...     axis="col",  # index/0/row or column/1/col or both
...     fill_value="-",  # str
...     name="Total",  # str
... )
         a  b
0      6.0  x
1      NaN  y
2      2.5  z
Total  8.5  -
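
To make the proposal concrete, here is a minimal sketch of how the totals row could be built on top of plain pandas. It is not pyjanitor's implementation: the standalone function, its signature, and the choice to sum only numeric columns are assumptions for illustration, and the `axis` parameter is left out (only a totals row is appended).

```python
import numpy as np
import pandas as pd


def adorn_totals(df, subset=None, fill_value="-", name="Total"):
    """Append a row of column totals (hypothetical sketch, not pyjanitor API)."""
    # Restrict to `subset` first (a list of labels or a label slice), then
    # keep only the numeric columns, since only those can be summed.
    candidates = df if subset is None else df.loc[:, subset]
    numeric = candidates.select_dtypes(include="number")
    totals = numeric.sum(axis=0, skipna=True)  # NaN values are skipped
    # Columns that were not summed get `fill_value` in the totals row.
    row = {col: totals.get(col, fill_value) for col in df.columns}
    total_row = pd.DataFrame([row], index=[name], columns=df.columns)
    return pd.concat([df, total_row])


df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")})
out = adorn_totals(df)
print(out)
```

With the example frame, the "Total" row holds 8.5 for column a (the NaN is skipped) and "-" for the non-numeric column b.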

A few points I disagree(?) with the R implementation:

adorn_percentages

TBD. Let me have a little think about this over the weekend, I decided against my own implementation idea while writing out the example API.. ><

Original idea

```python
>>> df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")}); df
     a  b
0  6.0  x
1  NaN  y
2  2.5  z
>>> df.adorn_percentages(
...     subset=None,  # similar to `adorn_totals`
...     axis='col',  # similar to `adorn_totals`
...     adorn_count=True,
...     count_position='front',  # ignored if adorn_count=False
...     count_format=0,  # ignored if adorn_count=False
...     percentage_format=2,
... )
            a  b
0  6 (70.59%)  x
1         nan  y
2  3 (29.41%)  z
```

Parameters:

- `count_position`: whether to put the count in front ("56 (23.4%)") or at the back ("23.4% (56)").
- `count_format` / `percentage_format`: if an int, the number of decimal places to round to; otherwise a string format specification like ':,.2f' or whatever.

I'm not that sold on this API yet. It doesn't look too clean / friendly to use. After all, it is an amalgamation of 3 different behaviours in 1 function 😅. Would be happy to hear comments / suggestions to improve, if any.

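
For reference, the "count (percentage)" rendering from the original idea can be sketched for a single column as below. The helper name `format_count_pct` is hypothetical, and the sketch assumes that percentages are taken against the column total and that both format arguments are ints (string format specifications are not handled).

```python
import numpy as np
import pandas as pd


def format_count_pct(col, count_position="front", count_format=0, percentage_format=2):
    """Render each value as 'count (pct%)' of the column total (hypothetical helper)."""
    total = col.sum(skipna=True)  # NaN values do not contribute to the total

    def render(value):
        if pd.isna(value):
            return "nan"
        # Int formats are interpreted as a number of decimal places.
        count = f"{value:.{count_format}f}"
        pct = f"{value / total * 100:.{percentage_format}f}%"
        if count_position == "front":
            return f"{count} ({pct})"
        return f"{pct} ({count})"

    return col.map(render)


df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")})
formatted = format_count_pct(df["a"])
print(formatted.tolist())
```

One detail this surfaces: Python's float formatting rounds halves to even, so with `count_format=0` the value 2.5 renders as "2" rather than "3". Any real implementation would need to decide on a rounding rule explicitly.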
ericmjl commented 2 years ago

@thatlittleboy your thoughts on encapsulation to enforce order sound like the right thing to do.

I'll admit I'm not so well-versed in the adorn_* family of functions in janitor, so I'll hold off on commenting on their specific behaviour. That said, I am in favour of adding janitor functionality to pyjanitor, and I'm also in favour of your way of thinking about how to organize the functions in a sane fashion. :smile:

thatlittleboy commented 2 years ago

Great, thanks for the affirmation @ericmjl . I'll have a think about the desired API and propose something in a PR when I'm ready. :)

Sabrina-Hassaim commented 2 weeks ago

Hello, my name is Sabrina, and I’m excited about the opportunity to contribute to the pyjanitor project. I have been exploring it and found several issues that align with my skills. I would love to be assigned to one or more issues, starting with this one. Please let me know how I can help.

Thank you

ericmjl commented 2 weeks ago

Hi @Sabrina-Hassaim, welcome! I am going to tag @samukweku, he’s been super active here as a core contributor to pyjanitor and has more context than I. Meanwhile, can I ask, what are your goals for contributing? Want to see how we can best support you as you make your contributions!

Sabrina-Hassaim commented 2 weeks ago

Hello @ericmjl, thank you for your response. I’m currently working on an academic project where I need to contribute to open-source projects by resolving issues. Given my background in data analysis and my experience with libraries like Pandas, I found it fitting to contribute to PyJanitor, as it aligns with my skill set.

samukweku commented 1 week ago

hi @Sabrina-Hassaim please feel free to contribute; i suggest you have a look at the development guide. looking forward to your PR.