python-sprints / pandas-mentoring

Mentoring new pandas contributors.
BSD 3-Clause "New" or "Revised" License

DISCUSSION: Propose topics for pandas tutorials #140

Open datapythonista opened 5 years ago

datapythonista commented 5 years ago

In the pandas documentation, we would like to add tutorials that cover end-to-end, real-world use cases of pandas. This should make things much easier for first-time users trying to solve a specific problem with pandas.

Based on my personal experience, these are the kinds of problems I usually address:

I'm sure people are doing other cool things with pandas. It would be great to brainstorm and find more use cases that are worth a tutorial.

galuhsahid commented 5 years ago

I'm thinking of something along the lines of data cleaning: we can start with a real-world example of a messy dataset (with duplicated rows, missing values, unnecessary columns/rows...) and end up with a tidy one. I think this could be useful for people who use pandas to clean their datasets, especially when the data gets too large for their software to handle and ends up slowing down their process.

However, I can imagine that there are many ways to define what a messy dataset is, and since we're looking to address a specific problem, we might end up trying to solve too many problems at once.

I did run a workshop on this topic (notebook here, though it's in Indonesian) and we covered duplicated rows, missing values, removing columns/rows, and renaming columns on one real-world dataset.
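To make the scope concrete, here is a minimal sketch of those four steps on a tiny made-up table. It is not taken from the workshop notebook: the column names (`Customer Name`, `Amount`, `unused`) and all values are hypothetical, and filling missing amounts with the median is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: every name and value here is invented for illustration.
raw = pd.DataFrame(
    {
        "Customer Name": ["Ani", "Ani", "Budi", "Citra", None],
        "Amount": [100.0, 100.0, np.nan, 300.0, 50.0],
        "unused": ["", "", "", "", ""],
    }
)

tidy = (
    raw.drop_duplicates()                   # duplicated rows
    .drop(columns=["unused"])               # unnecessary column
    .dropna(subset=["Customer Name"])       # rows with no customer at all
    .rename(columns={"Customer Name": "customer", "Amount": "amount"})
    .reset_index(drop=True)
)

# Remaining missing values: fill with the column median (an assumption;
# the right strategy depends on the dataset).
tidy["amount"] = tidy["amount"].fillna(tidy["amount"].median())

print(tidy)
```

A tutorial could walk through each step separately and show the DataFrame before and after, which would also help keep the definition of "messy" narrow.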

Would love to hear all your thoughts on whether this use case is worth having a tutorial or not. Looking forward to discussing all other use cases as well.

WuraolaOyewusi commented 5 years ago

@datapythonista In text preprocessing, pandas plays a big role in giving some structure to the data. It's blissful to simply apply functions along columns.
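As a rough illustration of what I mean, here is a small sketch with made-up sentences. The column names and the `tokenize` helper are hypothetical, and a real pipeline would likely use a proper tokenizer instead of a whitespace split.

```python
import pandas as pd

# Made-up example sentences, purely for illustration.
reviews = pd.DataFrame(
    {"text": ["  Great product!!  ", "Terrible... would NOT buy again", "ok"]}
)


def tokenize(text):
    """Hypothetical helper: naive whitespace tokenizer."""
    return text.split()


# Vectorized string methods handle the simple normalization steps.
reviews["clean"] = (
    reviews["text"]
    .str.lower()
    .str.replace(r"[^a-z\s]", "", regex=True)  # keep only letters and whitespace
    .str.strip()
)

# Arbitrary Python functions can then be applied along the column.
reviews["tokens"] = reviews["clean"].apply(tokenize)
reviews["n_tokens"] = reviews["tokens"].apply(len)

print(reviews[["clean", "n_tokens"]])
```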

@galuhsahid I think it's a good idea to use a real-world dataset, and the use case is worth it from my perspective.

sara-02 commented 5 years ago

I agree that an end-to-end tutorial is always better. Some examples of end-to-end tutorials that I have presented to college students:
1: https://github.com/sara-02/pradarshan/blob/master/FWD_17_intro_to_pandas.ipynb
2: https://github.com/sara-02/pradarshan/blob/master/pandas_basic/py6.ipynb

Also, as mentioned by @WuraolaOyewusi, showing pandas on text preprocessing would be another good use case. Most pandas tutorials we see cover numerical analysis, so a text analysis tutorial would be a plus.