Open TomAugspurger opened 5 years ago
There's probably a lot of overlap between this and https://github.com/pandas-dev/pandas/issues/26831.
What about:
If you search Stack Overflow, lots of questions still use chained indexing.
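For anyone following along, here is a minimal sketch of the pattern I mean, assuming a made-up frame with columns `a` and `b` (both names are hypothetical). The chained form is left commented out because its behaviour depends on the pandas version: it may warn, and it may leave `df` unchanged.

```python
import pandas as pd

# Toy frame; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Chained indexing: two separate indexing operations. The assignment
# targets the intermediate result of df[df["a"] > 2], which may be a copy,
# so df itself may not be modified (and pandas may emit a warning).
# df[df["a"] > 2]["b"] = 0

# Single .loc call: row mask and column label in one indexing operation,
# so the assignment is applied to df itself.
df.loc[df["a"] > 2, "b"] = 0

print(df)
```

The `.loc` form is also easier to read: the row condition and the target column sit next to each other.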
Additionally, in lots of questions people want to iterate, which most of the time can be avoided by using vectorisation, boolean masks, etc. I would put this under tidy data: people often come up with awfully formatted data, so we could emphasize how easy tasks are if the data are well formatted. (Think of lists of strings or tuples in a column.)
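A rough sketch of what such a section could show, using invented data (the `price`, `qty`, `order` and `items` columns are assumptions for illustration): the same computation as a row loop and as a vectorised expression, a boolean mask, and a list-valued column reshaped into one row per item with `DataFrame.explode`.

```python
import pandas as pd

# Invented example data.
df = pd.DataFrame({"price": [5.0, 12.0, 30.0], "qty": [3, 1, 2]})

# Row-by-row iteration: works, but builds a Series per row and is slow.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Vectorised: one expression over whole columns.
df["total"] = df["price"] * df["qty"]

# Boolean mask: select rows by a condition instead of looping with an if.
expensive = df[df["total"] > 13]

# Lists in a column: explode gives one row per list element, which is
# usually much easier to filter, group, and join.
raw = pd.DataFrame({"order": [1, 2], "items": [["a", "b"], ["c"]]})
tidy = raw.explode("items")

print(expensive)
print(tidy)
```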
+1 for avoiding iteration and using boolean masks. From interviews, I can confirm that a majority of newbies are bad at both.
On a broader point, I think "how you should write pandas code" falls into two buckets:
I think both are valuable, and a good best-practices document would be helpful for the community at large. Syntactic sugar can be addressed by an opinionated doc with short examples like the airport ones in @TomAugspurger's notebook; efficiency is best addressed (IMO) with plots showing the runtime and memory footprint of different methods (see https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas for a good example on iterrows).
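As a starting point for the efficiency side, the numbers behind such a plot could come from something like the sketch below. It is only an illustration: the frame size, the column names `x` and `y`, and the two helper functions are my own assumptions, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size and column names are arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.random(2_000), "y": rng.random(2_000)})


def with_iterrows(frame):
    """Add the two columns one row at a time."""
    return [row["x"] + row["y"] for _, row in frame.iterrows()]


def vectorised(frame):
    """Add the two columns as whole arrays."""
    return frame["x"] + frame["y"]


# Time each approach over a few repetitions.
t_loop = timeit.timeit(lambda: with_iterrows(df), number=3)
t_vec = timeit.timeit(lambda: vectorised(df), number=3)

print(f"iterrows:   {t_loop:.4f} s")
print(f"vectorised: {t_vec:.4f} s")
```

Repeating this over a range of frame sizes and plotting the two curves would give the kind of figure the Stack Overflow answer uses.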
I am not sure how to give this document the necessary visibility to make it useful, although that is a problem to be solved after there is a defined document that the community thinks is great.
Probably not happening for 1.0.
I'd like to have a document that describes how we think people should write pandas code.
This introduces a bit of friction when documenting something, since you'll need to decide "does it go in best practices or the user guide?" But I think the idea of a "best practices" document with opinionated, short examples and prose, linking back to the user guide and API docs, is valuable.
I've started a notebook at https://mybinder.org/v2/gh/TomAugspurger/pandas-best-practices/master?filepath=Best%20Practices.ipynb
Are there any sections you would add / remove?
Would you structure it differently?
(Tangentially, I'd like to explore how we can incorporate Binder into our documentation.)