machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License

What’s the preferred syntax for excluding columns when using `select`? #463

Open briandk opened 1 year ago

briandk commented 1 year ago

In the guide on selecting columns, you use the tilde (~) operator before a column name to indicate that column should be excluded from the selection:

You can remove a column from the data by putting a tilde operator (~) in front of it.

penguins >> select(~_.body_mass_g, ~_.sex, ~_.year)

But further down that page, in the pandas comparison, you use a minus sign (-) before the column name:

# keep all *except* cyl column
mtcars >> select(-_.cyl)

Is one notation preferred?

machow commented 1 year ago

Hey, I would lean towards ~. I added it to match dplyr's switch from - to !, but old habits die hard 😬.

Let's use this issue to track changing the docs to consistently use ~.