jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License
1.23k stars 330 forks source link

Developing a style guide #100

Open TheCleric opened 4 years ago

TheCleric commented 4 years ago

So in a recent PR I tried to bring in some formatting changes (not necessarily on purpose, mostly because it was my first PR, and I always have some sort of auto pep 8 formatter on, 😆).

This led to quite a few unrelated code changes and some understandable concern on @schorrm 's part (expecially for some of the choices that were made by the formatter.

However, I think (and I believe @schorrm agrees to some extent) that adding some code style standards could be fruitful, and if we can all coalesce around a shared tool and config to keep it painless, the better! The goal of the style guide would to make the code more readable and internally consistent.

So I'd like to use this issue to discuss what some participants like in a style guide, don't like in a style guide, or are apathetic to.

I'll begin with a few of mine.

More readable in my opinion

cols = [ col.replace('*', '').replace('#', '') for col in cols ]

For extra long lines I'd even break it this way as well

cols = [ col.replace('*', '').replace('#', '').extraLongFunctionGoesHereToTakeUpRoom() for col in cols ]


- Along those same lines I prefer to break apart long strings like this:
```python
my_string = "Pretend this string goes on for something like 120 characters... " +
    "The rest of the string goes here."

https://docs.python.org/3/howto/logging.html#logging-basic-tutorial

TheCleric commented 4 years ago

To be clear, these are my personal preferences. Not trying to dictate. 😄

TheCleric commented 4 years ago

Also for reference, here is Google's style guide:

https://google.github.io/styleguide/pyguide.html

schorrm commented 4 years ago

When there are things like long column specifiers, having ~125 char lines sometimes will be clearer.

schorrm commented 4 years ago
# Technically legal
cols = [col.replace('*', '').replace('#', '') for col in cols]

# More readable in my opinion
cols = [
    col.replace('*', '').replace('#', '') for col in cols
]

I see zero readability benefit from this.

I prefer spacing around my type declarations, variable assignments, and operators:

def team_pitching(start_season: int = None):
    for season in range(start_season, start_season + 1):
        pass

Agreed to this one.

For longer function definitions I prefer splitting them apart per parameter:

def team_pitching(
    start_season: int,
    end_season: int = None,
    league: str = 'all',
    ind: int = 1,
):

I do not. I prefer

def team_pitching(start_season: int, end_season: int = None, league: str = 'all', ind: int = 1):
schorrm commented 4 years ago

Re the other guidelines:

TheCleric commented 4 years ago
# More readable in my opinion
cols = [
    col.replace('*', '').replace('#', '') for col in cols
]

I see zero readability benefit from this.

For longer function definitions I prefer splitting them apart per parameter:

def team_pitching(
    start_season: int,
    end_season: int = None,
    league: str = 'all',
    ind: int = 1,
):

I do not. I prefer

def team_pitching(start_season: int, end_season: int = None, league: str = 'all', ind: int = 1):

By way of explanation of my reasoning, here is why I prefer each of those. To me it's about cognitive load. If I'm reading through and understanding code, I find it's easier when longer pieces of code are pre-broken into manageable chunks. I think of it as extra lines are free, brain cells are expensive. So if I can trade one for the other I will. This is just my preference though. It seems you prefer cohesive chucks of code, and I'm sure there's an argument that joining those separated lines takes brain cells as well.

We can see what others think as well. My preference (especially as a newcomer to the project) isn't especially important. 😄

schorrm commented 3 years ago

This line is 103 chars. I think it is a good example of being clean as-is

    mlb_only_cols = ['key_retro', 'key_bbref', 'key_fangraphs', 'mlb_played_first', 'mlb_played_last']
schorrm commented 3 years ago

If we do have one -- how would we use the file thingy? yapf sounds like the normal answer, yet seems perennially broken on VSCode

TheCleric commented 3 years ago

If we do have one -- how would we use the file thingy? yapf sounds like the normal answer, yet seems perennially broken on VSCode

We could add a .vsocde/settings.json that specifies using yapf. We could also add a config file for yapf to the repo that would allow us to set the preferred style:

YAPF will search for the formatting style in the following manner:

Specified on the command line
In the [style] section of a .style.yapf file in either the current directory or one of its parent directories.
In the [yapf] section of a setup.cfg file in either the current directory or one of its parent directories.
In the [style] section of a ~/.config/yapf/style file in your home directory.

More info here: https://github.com/google/yapf#formatting-style

TheCleric commented 3 years ago

This line is 103 chars. I think it is a good example of being clean as-is

    mlb_only_cols = ['key_retro', 'key_bbref', 'key_fangraphs', 'mlb_played_first', 'mlb_played_last']

Yeah, I'd agree. I wouldn't automatically break this up (though to be 100% honest, I probably wouldn't complain if someone did either).

schorrm commented 3 years ago

I mean my VSCode yapf is always broken

TheCleric commented 3 years ago

Strange. Does yapf from the command line work?

If so, then in your "Output" console on the bottom change the dropdown to Python, and see what it may be showing.

If we can get that working, I can create a yapf branch that we can tweak the style rules to get them where we want them.

reddigari commented 3 years ago

Just an idea: black --- you get zero customization, but it's stupid simple and its style is inoffensive to basically everyone. Integrates with all editors, and can be run on all PRs as a GitHub action.

schorrm commented 3 years ago

Maybe blue if they do a CI - we've been single quotes until this point.

TheCleric commented 3 years ago

Just an idea: black --- you get zero customization, but it's stupid simple and its style is inoffensive to basically everyone. Integrates with all editors, and can be run on all PRs as a GitHub action.

The only thing about the "opinionated" formatters, is I hate most of their opinions. 😆