jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Player Position for batting_stats and pitching_stats #171

Closed baileymorton989 closed 3 years ago

baileymorton989 commented 3 years ago

When using the methods batting_stats() and pitching_stats(), would it be possible to extract the player positions as well? For example C, 1B, 2B, SS, 3B, or OF for batting_stats() and SP or RP for pitching_stats()? Is there another method that scrapes the FanGraphs site and grabs the player position?

schorrm commented 3 years ago

I don't think it's extractable from the table (I don't think it's a column there). For pitching, you could probably define it based on G : GS? For hitters, fielding or appearances data?

baileymorton989 commented 3 years ago

It didn't look like it was extractable from the table, unfortunately. That sounds good! Are there any methods that currently extract this information?

tjburch commented 3 years ago

lahman.fielding has a POS column. You could merge this onto the batting_stats output; is that sufficient? Or do you need something more?

baileymorton989 commented 3 years ago

> lahman.fielding has a POS column. You could merge this onto the batting_stats output; is that sufficient? Or do you need something more?

This is a great help, thank you @tjburch! I think I would still need to reverse engineer the playerids from the lahman database so I can properly merge by player name with batting_stats output. Was there something else you happened to have in mind, or is that the best approach?

Is there a more specific way to separate pitchers into RP or SP? I can see what @schorrm was mentioning by defining a threshold based on the G : GS ratio.

Let me know!

tjburch commented 3 years ago

> I think I would still need to reverse engineer the playerids from the lahman database so I can properly merge by player name with batting_stats output.

~Yeah, you have to do this, but the function playerid_mapping() should have all the IDs there for you.~ I'm a liar, see below.

> Is there a more specific way to separate pitchers into RP or SP? I can see what @schorrm was mentioning by defining a threshold based on the G : GS ratio.

This probably would have been easier before openers :) I don't know if there's a 100% definitive way to do it, but my ad-hoc method is similar to what @schorrm suggested; I usually add a second criterion on the number of pitches per game.

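from pybaseball import pitching_stats
p = pitching_stats(2018, qual=False)
# Starter: starts more than 80% of appearances and averages more than 35 pitches per game
p["isStarter"] = ((p["GS"] / p["G"]) > 0.8) & ((p["Pitches"] / p["G"]) > 35)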

I think this should work as of today, in 2020. If openers become a predominant thing in a few years, you could raise both thresholds a bit and combine them with a logical OR.
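The rule above can be wrapped in a small helper so the thresholds are easy to adjust. This is only a sketch in plain Python and not part of pybaseball: the name `is_starter` and its parameter names are made up for illustration, and the defaults simply mirror the 0.8 and 35 cutoffs used above.

```python
def is_starter(g, gs, pitches, min_start_share=0.8, min_pitches_per_game=35):
    """Heuristic SP/RP split from season totals (hypothetical helper).

    g, gs and pitches correspond to the G, GS and Pitches columns used above.
    """
    if g <= 0:
        # No appearances: nothing to classify, so treat as not a starter.
        return False
    start_share = gs / g
    pitches_per_game = pitches / g
    return start_share > min_start_share and pitches_per_game > min_pitches_per_game


# Made-up season lines: (G, GS, Pitches)
print(is_starter(32, 32, 3200))  # True: starts every game, 100 pitches per game
print(is_starter(70, 0, 1100))   # False: never starts
print(is_starter(40, 35, 800))   # False: opener-like, only 20 pitches per game
```

The third case is the opener scenario: the start share alone would label that pitcher a starter, and the pitches-per-game criterion is what filters it out.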

baileymorton989 commented 3 years ago

> Yeah, you have to do this, but the function playerid_mapping() should have all the IDs there for you.

Sounds great, thank you! Can I import this directly from pybaseball or is it contained in another module?

> This probably would have been easier before openers :) I don't know if there's a 100% definitive way to do it, but my ad-hoc method is similar to what @schorrm suggested; I usually add a second criterion on the number of pitches per game.

Agreed! I'm not too concerned with that at the moment, but I think your method will work for my analysis!

Thank you for the help and insight.

tjburch commented 3 years ago

> Sounds great, thank you! Can I import this directly from pybaseball or is it contained in another module?

Ah shoot that was a personal thing I had lying around my dev repo... This should work though:

>>> from pybaseball import chadwick_register
>>> a = chadwick_register()

The Lahman key is the same as the bbref key, it looks like.
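Putting the pieces of this thread together, the lookup chain would be FanGraphs ID, then bbref/Lahman ID, then fielding position. Below is a rough sketch of that chain using plain lists of dicts in place of DataFrames so it runs without any downloads. The column names are assumptions about the real tables ('IDfg' in the batting_stats output, 'key_fangraphs' and 'key_bbref' in the register, 'playerID' and 'POS' in the Lahman fielding table), the function name `attach_positions` is hypothetical, and all IDs and names in the sample data are invented.

```python
def attach_positions(batters, register, fielding):
    """Add a 'POS' list to each batter row via FanGraphs -> bbref/Lahman IDs.

    Each argument is a list of dicts standing in for DataFrame rows.
    Column names are assumed, not verified against the live tables.
    """
    # FanGraphs ID -> bbref ID (taken to equal the Lahman playerID, per the thread)
    fg_to_bbref = {r["key_fangraphs"]: r["key_bbref"] for r in register}

    # A player can field several positions; keep every distinct one.
    positions = {}
    for row in fielding:
        positions.setdefault(row["playerID"], set()).add(row["POS"])

    out = []
    for b in batters:
        lahman_id = fg_to_bbref.get(b["IDfg"])
        # Players missing from the register or fielding data get an empty list.
        out.append({**b, "POS": sorted(positions.get(lahman_id, ()))})
    return out


# Invented sample rows, for illustration only
batters = [{"IDfg": 101, "Name": "Player A"}, {"IDfg": 102, "Name": "Player B"}]
register = [{"key_fangraphs": 101, "key_bbref": "playea01"}]
fielding = [
    {"playerID": "playea01", "POS": "OF"},
    {"playerID": "playea01", "POS": "1B"},
]

for row in attach_positions(batters, register, fielding):
    print(row["Name"], row["POS"])
```

With real data the same two joins would presumably be pandas merges on those columns, and it would probably make sense to match on season as well, since a player's positions can change from year to year.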

baileymorton989 commented 3 years ago

from pybaseball.playerid_lookup import chadwick_register
a = chadwick_register()

Thank you! I got the following error when trying to import chadwick_register:

ImportError: cannot import name 'chadwick_register' from 'pybaseball.playerid_lookup' (C:\ProgramData\Anaconda3\lib\site-packages\pybaseball\playerid_lookup.py)

> The Lahman key is the same as the bbref key, it looks like.

Sounds good! Isn't batting_stats pulling from FanGraphs? Or am I not interpreting the keys correctly (apologies if so)?

tjburch commented 3 years ago

You don't need the relative import, import from the module directly: from pybaseball import chadwick_register. Docs here

> Sounds good! Isn't batting_stats pulling from FanGraphs? Or am I not interpreting the keys correctly (apologies if so)?

Yep - docs here

baileymorton989 commented 3 years ago

> You don't need the relative import, import from the module directly: from pybaseball import chadwick_register

Apologies for that typo, thanks!

tjburch commented 3 years ago

Probably good to close this one @schorrm, unless there's any of this we want to think about implementing (inferring SP on pitching_stats?)

baileymorton989 commented 3 years ago

from pybaseball import chadwick_register

Still having an issue with this import for some reason. Guess I could try updating pybaseball?

tjburch commented 3 years ago

Probably. On my system:

|11:28:25|tburch@crunchwrap:[~] python3 -m venv test
|11:28:39|tburch@crunchwrap:[~] source test/bin/activate
(test) |11:28:54|tburch@crunchwrap:[~] pip install pybaseball
(test) |11:34:07|tburch@crunchwrap:[~] python3 -c "import pybaseball; pybaseball.chadwick_register(); print('Success')"
Gathering player lookup table. This may take a moment.
Success
(test) |11:34:04|tburch@crunchwrap:[~] pip freeze
...
pybaseball==2.1.1
...

baileymorton989 commented 3 years ago

> Probably. On my system:

Updating to 2.1.1 helped, thank you!
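For anyone hitting the same ImportError, a quick way to tell whether the installed copy exposes a name at the top level is to probe for it before importing. This is a small sketch using only the standard library; the helper name `exposes` is made up for illustration.

```python
import importlib


def exposes(package_name, attr):
    """Return True if `package_name` imports and has a top-level `attr`."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        # Package is not installed (or fails to import) in this environment.
        return False
    return hasattr(module, attr)


# Prints False if pybaseball is missing or the installed copy does not
# expose chadwick_register at the top level.
print(exposes("pybaseball", "chadwick_register"))
```

If this prints False with pybaseball installed, upgrading (as was done above) is the likely fix.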