kruser / pitchfx-site

A webapp for searching PitchFX data
Apache License 2.0

Clarification of pitch events and pitch results terminology #39

Open albertlyu opened 10 years ago

albertlyu commented 10 years ago

This issue is more of a discussion thread than an actionable issue, at least for now.

Swinging strikes vs. whiffs

This is more of the former FanGraphs writer side of me speaking, but sometimes there is confusion in the baseball stats community about what 'whiff rate' means (whiffs/pitch or whiffs/swing). Typically, a 'swinging strike' is synonymous with 'whiff,' but 'swinging strike rate' (that is, whiffs/pitch) is not synonymous with 'whiff rate' (or whiffs/swing).

Whiffs/pitch and whiffs/swing may imply different behavior. A batter who swings a lot will tend to have a higher whiffs/pitch than a batter who rarely swings. But Swing-A-Lot High-Contact Batter has a lower whiffs/swing than Rarely-Swinging Low-Contact Batter. In other words, whiffs/swing is a better indicator of contact ability, while whiffs/pitch is some combination of swing behavior and contact ability (whiffs/swing * swings/pitch = whiffs/pitch, or whiff rate * swing rate = swinging strike rate).
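To make the identity above concrete, here is a minimal sketch. The field names `count`, `swing`, and `whiff` follow the aggregator used for the pitches table; the two batters and their numbers are made up purely for illustration.

```javascript
// Compute the three rates from raw counts.
// whiffRate * swingRate should equal swStrRate (up to floating-point error).
function plateRates(counts) {
  const swingRate = counts.swing / counts.count; // swings/pitch
  const whiffRate = counts.whiff / counts.swing; // whiffs/swing
  const swStrRate = counts.whiff / counts.count; // whiffs/pitch
  return { swingRate, whiffRate, swStrRate };
}

// Hypothetical batters (not real data).
const freeSwinger = plateRates({ count: 1000, swing: 600, whiff: 90 });
const patientHitter = plateRates({ count: 1000, swing: 350, whiff: 70 });
```

With these invented counts, the free swinger has the higher whiffs/pitch but the lower whiffs/swing, which is the distinction the two rates are meant to capture.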

The pitches table is efficiently constructed via the aggregator JS object, such that all pitch event rates have the common denominator of total pitch count. I don't see any reason to mess with that. But perhaps we can add a clarification text box, or as a lower priority, a glossary of baseball event terminology.

Additional pitch event segmentation to report on plate discipline

Alternatively, I'd imagine we can also incorporate other plate discipline statistics in the future, such as out-of-zone swings/pitch, out-of-zone whiffs/swing, etc. If so, we can add pitch subset checkbox filters for the aggregated pitches table, for example, All, Swing or No Swing, Inside Zone or Outside Zone, etc.
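A rough sketch of how such subset filters could work, assuming each pitch carries boolean `swing` and `inZone` flags. Those field names and the `SUBSETS` map are hypothetical and not taken from the current code.

```javascript
// Hypothetical subset predicates, one per checkbox filter.
const SUBSETS = {
  all: () => true,
  swing: (p) => p.swing,
  noSwing: (p) => !p.swing,
  inZone: (p) => p.inZone,
  outOfZone: (p) => !p.inZone,
};

// Out-of-zone swings per out-of-zone pitch (O-Swing%), as a fraction.
// Returns null when there are no out-of-zone pitches.
function oSwingRate(pitches) {
  const outside = pitches.filter(SUBSETS.outOfZone);
  if (outside.length === 0) return null;
  return outside.filter(SUBSETS.swing).length / outside.length;
}

// Made-up sample pitches for illustration.
const samplePitches = [
  { inZone: false, swing: true },
  { inZone: false, swing: false },
  { inZone: true, swing: true },
  { inZone: false, swing: false },
];
const sampleOSwing = oSwingRate(samplePitches);
```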

Batted ball results

I'm of the school of thought that batted ball types purport to be categorical but are actually ordinal. Line drives / fly balls / pop ups can be inconsistently classified, leading to terminology such as 'fliners'. Without public access to HITf/x data, which reports on batted ball data such as speed off bat, elevation angle, and field direction (typing that makes me salivate), we cannot accurately report on such batted ball classifications. If we want a 'batted ball results' view or an additional pitch subset filter for batted balls, I suggest grouping LD/FB/PU together as 'air balls,' to distinguish them from 'ground balls.'
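As a sketch of the grouping suggested above, the per-type counts can be collapsed into two buckets. The field names `grounder`, `liner`, `flyball`, and `popup` match the aggregator counts; the function name `battedBallSplit` is made up for illustration.

```javascript
// Collapse LD/FB/PU into 'air balls' and compare against ground balls.
// Returns null percentages when the row has no classified batted balls.
function battedBallSplit(row) {
  const air = row.liner + row.flyball + row.popup;
  const total = air + row.grounder;
  if (total === 0) {
    return { air: 0, airPct: null, groundPct: null };
  }
  return {
    air: air,
    airPct: (100 * air) / total,
    groundPct: (100 * row.grounder) / total,
  };
}
```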

Reference: http://www.fangraphs.com/library/pitching/plate-discipline-o-swing-z-swing-etc/

albertlyu commented 10 years ago

Perhaps thoughts from this thread can be merged into #16 and/or #18 -- this issue could be under milestone 'Scout Assist'.

kruser commented 10 years ago

@albertlyu, you're reading my mind! I was just on that FanGraphs page last night. I do want to reorg the columns on the pitches tab a bit to come nearer to industry standards. All of this probably doesn't fit into a single table, so we'll have to organize it into multiple sets.

For anybody reading this issue, I think the next step is to present a suggestion via a simple table in this issue and once we come to consensus we can assign someone to implement.

As for the LD/PU/FB/GB categorization, I think it is pretty decent, while not perfect. I know we're relying on one person at the game to classify each hit, but I don't have a desire to dumb the categories down further. It can be an exercise for the reader to bucket the FB/PU/LD into air balls. If anybody feels really strongly about this subject, I'm still open to discussing it. Or, if you have an idea for how to make both available intuitively through the UI, we can discuss that too.

albertlyu commented 10 years ago

Experimenting a bit here. By modifying https://github.com/kruser/pitchfx-site/blob/master/app/app/views/partials/pitches.html#L52-L64, I get the following (note -- labels modified for clarification):

image

And this is my html:

<td class="text-right" tooltip="{{row.swing}}" tooltip-append-to-body="true">{{row.swing/row.count*100|number:1}}%</td>
<td class="text-right" tooltip="{{row.swingfoul}}" tooltip-append-to-body="true">{{row.foul/row.swing*100|number:1}}%</td>
<td class="text-right" tooltip="{{row.swingbip}}" tooltip-append-to-body="true">{{row.bip/row.swing*100|number:1}}%</td>
<td class="text-right" tooltip="{{row.swingwhiff}}" tooltip-append-to-body="true">{{row.whiff/row.swing*100|number:1}}%</td>

The last three columns don't add up to 100%. Am I missing something here? What's the exact definition of 'ball in play'? Does it include home runs? (https://twitter.com/BtBScore/status/442087461209251840)
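One quick way to narrow this down might be to count how many swings land in none of the three buckets. This is only a diagnostic sketch: it assumes `row.swing`, `row.foul`, `row.bip`, and `row.whiff` are the raw counts from the aggregator, and `unclassifiedSwings` is a made-up helper name.

```javascript
// Swings not accounted for by foul + bip + whiff.
// Positive: some swings fall in no bucket.
// Negative: some pitches are counted in more than one bucket.
function unclassifiedSwings(row) {
  return row.swing - (row.foul + row.bip + row.whiff);
}
```

A nonzero result for a given pitch type would point at pitch results that are counted as swings but not as a foul, a ball in play, or a whiff (or the reverse).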

kruser commented 10 years ago

I'll have to look at the source data a little bit more to determine why. This logic seems solid https://github.com/kruser/pitchfx-site/blob/master/app/app/scripts/pojos/Pitch.js#L16-L54

...so it might be the pitchStats.js controller.

albertlyu commented 10 years ago

In addition to HR, other examples of pitch results that could cloud the foul/bip/whiff classification include foul tips (pretty sure it's a foul) and dropped third strikes (pretty sure it's a whiff).

albertlyu commented 10 years ago

Here is my working idea of additional tabular views under the pitches view:

Summary (default view)
Pitch, #, B, K, Swing, Whiff, Foul, BIP, Hit, BIP-Out, GB, LD, FB, PU

Pitch Results (denominator: row.count or row.swing)
Pitch, #, Ball, CalledStr, SwStr, Foul, BIP-Out, Hit, Whiff/Swing, Contact/Swing

Batted Ball Results
Pitch, GB/FB, GB%, LD%, FB%, PU%, HR/FB (using row.hr?)

Plate Discipline (would likely require some ETL...)
Pitch, Swing%, Contact%, O-Swing%, Z-Swing%, O-Contact%, Z-Contact%, Zone%, F-Strike%

Is it possible to bring the base-out state filters under the tabular view? Also related, would it require significant rework to get tabular views of relative pitch type frequencies for certain base-out states and counts? For example, Kershaw throws a 4-seam fastball 32% of the time on all pitches, but 68% of the time on the first pitch, 25% when ahead in the count, 45% when behind in the count, etc. Something like row.count / row.total?

Pitch Selection: Base-Out State
Pitch, #, %, 0-000 ... 2-103 ... (24 total base-out states; for example, 2-103 indicates two outs with runners on 1st and 3rd)

Pitch Selection: Situational
Pitch, #, %, First Pitch, Ahead in the Count, Behind in the Count, Three balls, Two strikes, Full Count
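A minimal sketch of the relative-frequency idea, assuming each pitch record has a `type` plus the `balls` and `strikes` before the pitch. All three field names are hypothetical, and per-pitch counts are not stored today. Situations are written as predicates, so overlapping ones (Three balls, Full Count) can each be computed independently.

```javascript
// Share of each pitch type among the pitches matching a situation.
// Returns an empty object when no pitch matches.
function usageShare(pitches, situation) {
  const subset = pitches.filter(situation);
  const shares = {};
  subset.forEach((p) => {
    shares[p.type] = (shares[p.type] || 0) + 1;
  });
  Object.keys(shares).forEach((type) => {
    shares[type] = shares[type] / subset.length;
  });
  return shares;
}

// Example situations, from the pitcher's point of view.
const firstPitch = (p) => p.balls === 0 && p.strikes === 0;
const pitcherAhead = (p) => p.strikes > p.balls;
const fullCount = (p) => p.balls === 3 && p.strikes === 2;
```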

Let me know if I can clarify (the GitHub markdown for tables does not seem to be that great).

kruser commented 10 years ago

This is great @albertlyu, thanks for putting this together. I think these are all very necessary for a site like this, but they don't fit in well with the current UI. So, we should change the current UI :).

I'm going to start thinking about using the site as a researcher. This issue, along with #46 will give this site an excellent shift towards creating a filter, sticking the filtered data, and comparing it to another set of filtered data. I think it is all very core to the site flow so we can't take it lightly.

Stay tuned for more info.

albertlyu commented 10 years ago

@kruser, will the work addressed in #46 allow for base-out state and ball-strike count for each pitch to be persisted into the aggregator in pitchStats.js? Or will that be a separate issue to work on? See https://github.com/kruser/pitchfx-site/blob/master/app/app/scripts/controllers/pitchStats.js#L382-L404.

I'm imagining that in order for my idea laid out in https://github.com/kruser/pitchfx-site/issues/39#issuecomment-37259291 to work, we'd need aggregator in pitchStats.js to look something like this:

aggregator = {
    pitchCode: pitchCode,
    displayName: pitchfx.Pitch.getPitchDisplayName(pitch.getPitchType()),
    count: 0,
    ball: 0,
    strike: 0,
    swing: 0,
    whiff: 0,
    foul: 0,
    bip: 0,
    hit: 0,
    out: 0,
    grounder: 0,
    liner: 0,
    flyball: 0,
    popup: 0,
    // new counts here (keys are quoted because they contain a hyphen)
    'baseout0-000': 0, // placeholder names for the 24 base-out states
    'baseout0-100': 0,
    // ...
    'count0-0': 0, // placeholder names for the 12 ball-strike counts
    'count0-1': 0
    // ...
};
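Rather than spelling out all 36 keys by hand, the zeroed counters could be generated. This is only a sketch: the `outs-bases` key format follows the `2-103` example from my earlier comment, while `BASE_STATES`, `emptySituationCounts`, and the nested `baseOut` / `ballStrike` objects are made-up names, not anything in pitchStats.js.

```javascript
// The 8 runner configurations: a digit marks an occupied base, 0 an empty one.
const BASE_STATES = ['000', '100', '020', '003', '120', '103', '023', '123'];

// Build zeroed counters for the 24 base-out states and 12 ball-strike counts.
function emptySituationCounts() {
  const baseOut = {};
  for (let outs = 0; outs <= 2; outs++) {
    BASE_STATES.forEach((bases) => {
      baseOut[outs + '-' + bases] = 0;
    });
  }
  const ballStrike = {};
  for (let balls = 0; balls <= 3; balls++) {
    for (let strikes = 0; strikes <= 2; strikes++) {
      ballStrike[balls + '-' + strikes] = 0;
    }
  }
  return { baseOut: baseOut, ballStrike: ballStrike };
}
```

Keeping these in nested objects instead of flat `baseout...` / `count...` properties would also avoid colliding with the existing `count` field.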

Finally, I'm also curious because of this article that came out on The Hardball Times today: http://www.hardballtimes.com/a-decision-tree-approach-to-pitch-prediction/

kruser commented 10 years ago

@albertlyu, work on the atbat-mongodb project will need to be completed before we can get the base/out state on each pitch. Right now I just have that information at the start of each atbat, which is mostly accurate, but falters when a baserunning out occurs.

Also related, I need to store the ball/strike on each pitch. I have that issue open on the atbat-mongodb project already. https://github.com/kruser/atbat-mongodb/issues/5

albertlyu commented 10 years ago

I'll think about this issue a little bit more. A potential thought here is to have some consistency between the pitch table and the heatmaps (including possibly adding the sabermetric outcomes as additional columns). If the metrics in the pitch table eventually match with the heatmaps, then the column names could get crowded, requiring another accordion of some sort, so that could be a downside. So the alternative is to stick with a summary table that shows the important results.

kruser commented 10 years ago

Let's aim to bring all metrics in this table in line with the heatmaps.

albertlyu commented 10 years ago

One idea so far is to add a lot of metrics and allow a horizontal overflow of the table to the right. We'd want to freeze the Pitch type column, though, as you scroll to the right. The major problem with this method is the responsiveness of the tables at smaller screen widths.

image

The other idea would be to add an accordion to the left and have 3-4 pitch table views.

Edit: Unless I'm misunderstanding what you mean by bringing this table inline with the heatmaps? Do you mean placing the table and heatmaps in the same div.row?

kruser commented 10 years ago

@albertlyu, I never addressed your last comment.

I wasn't thinking of changing the layout, I meant that the metrics in the table should be measuring the same things we do in the heatmaps.

But, now that we have the heatmaps neatly organized, does it make sense to break the tables into that view with the accordion? I think we should have a discussion on this now. I'll keep thinking.

albertlyu commented 10 years ago

@kruser, if that's the case, we can keep B, K, and a metric for each heatmap in the same order, and I think the table will fit nicely into the view. What I was concerned about was if the table got too wide. But changing the metrics to be in line with what we're showing in the groups of heatmaps should be straightforward.

I think it'd be worth putting thought into how we'd want to support more tables though, not necessarily the same accordion groups as the heatmaps, but once major ETL upgrades have been made to atbat-mongodb, we can support many many more metrics in the pitches tables. The grouping of these metrics, I think, was what this issue thread ultimately was intended for as well.