statsbomb / statsbombpy

Easily stream StatsBomb data into Python

Visualization Issue with Statsbomb 360 Data – Possible Left-Right Inversion #66

Open leemingo opened 2 weeks ago

leemingo commented 2 weeks ago

Contact Details

mhlee7227@gmail.com

Version

1.0.1

What platform are you seeing the problem on?

Linux

What Python version are you running, are you using a virtual environment? Give us as much info as you can.

Python 3.10.14

What happened?

When visualizing StatsBomb 360 data in chronological order, some frames appear left-right mirrored compared with the expected picture. Could you clarify how this inversion is represented in the data? If it is not represented explicitly, what is the correct way to visualize the data?

[Attached image: statsbomb360]

Relevant log output

The event locations are recorded as [33.2, 63.8], [33.2, 63.8], [85.5, 15.4], [32.7, 65.3], in the same order as the frames in the attached image.

Code to reproduce issue

import matplotlib.patches as mpatches
from mplsoccer import Pitch
import numpy as np
import pandas as pd
from socceraction.data.statsbomb import StatsBombLoader
from statsbombpy import sb

competitions = sb.competitions()

MATCH_ID = 3895302 #1st round

events_df = sb.events(match_id=MATCH_ID)
frames_df = sb.frames(match_id=MATCH_ID) # 360 data

SBL = StatsBombLoader(getter="remote", creds={"user": None, "passwd": None})
selected_competitions = competitions[(competitions.competition_id==9) & (competitions.season_id==281)]
games = pd.concat([
    SBL.games(row.competition_id, row.season_id)
    for row in selected_competitions.itertuples()
])

home_team_id = games[games.game_id==MATCH_ID].home_team_id.values[0]
away_team_id = games[games.game_id==MATCH_ID].away_team_id.values[0]
home_team_name = events_df[events_df.team_id==home_team_id].team.unique()[0]
away_team_name = events_df[events_df.team_id==away_team_id].team.unique()[0]
color_list = ['b', 'r']

event_360_df = events_df[events_df.id.isin(frames_df.id)]
event_360_df = event_360_df.sort_values(["index"])
event_360_df = event_360_df.reset_index()
event_360_df = event_360_df.rename(columns={'level_0' : 'ori_index'})

pressure_idx = event_360_df[event_360_df.type=="Pressure"].index[0]
pressure_df = event_360_df.iloc[pressure_idx-2:pressure_idx+3]

cnt = 0
for idx, event_row in event_360_df.iloc[:10].iterrows():
    event_time = event_row.minute * 60 + event_row.second

    p = Pitch(pitch_type='statsbomb')
    fig, ax = p.draw(figsize=(6, 4))
    start_x, start_y = event_row.location

    if event_row.team_id == home_team_id:
        event_col = color_list[0]
        opponent_col = color_list[1]
    else:
        event_col = color_list[1]
        opponent_col = color_list[0]

    p.scatter(start_x, start_y, ax = ax, color = 'g', marker='*', s=250)

    event_type = event_row.type
    if event_type in  ['Pass', 'Shot', 'Carry']:
        if len(event_row[f'{event_type.lower()}_end_location']) == 2:
            end_x, end_y = event_row[f'{event_type.lower()}_end_location']
        else:
            end_x, end_y, _ = event_row[f'{event_type.lower()}_end_location']

        p.lines(xstart=start_x, ystart=start_y, xend=end_x, yend=end_y, ax=ax, comet=True)

    # Visualize camera visible area
    frames_temp_df = frames_df[frames_df['id']==event_row.id]
    visible_area = frames_temp_df.visible_area.iloc[0]
    visible_area = np.array(visible_area).reshape(-1, 2)
    p.polygon([visible_area], color=(1, 0, 0, 0.3), ax=ax)

    for _, player_row in frames_temp_df.iterrows():
        if player_row.teammate:
            player_col = event_col
        else:
            player_col = opponent_col
        player_x, player_y = player_row.location
        # Optional workaround: rotate the frame 180 degrees for Pressure events
        # if event_row.type == 'Pressure':
        #     player_x = 120 - player_x  # Flip x-axis
        #     player_y = 80 - player_y   # Flip y-axis

        p.scatter(player_x, player_y, ax=ax, color=player_col)

    event_period = event_row.period
    event_minute = event_row.minute
    event_second = event_row.second
    ax.set_title(f"{event_period} - {event_minute}:{event_second} : {event_type}")

    # Add legend for the home and away teams
    home_patch = mpatches.Patch(color=color_list[0], label=home_team_name)
    away_patch = mpatches.Patch(color=color_list[1], label=away_team_name)
    ax.legend(handles=[home_patch, away_patch], loc='upper right')

Attempted solutions

I have noticed that flipping both the x and y coordinates makes the StatsBomb 360 data display correctly. However, I am unsure which events are recorded with flipped coordinates.
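As a quick check on the logged locations: flipping both x and y on the 120 × 80 StatsBomb pitch is a 180° rotation about the centre spot. Applied to the third location, [85.5, 15.4], it gives roughly [34.5, 64.6], right beside the other three points, which is what you would expect if that one event is recorded in the other team's frame. A minimal sketch; the constant and function names are illustrative and not part of statsbombpy.

```python
# Assumption: StatsBomb pitch coordinates, x in 0-120 (length), y in 0-80 (width).
PITCH_LENGTH = 120.0
PITCH_WIDTH = 80.0


def flip_location(x, y):
    """Rotate a location 180 degrees about the centre spot (flip x and y)."""
    return PITCH_LENGTH - x, PITCH_WIDTH - y


# Event locations from the issue, in the order of the attached image.
locations = [(33.2, 63.8), (33.2, 63.8), (85.5, 15.4), (32.7, 65.3)]

flipped_third = flip_location(*locations[2])
print(tuple(round(v, 1) for v in flipped_third))  # (34.5, 64.6)
```

Because the transform is a rotation rather than a single-axis mirror, applying it twice returns the original point.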

scotty779 commented 2 weeks ago

I think this is just down to the fact that we give event coordinates from the perspective of the team the event belongs to, with that team always shooting left to right. So in those examples the Pressure event is flipped because it comes from the other team.

Does that make sense?
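One way to apply that rule when plotting chronologically is to pick a fixed reference team (for example the home team) and rotate every event owned by the other team into the reference team's frame. The sketch below assumes that 360 freeze-frame locations share the orientation of the event they belong to, which is how the reply above reads; the dict layout and team IDs are hypothetical, with field names mirroring the columns used in the issue's code (`team_id`, `location`, `teammate`).

```python
# Sketch: draw every event from one fixed team's point of view.
# Assumptions: a 120 x 80 StatsBomb pitch, and freeze-frame player locations
# recorded in the same orientation as their event.
PITCH_LENGTH = 120.0
PITCH_WIDTH = 80.0


def to_reference_frame(location, event_team_id, reference_team_id):
    """Return (x, y) as seen by the reference team attacking left to right."""
    x, y = location[0], location[1]  # ignore a third (height) value if present
    if event_team_id == reference_team_id:
        return x, y
    # Event belongs to the other team: rotate 180 degrees about the centre spot.
    return PITCH_LENGTH - x, PITCH_WIDTH - y


def normalise_event(event, freeze_frame, reference_team_id):
    """Put an event location and its freeze-frame players in one shared frame.

    `event` is a dict with "team_id" and "location"; `freeze_frame` is a list
    of dicts with "location" (plus flags such as "teammate", kept unchanged).
    """
    team_id = event["team_id"]
    event_xy = to_reference_frame(event["location"], team_id, reference_team_id)
    players = [
        {**player,
         "location": to_reference_frame(player["location"], team_id, reference_team_id)}
        for player in freeze_frame
    ]
    return event_xy, players


# Usage with hypothetical team ids: an away-team event drawn from the home view.
HOME, AWAY = 1, 2
event = {"team_id": AWAY, "location": [85.5, 15.4]}
frame = [{"teammate": True, "location": [90.0, 20.0]}]
event_xy, players = normalise_event(event, frame, reference_team_id=HOME)
print(event_xy, players)
```

The `teammate` flag is left untouched because it is relative to the event's actor, so colouring should still follow the event's team, as the issue's loop already does.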

leemingo commented 2 weeks ago

Thank you for your response. I initially thought the same, but after visualizing other matches I noticed that not only Pressure but also Ball Receipt and Pass events are sometimes drawn in the opposite direction, as shown below. In some cases the Ball Receipt and Pass appear flipped, while in others they follow the original direction. And, as seen in the third image, there were cases where only the event location was correct while the other players' positions were flipped. What rule should I apply to get a consistent visualization?

[Attached image: statsbomb2]

MohaemdElhossin commented 1 week ago

@leemingo Do you have the keys or the IDs of these frames?

moshahin8 commented 1 week ago

@leemingo @scotty779, Do you have any event IDs with this issue so I can check them?

leemingo commented 1 week ago

@moshahin8 The match ID for the first image (Leverkusen vs Bremen) is 3895302, and the event ID of the Pressure is cfa1f5e1-4e8e-4fc2-bbc8-4df5fdef8283. The match ID for the second image (Berlin vs Leverkusen) is 3895292, and the event ID of the Ball Receipt* (the third frame) is fa528a3f-7190-4700-9f51-16c6a671a4f3.

moshahin8 commented 1 week ago

We will check it on our end and get back to you asap @leemingo

scotty779 commented 4 days ago

@leemingo Hey, so the issue is down to failed ball receipts, which we use to record who the intended recipient of a failed pass was. In this case you can remove any ball receipts with a failed outcome and that should tidy up the visualisations for you.
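A minimal sketch of that filtering step, assuming flattened statsbombpy-style rows. The thread only says "failed"; the type string `"Ball Receipt*"` matches the issue, but the column name `ball_receipt_outcome` and the value `"Incomplete"` are assumptions taken from the StatsBomb open-data event spec and are worth confirming against your own data (for example by listing the unique outcome values).

```python
# Sketch: drop the synthesised failed Ball Receipt* events before plotting.
# Assumption: a failed receipt has ball_receipt_outcome == "Incomplete";
# successful receipts carry no outcome value.
def drop_failed_ball_receipts(events):
    """Return the events without failed (synthesised) ball receipts."""
    return [
        event
        for event in events
        if not (
            event.get("type") == "Ball Receipt*"
            and event.get("ball_receipt_outcome") == "Incomplete"
        )
    ]


events = [
    {"id": "a", "type": "Pass"},
    {"id": "b", "type": "Ball Receipt*", "ball_receipt_outcome": "Incomplete"},
    {"id": "c", "type": "Ball Receipt*", "ball_receipt_outcome": None},
    {"id": "d", "type": "Pressure"},
]
kept = drop_failed_ball_receipts(events)
print([event["id"] for event in kept])  # ['a', 'c', 'd']
```

With the issue's pandas DataFrame, the same condition would be a boolean mask on the `type` and `ball_receipt_outcome` columns, applied before building `event_360_df` so the matching 360 frames are skipped as well.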

leemingo commented 4 days ago

So does that mean any failed action, or any event from the other team (like Pressure), is provided flipped?

scotty779 commented 3 days ago

Coordinates are always from the perspective of the team the event belongs to, with that team always shooting left to right. Only failed ball receipts should be removed, as they're not real events; we synthesise them for modelling purposes.

Does that make sense?

leemingo commented 3 days ago

Thanks for your answer! Would the same reasoning apply to actions like "Dispossessed" as well?

[Attached image]