The project's aim is to collect tracking and event data from football footage. The ultimate goal is to be able to process any kind of footage but, for now, only broadcast football matches are processed.
data processing
3d modelling
machine learning modelling
detect humans
detect ball
pitch geometry reconstruction
(sshot> image recognition, ball and humans)
Players are detected but the ball is not (purple indicates that object detection has just run). Multiple players are detected within the same bounding box. Referees are detected as well, and so is one steward. Streaming is stopped because the ball is not found.
Player bounding boxes in green indicate that tracking is active. The ball is labelled manually to continue the stream.
Ball tracking is lost and the tracker latches onto the numbers on the players' backs.
Player tracking continues. New players appear on screen but are not detected until the 30-frame interval triggers object detection again.
Ball tracking is lost again because of the pitch lines and players' boots. Some previously detected players are lost because of the background from advertising boards or the pitch (not enough contrast).
The ball needs to be labelled again to resume tracking.
Object detection is run. All the human trackers are removed except the ball tracker; the ball tracker is not reset if the ball is still being tracked.
The player trackers had to be reset again from the
The ball tracker is wrong again, tracking a player's back number.
This is a video segment imposed by the director; data cannot be collected during this short moment.
The ball is tracked from the
Players are detected again; the bug still persists. A fan in the crowd is also detected. The ball is again not recognised.
Most players are visible from this new perspective. The ball is not recognised.
Most of the players are recognised from this perspective. The ball is visible and being tracked.
In this frame few players were recognised because the camera view moved very suddenly (from the previous frame).
In this frame most players are detected, including all players inside the box, which is the region of interest when a team is attacking. The ball tracker has lost the ball because the camera view is blocked by a crossing player's leg. (sshot> pitch 3d modelling and camera automation)
creating video collection algorithm for sports
creating realistic 3d pitch model
creating complete 3d model data generation framework
creating recursively conditional machine learning model
creating geometric framework to map objects from screen to pitch, and from pitch to file
human detection may contain more than one human
inconsistent ball detection
SKIP!!!!!!!!!!!!!!!!!!!!!
To add to methods section
pseudocode for main code
fun findObjects(frame):
    model := YoloModel()
    confidence_threshold := double()
    ball_conf_threshold := double()
    outputs := model.predict(frame)
    objs := []
    ball := null
    for output in outputs:
        conf := output.conf
        name := output.name
        if name = "person" and conf > confidence_threshold:
            objs.append(output)
        if name = "sports ball" and conf > ball_conf_threshold:
            ball := output
    return objs, ball
fun label_ball(frame):
    ball_tracker := BallTracker()
    box := OpenCV.selectROI(frame)
    ball_tracker.init(frame, box)
    return ball_tracker
fun main():
    frame_count := 0
    obj_tracker := ObjTracker()
    ball_tracker := BallTracker()
    for frame in video:
        key := wait_key()
        if key = "q":
            break
        if key = "f":
            ball_tracker := label_ball(frame)
        if frame_count % 30 = 0:
            perform_detection := true
        else:
            perform_detection := false
        if perform_detection:
            objs, ball := findObjects(frame)
            if length(objs) > 0:
                for o in objs:
                    obj_tracker.add(o)
                frame_count += 1
            if ball:
                ball_tracker.init(frame, ball)
            else:
                ball_tracker := label_ball(frame)
        else:
            is_tracking, boxes := obj_tracker.update(frame)
            if is_tracking:
                draw(boxes)
                frame_count += 1
            else:
                frame_count := 0
            is_tracking, box := ball_tracker.update(frame)
            if is_tracking:
                draw(box)
            else:
                ball_tracker := label_ball(frame)
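The detection-filtering step of the pseudocode can be sketched in Python as follows. This is a minimal sketch, not the implementation: `Detection` is a stand-in for one YOLO output row, and the threshold values are illustrative assumptions (tuned per footage in practice). `"person"` and `"sports ball"` are the standard COCO class names YOLO models predict.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    name: str          # COCO class name, e.g. "person", "sports ball"
    conf: float        # confidence score in [0, 1]
    box: tuple         # (x, y, w, h) bounding box

CONFIDENCE_THRESHOLD = 0.5   # assumed value for players
BALL_CONF_THRESHOLD = 0.3    # assumed lower value: the ball is small and often missed

def find_objects(outputs):
    """Split raw detections into player boxes and (at most) one ball."""
    objs, ball = [], None
    for output in outputs:
        if output.name == "person" and output.conf > CONFIDENCE_THRESHOLD:
            objs.append(output)
        if output.name == "sports ball" and output.conf > BALL_CONF_THRESHOLD:
            # keep the highest-confidence ball candidate
            if ball is None or output.conf > ball.conf:
                ball = output
    return objs, ball
```

Keeping only the best ball candidate avoids the tracker being initialised on a spurious second detection (e.g. a player's boot or a pitch-line fragment).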
pseudocode for 3d modelling dataset generation
fun encode_data(data):
    origin, frames_vectors, pitch_vectors := data
    encoded_origin := encode(origin)
    encoded_frames_vectors := encode(frames_vectors)
    encoded_pitch_vectors := encode(pitch_vectors)
    return tuple(
        encoded_origin +
        encoded_frames_vectors +
        encoded_pitch_vectors
    )

fun get_data(camera):
    origin := camera.matrix.translation()
    frames_vectors := camera.frames()
    pitch_vectors := []
    for marker in blender.collection("pitch markers"):
        pitch_vectors.append(marker)
    return origin, frames_vectors, pitch_vectors

for camera in cameras:
    file_name := blender.render_image(camera)
    data := get_data(camera)
    encoded_data := encode_data(data)
    write_to_csv(file_name, encoded_data)
    camera.change_angle()
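The encoding step above amounts to flattening each group of vectors into floats and writing the whole record as one CSV row keyed by the rendered image's file name. A minimal sketch, assuming each vector is an (x, y, z) tuple (the actual Blender data layout may differ):

```python
import csv
import io

def encode(vectors):
    """Flatten a list of (x, y, z) tuples into a flat list of floats."""
    return [coord for vec in vectors for coord in vec]

def encode_data(origin, frames_vectors, pitch_vectors):
    # origin is a single vector; the other two are lists of vectors
    return encode([origin]) + encode(frames_vectors) + encode(pitch_vectors)

def write_to_csv(file_name, encoded_data, out):
    # one row per rendered image: file name first, then all coordinates
    csv.writer(out).writerow([file_name] + encoded_data)
```

Keeping a fixed vector order per row is what lets the training code later slice the row back into origin, frame and pitch targets.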
fun train_model(data, params):
    output_size := params.output_size
    secondary_input_len := params.secondary_input_len
    convolution_layers := [
        Input(IMG_WIDTH, IMG_HEIGHT),
        Convolution2D(),
        Flatten()
    ]
    output := Output(output_size)
    if secondary_input_len > 0:
        secondary_input := [
            Input(secondary_input_len)
        ]
        model := Model(
            inputs := concatenate(
                secondary_input,
                convolution_layers
            ),
            outputs := output
        )
    else:
        model := Model(
            inputs := convolution_layers,
            outputs := output
        )
    compile_model(model, data, params)

train_model( data, [ model := "cam_origin_vec",    output_size := 3,  secondary_input_len := 0,  ...params ] )
train_model( data, [ model := "frame_vectors",     output_size := 12, secondary_input_len := 3,  ...params ] )
train_model( data, [ model := "pitch_corner_vecs", output_size := 8,  secondary_input_len := 15, ...params ] )
train_model( data, [ model := "pitch_vectors",     output_size := 70, secondary_input_len := 23, ...params ] )
fun get_frame_prediction(frame):
    fun get_model_pred(model_names, X):
        if model_names = []:
            return X
        model := load_model(model_names[0])
        pred := model.predict([
            frame,
            [] if X = [] else np.array([X])
        ])
        return get_model_pred(
            model_names[1:],
            X + pred
        )
    return get_model_pred(
        [
            "cam_origin_vec",
            "frame_vectors",
            "pitch_corner_vecs",
            "pitch_vectors"
        ],
        []
    )
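The recursively conditional chain can be sketched as below. Each model sees the frame plus the concatenated outputs of all previous models; the output sizes (3, 12, 8, 70) mirror the four models above, so each model's secondary input length is the sum of its predecessors' outputs (0, 3, 15, 23). `StubModel` is a placeholder for a trained model, not the real predictor:

```python
import numpy as np

class StubModel:
    """Stand-in for a trained model: returns a fixed-size dummy prediction."""
    def __init__(self, output_size):
        self.output_size = output_size

    def predict(self, inputs):
        frame, secondary = inputs   # secondary grows as earlier models run
        return [0.0] * self.output_size

# output sizes of the four chained models, in prediction order
MODELS = [StubModel(n) for n in (3, 12, 8, 70)]

def get_frame_prediction(frame, models=MODELS, X=None):
    """Feed each model the frame plus all predictions made so far."""
    X = [] if X is None else X
    if not models:
        return X
    secondary = [] if not X else np.array([X])
    pred = models[0].predict([frame, secondary])
    return get_frame_prediction(frame, models[1:], X + pred)
```

The final vector concatenates all four predictions (3 + 12 + 8 + 70 = 93 values), which is what the downstream homographic step consumes.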
- machine learning human and ball recognition
- object tracking
stext description: created a 3d reference system that maps the points recognisable by the camera. This will be used to train the artificial intelligence model and to process the video stream in order to perform the homographic transformation.
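The homographic transformation maps screen coordinates to pitch coordinates through a 3x3 matrix H applied in homogeneous coordinates. A minimal sketch; the H below is an assumed example (a pure scaling), whereas in the real pipeline H would be estimated from the recognised reference points, e.g. with `cv2.findHomography`:

```python
import numpy as np

def apply_homography(H, point):
    """Map a screen point (u, v) to pitch coordinates via 3x3 matrix H."""
    u, v = point
    x, y, w = H @ np.array([u, v, 1.0])   # homogeneous coordinates
    return (x / w, y / w)                 # de-homogenise

# assumed example matrix: scales screen coordinates by 2
H = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
```

The division by w is what makes the mapping projective rather than affine, which is needed because the broadcast camera views the pitch at an angle.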
The 3d model was developed in Blender. It is a green 3d texture (to emulate the grass) with a transparent pitch PNG graphic to produce the white lines.
The result is a realistic pitch replica that can be rendered by a Blender camera to produce the synthetic dataset.
This is the rendered image from a Blender camera; this image is then processed to be used as the input for the artificial intelligence model.
The pitch was then improved by adding 3d markers corresponding to the map reference system. These markers can be accessed by the blender cameras to retrieve their position on the camera view and their relative position to the cartesian origin.
This is the view of the box, which shows the position of the markers from a closer viewpoint.
This is the view from the goal, which is an important object for image recognition. It has the only markers with a positive z-coordinate, to emulate the top corners of the goal. It also includes the corner flag (on the left), because all pitches have them by regulation.
The final step is to create 15 cameras which will rotate within a range and render images for the dataset from these different positions, to emulate the real camera being placed in different positions.
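The camera sweep could be scripted along these lines. This is a sketch of the placement maths only, not the Blender implementation; the angle range, radius and height are assumptions, and in practice the positions would be assigned to `bpy` camera objects:

```python
import math

NUM_CAMERAS = 15
ANGLE_RANGE = (-30.0, 30.0)   # degrees; assumed sweep range around the pitch centre
RADIUS = 50.0                 # assumed horizontal distance from the pitch centre
HEIGHT = 20.0                 # assumed camera height

def camera_positions():
    """Evenly space NUM_CAMERAS viewpoints on an arc at a fixed height."""
    lo, hi = ANGLE_RANGE
    step = (hi - lo) / (NUM_CAMERAS - 1)
    positions = []
    for i in range(NUM_CAMERAS):
        angle = math.radians(lo + i * step)
        positions.append((RADIUS * math.sin(angle),
                          -RADIUS * math.cos(angle),
                          HEIGHT))
    return positions
```

Evenly spacing the angles keeps the synthetic dataset balanced across viewpoints, so the model does not overfit to one camera position.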
testing with a random camera to measure accuracy
testing machine learning model accuracy over many layers
(relating them to initial objectives)
(as well as similar work from others)
the output will always be an approximation (real world)
detect ball consistently
calculate ball trajectory
ball tracking is suspended whenever an object obstructs the camera view
players crossing each other
cannot detect players outside the camera frame. could create an AI model to estimate probable player positions
cannot identify players on the camera frame. could create an AI model to infer their identity based on position/appearance
is not real-time, at this moment
Currently not able to track 3d trajectory of objects
green masking may not work for non-green pitches and green kits
use Google Research Football Environment
modularise all components and algorithms to support other sports
dataset generation
video segment detection (also replays)
methods will not apply
data collection
synchronise footage and data timestamps
-----> space and spatial multiple image semantic matching
image/video processing operations (openCV)
synthetic image dataset (Blender)
general
parallelise and multithread the program
move data generation and training to cloud