The project's aim is to collect tracking and event data from football footage. The ultimate goal is to be able to process any kind of footage but, for now, only broadcast football matches are processed.
data processing
3d modelling
machine learning modelling
detect humans
detect ball
pitch geometry reconstruction
(sshot> image recognition, ball and humans)
Players are detected but the ball is not (purple indicates that object detection has just run). Multiple players are detected within the same bounding box. Referees are detected as well, and so is one steward. Streaming is stopped because the ball is not found.
Player bounding boxes in green indicate that tracking is active. The ball is labelled manually to continue the stream.
Ball tracking is lost and the tracker latches onto the numbers on the players' backs.
Player tracking continues. New players appear on screen but are not detected until the 30-frame interval triggers object detection again.
Ball tracking is lost again because of the pitch lines and players' boots. Some previously detected players are lost because of the background from advertising boards or the pitch (not enough contrast).
The ball needs to be labelled again to resume tracking.
Object detection is run. All the human trackers are removed except the ball tracker; the ball tracker is not reset if the ball is still being tracked.
The player trackers had to be reset again from the
The ball tracker is wrong again, tracking a player's back number.
This is a video segment imposed by the director; data cannot be collected during this short moment.
The ball is tracked from the
Players are detected again; the bug still persists. A fan in the crowd is also detected. The ball is again not recognised.
Most players are visible from this new perspective. The ball is not recognised.
Most of the players are recognised from this perspective. The ball is visible and being tracked.
In this frame few players were recognised because the camera view moved very suddenly (from the previous frame).
In this frame most players are detected, including all players inside the box, which is the region of interest when a team is attacking. The ball tracker has lost the ball because the camera view is blocked by a crossing player's leg. (sshot> pitch 3d modelling and camera automation)
creating video collection algorithm for sports
creating realistic 3d pitch model
creating complete 3d model data generation framework
creating recursively conditional machine learning model
creating geometric framework to map objects from screen to pitch, and from pitch to file
human detection may contain more than one human
inconsistent ball detection
SKIP!!!!!!!!!!!!!!!!!!!!!
To add to methods section
pseudocode for main code
fun findObjects(frame):
    model := YoloModel()
    confidence_threshold := double()
    ball_conf_threshold := double()
    outputs := model.predict(frame)
    objs := []
    ball := null
    for output in outputs:
        conf := output.conf
        name := output.name
        if name = "person" and conf > confidence_threshold:
            objs.append(output)
        if name = "sports ball" and conf > ball_conf_threshold:
            ball := output
    return objs, ball
fun label_ball(frame):
    ball_tracker := BallTracker()
    box := OpenCV.selectROI(frame)
    ball_tracker.init(frame, box)
    return ball_tracker
fun main():
    frame_count := 0
    obj_tracker := ObjTracker()
    ball_tracker := BallTracker()
    for frame in video:
        key := wait_key()
        if key = "q":
            break
        if key = "f":
            ball_tracker := label_ball(frame)
        if frame_count % 30 = 0:
            perform_detection := true
        else:
            perform_detection := false
        if perform_detection:
            objs, ball := findObjects(frame)
            if length(objs) > 0:
                for o in objs:
                    obj_tracker.add(o)
                frame_count += 1
            if ball:
                ball_tracker.init(frame, ball)
            else:
                ball_tracker := label_ball(frame)
        else:
            is_tracking, boxes := obj_tracker.update(frame)
            if is_tracking:
                draw(boxes)
                frame_count += 1
            else:
                frame_count := 0
            is_tracking, box := ball_tracker.update(frame)
            if is_tracking:
                draw(box)
            else:
                ball_tracker := label_ball(frame)
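The detection-filtering step of the pseudocode can be sketched in Python as follows. This is a minimal sketch, not the implementation: `Detection` is a stand-in for one YOLO output row, and the threshold values are illustrative assumptions (tuned per footage in practice). `"person"` and `"sports ball"` are the standard COCO class names YOLO models predict.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    name: str          # COCO class name, e.g. "person", "sports ball"
    conf: float        # confidence score in [0, 1]
    box: tuple         # (x, y, w, h) bounding box

CONFIDENCE_THRESHOLD = 0.5   # assumed value for players
BALL_CONF_THRESHOLD = 0.3    # assumed lower value: the ball is small and often missed

def find_objects(outputs):
    """Split raw detections into player boxes and (at most) one ball."""
    objs, ball = [], None
    for output in outputs:
        if output.name == "person" and output.conf > CONFIDENCE_THRESHOLD:
            objs.append(output)
        if output.name == "sports ball" and output.conf > BALL_CONF_THRESHOLD:
            # keep the highest-confidence ball candidate
            if ball is None or output.conf > ball.conf:
                ball = output
    return objs, ball
```

Keeping only the best ball candidate avoids the tracker being initialised on a spurious second detection (e.g. a player's boot or a pitch-line fragment).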
pseudocode for 3d modelling dataset generation
fun encode_data(data):
    origin, frames_vectors, pitch_vectors := data
    encoded_origin := encode(origin)
    encoded_frames_vectors := encode(frames_vectors)
    encoded_pitch_vectors := encode(pitch_vectors)
    return tuple(
        encoded_origin +
        encoded_frames_vectors +
        encoded_pitch_vectors
    )

fun get_data(camera):
    origin := camera.matrix.translation()
    frames_vectors := camera.frames()
    pitch_vectors := []
    for marker in blender.collection("pitch markers"):
        pitch_vectors.append(marker)
    return origin, frames_vectors, pitch_vectors

for camera in cameras:
    file_name := blender.render_image(camera)
    data := get_data(camera)
    encoded_data := encode_data(data)
    write_to_csv(file_name, encoded_data)
    camera.change_angle()
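The encoding step above amounts to flattening each group of vectors into floats and writing the whole record as one CSV row keyed by the rendered image's file name. A minimal sketch, assuming each vector is an (x, y, z) tuple (the actual Blender data layout may differ):

```python
import csv
import io

def encode(vectors):
    """Flatten a list of (x, y, z) tuples into a flat list of floats."""
    return [coord for vec in vectors for coord in vec]

def encode_data(origin, frames_vectors, pitch_vectors):
    # origin is a single vector; the other two are lists of vectors
    return encode([origin]) + encode(frames_vectors) + encode(pitch_vectors)

def write_to_csv(file_name, encoded_data, out):
    # one row per rendered image: file name first, then all coordinates
    csv.writer(out).writerow([file_name] + encoded_data)
```

Keeping a fixed vector order per row is what lets the training code later slice the row back into origin, frame and pitch targets.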
fun train_model(data, params):
    output_size := params.output_size
    secondary_input_len := params.secondary_input_len
    convolution_layers := [
        Input(IMG_WIDTH, IMG_HEIGHT),
        Convolution2D(),
        Flatten()
    ]
    output := Output(output_size)
    if secondary_input_len > 0:
        secondary_input := [
            Input(secondary_input_len)
        ]
        model := Model(
            inputs := concatenate(
                secondary_input,
                convolution_layers
            ),
            outputs := output
        )
    else:
        model := Model(
            inputs := convolution_layers,
            outputs := output
        )
    compile_model(model, data, params)

train_model( data, [ model := "cam_origin_vec",    output_size := 3,  secondary_input_len := 0,  ...params ] )
train_model( data, [ model := "frame_vectors",     output_size := 12, secondary_input_len := 3,  ...params ] )
train_model( data, [ model := "pitch_corner_vecs", output_size := 8,  secondary_input_len := 15, ...params ] )
train_model( data, [ model := "pitch_vectors",     output_size := 70, secondary_input_len := 23, ...params ] )
fun get_frame_prediction(frame):
    fun get_model_pred(model_names, X):
        if model_names = []:
            return X
        model := load_model(model_names[0])
        pred := model.predict([
            frame,
            [] if X = [] else np.array([X])
        ])
        return get_model_pred(
            model_names[1:],
            X + pred
        )
    return get_model_pred(
        [
            "cam_origin_vec",
            "frame_vectors",
            "pitch_corner_vecs",
            "pitch_vectors"
        ],
        []
    )
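The recursively conditional chain can be sketched as below. Each model sees the frame plus the concatenated outputs of all previous models; the output sizes (3, 12, 8, 70) mirror the four models above, so each model's secondary input length is the sum of its predecessors' outputs (0, 3, 15, 23). `StubModel` is a placeholder for a trained model, not the real predictor:

```python
import numpy as np

class StubModel:
    """Stand-in for a trained model: returns a fixed-size dummy prediction."""
    def __init__(self, output_size):
        self.output_size = output_size

    def predict(self, inputs):
        frame, secondary = inputs   # secondary grows as earlier models run
        return [0.0] * self.output_size

# output sizes of the four chained models, in prediction order
MODELS = [StubModel(n) for n in (3, 12, 8, 70)]

def get_frame_prediction(frame, models=MODELS, X=None):
    """Feed each model the frame plus all predictions made so far."""
    X = [] if X is None else X
    if not models:
        return X
    secondary = [] if not X else np.array([X])
    pred = models[0].predict([frame, secondary])
    return get_frame_prediction(frame, models[1:], X + pred)
```

The final vector concatenates all four predictions (3 + 12 + 8 + 70 = 93 values), which is what the downstream homographic step consumes.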
- machine learning human and ball recognition
- object tracking
stext description: created a 3d reference system that maps the points recognisable by the camera. This will be used to train the artificial intelligence model and to process the video stream in order to perform the homographic transformation.
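The homographic transformation maps screen coordinates to pitch coordinates through a 3x3 matrix H applied in homogeneous coordinates. A minimal sketch; the H below is an assumed example (a pure scaling), whereas in the real pipeline H would be estimated from the recognised reference points, e.g. with `cv2.findHomography`:

```python
import numpy as np

def apply_homography(H, point):
    """Map a screen point (u, v) to pitch coordinates via 3x3 matrix H."""
    u, v = point
    x, y, w = H @ np.array([u, v, 1.0])   # homogeneous coordinates
    return (x / w, y / w)                 # de-homogenise

# assumed example matrix: scales screen coordinates by 2
H = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
```

The division by w is what makes the mapping projective rather than affine, which is needed because the broadcast camera views the pitch at an angle.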
The 3d model was developed in Blender. It is a green 3d texture (to emulate the grass) with a transparent pitch PNG graphic to produce the white lines.
The result is a realistic pitch replica that can be rendered by a Blender camera to produce the synthetic dataset.
This is the rendered image from a Blender camera; this image is then processed to be used as the input for the artificial intelligence model.
The pitch was then improved by adding 3d markers corresponding to the map reference system. These markers can be accessed by the blender cameras to retrieve their position on the camera view and their relative position to the cartesian origin.
This is the view of the box, which shows the position of the markers from a closer viewpoint.
This is the view from the goal, which is an important object for image recognition. It has the only markers with a positive z-coordinate, to emulate the top corners of the goal. It also includes the corner flag (on the left), because all pitches have them by regulation.
The final step is to create 15 cameras which will rotate within a range and render images for the dataset from these different positions, to emulate the real camera being placed in different positions.
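The camera sweep could be scripted along these lines. This is a sketch of the placement maths only, not the Blender implementation; the angle range, radius and height are assumptions, and in practice the positions would be assigned to `bpy` camera objects:

```python
import math

NUM_CAMERAS = 15
ANGLE_RANGE = (-30.0, 30.0)   # degrees; assumed sweep range around the pitch centre
RADIUS = 50.0                 # assumed horizontal distance from the pitch centre
HEIGHT = 20.0                 # assumed camera height

def camera_positions():
    """Evenly space NUM_CAMERAS viewpoints on an arc at a fixed height."""
    lo, hi = ANGLE_RANGE
    step = (hi - lo) / (NUM_CAMERAS - 1)
    positions = []
    for i in range(NUM_CAMERAS):
        angle = math.radians(lo + i * step)
        positions.append((RADIUS * math.sin(angle),
                          -RADIUS * math.cos(angle),
                          HEIGHT))
    return positions
```

Evenly spacing the angles keeps the synthetic dataset balanced across viewpoints, so the model does not overfit to one camera position.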
testing with a random camera to measure accuracy
testing machine learning model accuracy over many layers
(relating them to initial objectives)
(as well as similar work from others)
the output will always be an approximation (real world)
detect ball consistently
calculate ball trajectory
ball tracking is suspended whenever an object obstructs the camera view
players crossing each other
cannot detect players outside the camera frame. could create an AI model to estimate probable player positions
cannot identify players on the camera frame. could create an AI model to infer their identity based on position/appearance
is not real-time, at this moment
Currently not able to track 3d trajectory of objects
green masking may not work for non-green pitches and green kits
use Google Research Football Environment
modularise all components and algorithms to support other sports
dataset generation
video segment detection (also replays)
methods will not apply
data collection
synchronise footage and data timestamps
-----> space and spatial multiple image semantic matching
image/video processing operations (openCV)
synthetic image dataset (Blender)
general
parallelise and multithread the program
move data generation and training to cloud