xecarlox94 / Computational-Imaging

Feature extraction/tracking for Football match recordings
GNU Lesser General Public License v3.0

Introduction

summarising objectives

The project aims to collect tracking and event data from football footage. The ultimate goal is to process any kind of footage but, for now, it only processes broadcast football matches.

problems solved to achieve objectives

methods

results

(screenshot: image recognition, ball and humans)

Players are detected but the ball is not (purple means object recognition has just run). Multiple players are detected in the same bounding box. Referees are detected as well, and one steward is also detected. Streaming is stopped because the ball is not found.

Player bounding boxes in green mean tracking is active. The ball is manually labelled to continue the stream.

Ball tracking is lost and the tracker follows the numbers on a player's back instead.

Player tracking continues. New players appear on the screen but they are not detected until the 30-frame period triggers object detection again.

Ball tracking is lost again because of the pitch lines and player boots. Some previously detected players are lost because the background from ads or the pitch does not provide enough contrast.

The ball needs to be labelled again to be tracked.

Object detection is run. All the human trackers are removed except the ball tracker; the ball tracker is not reset if the ball is still being tracked.

The player trackers had to be reset again because the detected players moved out of the screen in the meantime. The ball tracker continues to run regardless.

The ball tracker is wrong again, tracking a player's back number.

This is a video segment imposed by the director; data cannot be collected. It is a short moment.

The ball is tracked again, since it needs to be labelled for the stream to continue (the scene cut requires it). There is a bug: a human is recognised due to noise.

Players are detected again. The bug still persists: a fan is recognised in the crowd. The ball is again not recognised.

Most players are visible from this new perspective. The ball is not recognised.

Most of the players are recognised from this perspective. The ball is visible and being tracked.

In this frame few players were recognised because the camera view moved very suddenly (from the previous frame) in between object detection cycles. The ball tracker was lost and is tracking a player's leg instead.

In this frame most players are detected, including all players inside the box, which is the region of interest when a team is attacking. The ball tracker has lost the ball because the camera view is blocked by a crossing player's leg.

(screenshot: pitch 3d modelling and camera automation)

achievements and limits

achievements

creating video collection algorithm for sports

creating realistic 3d pitch model

creating complete 3d model data generation framework

creating recursively conditional machine learning model

creating geometric framework to map objects from screen on pitch, and from pitch on to file

limits

human detection may contain more than one human

inconsistent ball detection

dissertation organisation sketch

Background

    SKIP!!!!!!!!!!!!!!!!!!!!!

Work carried out

To add to methods section

```
fun encode_data(data):
    origin, frames_vectors, pitch_vectors := data

    encoded_origin := encode(origin)
    encoded_frames_vectors := encode(frames_vectors)
    encoded_pitch_vectors := encode(pitch_vectors)

    return tuple(
        encoded_origin +
        encoded_frames_vectors +
        encoded_pitch_vectors
    )
```

```
fun get_data(camera):
    origin := camera.matrix.translation()
    frames_vectors := camera.frames()

    pitch_vectors := []
    for marker in blender.collection("pitch markers"):
        append(marker, pitch_vectors)

    return origin, frames_vectors, pitch_vectors
```

```
for camera in cameras:
    file_name := blender.render_image(camera)
    data := get_data(camera)
    encoded_data := encode_data(data)
    write_to_csv(file_name, encoded_data)
    camera.change_angle()
```

```
fun train_model(data, params):
    output_size := params.output_size
    secondary_input_len := params.secondary_input_len

    convolution_layers := [
        Input(IMG_WIDTH, IMG_HEIGHT),
        Convolution2D(),
        flatten()
    ]

    output := Output(output_size)

    if secondary_input_len > 0:
        secondary_input := [
            Input(secondary_input_len)
        ]

        model := Model(
            inputs := concatenate(
                secondary_input,
                convolution_layers
            ),
            outputs := output
        )
    else:
        model := Model(
            inputs := convolution_layers,
            outputs := output
        )

    compile_model(model, data, params)
```

```
train_model(data, [ model := "cam_origin_vec",    output_size := 3,  secondary_input_len := 0,  ...params ])
train_model(data, [ model := "frame_vectors",     output_size := 12, secondary_input_len := 3,  ...params ])
train_model(data, [ model := "pitch_corner_vecs", output_size := 8,  secondary_input_len := 15, ...params ])
train_model(data, [ model := "pitch_vectors",     output_size := 70, secondary_input_len := 23, ...params ])
```
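Each model's secondary input is the concatenation of every previous model's output, which is where the `secondary_input_len` values come from (0, then 3, then 3+12=15, then 15+8=23). A quick sanity check of that arithmetic:

```python
# output sizes of the four chained models, in prediction order
output_sizes = {
    "cam_origin_vec": 3,
    "frame_vectors": 12,
    "pitch_corner_vecs": 8,
    "pitch_vectors": 70,
}

# each model's secondary input length is the sum of all earlier outputs
secondary_input_lens = []
total = 0
for name, size in output_sizes.items():
    secondary_input_lens.append(total)
    total += size

# matches the training calls: [0, 3, 15, 23]
```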

```
fun get_frame_prediction(frame):

    fun get_model_pred(model_names, X):
        if model_names == []:
            return X

        # look up the trained model by name (hypothetical helper)
        model := load_model(model_names[0])

        pred := model.predict([
            frame,
            ([] if X == [] else np.array([X]))
        ])

        return get_model_pred(
            model_names[1:],
            X + pred
        )

    return get_model_pred(
        [
            "cam_origin_vec",
            "frame_vectors",
            "pitch_corner_vecs",
            "pitch_vectors"
        ],
        []
    )
```

video processing

- machine learning for human and ball recognition
- object tracking
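The screenshots above describe a fixed cycle: full object detection runs every 30 frames, with lightweight per-frame tracking in between. A minimal sketch of that cadence, with hypothetical `detect()` and `track()` placeholders standing in for the real model and trackers:

```python
DETECTION_PERIOD = 30  # frames between full object-detection passes, as described above

def detect(frame):
    # placeholder for the real object-detection model (players, referees, ball)
    return []

def track(frame, boxes):
    # placeholder for per-frame tracker updates on existing bounding boxes
    return boxes

def process_stream(frames):
    """Run detection every DETECTION_PERIOD frames, tracking in between.

    Returns the frame indices on which detection ran, for illustration.
    """
    boxes = []
    detection_frames = []
    for i, frame in enumerate(frames):
        if i % DETECTION_PERIOD == 0:
            boxes = detect(frame)  # reset trackers from fresh detections
            detection_frames.append(i)
        else:
            boxes = track(frame, boxes)
    return detection_frames

# e.g. for a 90-frame clip, detection runs on frames 0, 30 and 60
```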

3d modelling and dataset generation

text description Created a 3d reference system that maps the points recognisable by the camera. This will be used to train the artificial intelligence model and to process the video stream to perform the homographic transformation.
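The homographic transformation mentioned above can be sketched with a direct linear transform: given four or more screen points and their known pitch positions, estimate the 3x3 homography and map any screen point onto the pitch. This is a minimal numpy sketch, not the project's actual implementation, and the point correspondences below are made up for illustration:

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography H with dst ~ H @ src (DLT, >= 4 point pairs)."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A: the last right-singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def screen_to_pitch(H, point):
    """Map a screen pixel to pitch coordinates via the homography."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return (x / w, y / w)

# made-up correspondences: screen-pixel corners of the box -> pitch metres
screen = [(100, 50), (540, 60), (600, 400), (60, 380)]
pitch  = [(0.0, 0.0), (40.3, 0.0), (40.3, 16.5), (0.0, 16.5)]
H = estimate_homography(screen, pitch)
```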

text description The 3d model was developed in Blender. It is a green 3d texture (to emulate the grass) with a transparent pitch png graphic to produce the white lines.

text description The result is a realistic pitch replica that can be rendered by a Blender camera to produce the synthetic dataset.

text description This is the rendered image from a Blender camera; this image is then processed to be used as the input for the artificial intelligence model.

text description The pitch was then improved by adding 3d markers corresponding to the map reference system. These markers can be accessed by the Blender cameras to retrieve their position on the camera view and their position relative to the Cartesian origin.

text description This is a closer view of the box, showing the position of the markers.

text description This is the view from the goal, an important object for image recognition. It has the only markers with a positive z value, which emulate the top corners of the goal. It also includes the corner flag (on the left), because all pitches have them by regulation.

text description The final step is to create 15 cameras which rotate within a range and render dataset images from these different positions, emulating the real camera, which can be placed in different positions.

Testing

Testing assessment

Performance assessment

any other experimental work

Conclusions

main achievements

    (relating them to initial objectives)
    (as well as similar work from others)

the main limitations of work

possible extensions and future work

extra