Clip Segmentation and Trimming

jaredb1011 commented 1 year ago

Description:

Develop code that takes a gameplay video as input, segments it into smaller clips of no longer than 30 seconds, and trims irrelevant sections such as menus, intros, and outros. The code should be designed in a modular fashion to accommodate game-specific features, allowing it to work with various games.

Requirements:

Ability to input a gameplay video file in common formats (e.g., MP4, AVI, MOV).
Segment input video into smaller clips with a maximum length of 30 seconds each.
Detect and trim irrelevant sections of the video such as menus, intros, and outros.
Design code in a modular fashion to accommodate game-specific features.
Include a configuration file or similar mechanism to easily adapt the code for different games.
The output should be a set of video files, each containing a segmented and trimmed clip from the original video.
Provide clear documentation on how to use and adapt the code for various games.
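
As a rough illustration of the 30-second requirement, here is a minimal sketch of the splitting step. It assumes an earlier detection step has already produced the time ranges to keep (everything that is not a menu, intro, or outro); the name `plan_clips` and the `(start, end)` tuple format are hypothetical and not taken from any existing code.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), hypothetical format


def plan_clips(keep: List[Interval], max_len: float = 30.0) -> List[Interval]:
    """Split each kept interval into consecutive clips of at most max_len seconds."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    clips: List[Interval] = []
    for start, end in keep:
        t = start
        while t < end:
            clip_end = min(t + max_len, end)  # never run past the kept interval
            clips.append((t, clip_end))
            t = clip_end
    return clips


if __name__ == "__main__":
    # One 65-second stretch of gameplay becomes two full clips and a remainder.
    print(plan_clips([(10.0, 75.0)]))
```

Keeping this step free of any video I/O means it can be tested without a video file, and the game-specific detection can change without touching it.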

Acceptance Criteria:

Successfully input a gameplay video file in at least one of the common video formats (e.g., MP4, AVI, MOV).
Automatically segment input video into smaller clips with a maximum length of 30 seconds.
Detect and trim irrelevant sections of the video such as menus, intros, and outros, with at least 80% accuracy.
Modular code design that allows for the easy addition or modification of game-specific features.
Configuration file or similar mechanism included, allowing users to adapt the code for different games without modifying the core code.
Output a set of video files, each containing a segmented and trimmed clip from the original video.
Clear documentation provided on how to use and adapt the code for various games.
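
The criteria above do not say how the 80% accuracy figure is measured. One possible reading, sketched below, is frame-level accuracy against hand-labeled frames: the fraction of frames where the predicted "irrelevant" flag matches the label. The function name and the boolean-list input are assumptions made for illustration only.

```python
from typing import Sequence


def frame_accuracy(predicted: Sequence[bool], labeled: Sequence[bool]) -> float:
    """Fraction of frames where the predicted 'irrelevant' flag matches the label."""
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled must have the same length")
    if not labeled:
        raise ValueError("need at least one labeled frame")
    correct = sum(p == l for p, l in zip(predicted, labeled))
    return correct / len(labeled)


if __name__ == "__main__":
    predicted = [True, True, False, False, True]
    labeled = [True, False, False, False, True]
    score = frame_accuracy(predicted, labeled)
    print(score, "meets 80% target:", score >= 0.8)
```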

Notes:

Since game-specific features may vary, it is suggested to create a basic solution first and then incrementally add support for different games as needed. Consider using machine learning techniques, such as computer vision or deep learning, to detect and trim irrelevant sections with higher accuracy. For better compatibility, consider using open-source libraries and tools for video processing, such as OpenCV, FFmpeg, or similar.
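
If FFmpeg ends up being the cutting tool, the per-clip command could look roughly like the sketch below. It only builds the argument list (using the standard `-ss`, `-i`, `-t`, `-c copy`, and `-y` options) so that it can be inspected or passed to `subprocess.run`; the function name and file names are hypothetical. Note that with stream copy FFmpeg generally cuts at keyframes, so clip boundaries may be slightly off unless the video is re-encoded.

```python
from typing import List


def build_ffmpeg_cut_cmd(src: str, start: float, end: float, dst: str) -> List[str]:
    """Build an ffmpeg command that copies [start, end) seconds of src into dst."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg",
        "-y",                           # overwrite dst if it already exists
        "-ss", f"{start:.3f}",          # seek to the clip start
        "-i", src,
        "-t", f"{end - start:.3f}",     # clip duration in seconds
        "-c", "copy",                   # stream copy, no re-encode
        dst,
    ]


if __name__ == "__main__":
    cmd = build_ffmpeg_cut_cmd("match.mp4", 40.0, 70.0, "clip_001.mp4")
    print(" ".join(cmd))
    # To actually cut the clip: subprocess.run(cmd, check=True)
```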

jaredb1011 commented 1 year ago

Note: we had a previous attempt at clip segmentation for the first prototype: https://github.com/waldo-vision/aimbot-detection-prototype/tree/main/clip_creator

Why-Ay-Es-Haitch commented 1 year ago

I think a good starting point would be to use some kind of text-detection model to detect and read text in the frames. Text that we know occurs in the options menus, end screens, and start screens can be detected, and those frames tagged as not part of the clip. This way we can also modify the banned text through a config file and add known menu text for games like Overwatch. This is a fairly hard problem and we want to prevent scope creep, so we probably shouldn't over-engineer a solution to this atm. I'd be happy to work on this.
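
A minimal sketch of the banned-text idea, assuming the OCR step (whatever model is chosen) hands back the text of a frame as a plain string. The config layout, the game key, and the phrases are made-up placeholders, not real menu text from any game.

```python
import json
from typing import List

# Hypothetical config: one entry per game, each with its own banned phrases.
EXAMPLE_CONFIG = """
{
  "overwatch": {"banned_text": ["options", "exit to desktop"]}
}
"""


def load_banned_text(config_json: str, game: str) -> List[str]:
    """Return the banned phrases for one game, normalized for case-insensitive matching."""
    config = json.loads(config_json)
    return [phrase.casefold() for phrase in config[game]["banned_text"]]


def is_menu_text(ocr_text: str, banned: List[str]) -> bool:
    """True if any banned phrase occurs in the text read from a frame."""
    text = ocr_text.casefold()
    return any(phrase in text for phrase in banned)


if __name__ == "__main__":
    banned = load_banned_text(EXAMPLE_CONFIG, "overwatch")
    print(is_menu_text("OPTIONS  Video  Sound", banned))
    print(is_menu_text("Ammo 24/30", banned))
```

Adding a game would then only mean adding another entry to the config, with no change to the matching code.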

Tailen commented 1 year ago

> I think a good starting point would be to use some kind of text-detection model to detect and read text in the frames. Text that we know occurs in the options menus, end screens, and start screens can be detected, and those frames tagged as not part of the clip. This way we can also modify the banned text through a config file and add known menu text for games like Overwatch.
>
> This is a fairly hard problem and we want to prevent scope creep, so we probably shouldn't over-engineer a solution to this atm.
>
> I'd be happy to work on this.

This library looks suitable for the OCR part: https://github.com/PaddlePaddle/PaddleOCR

One thing we have to consider is that the submitted videos won't all be in English, so we might have to get UI texts in all supported languages. Another idea is to use OCR to build a rough sample dataset, then train an image classifier on it and hope that it generalizes to all languages.

I'm happy to work on this as well.

mattolson93 commented 1 year ago

OCR might also be helpful because kills are recorded in the feed, so we could match the player's username against the feed to detect kills. We hand-coded a solution last year, but it was resolution-specific.
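
One way to avoid tying this to a single resolution, sketched here under assumptions: describe the feed region as fractions of the frame size and convert to pixels per video, then look for the username in whatever lines OCR returns for that region. The region values and names below are hypothetical, and real feed layouts differ per game.

```python
from typing import List, Sequence, Tuple

Region = Tuple[float, float, float, float]  # (x0, y0, x1, y1) as fractions of the frame


def region_to_pixels(region: Region, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a fractional region into pixel coordinates for a given resolution."""
    x0, y0, x1, y1 = region
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))


def lines_mentioning_player(ocr_lines: Sequence[str], username: str) -> List[str]:
    """Return the OCR lines that contain the player's username (case-insensitive)."""
    name = username.casefold()
    return [line for line in ocr_lines if name in line.casefold()]


if __name__ == "__main__":
    feed_region = (0.75, 0.0, 1.0, 0.25)  # hypothetical: top-right quarter of the frame
    print(region_to_pixels(feed_region, 1920, 1080))
    print(region_to_pixels(feed_region, 1280, 720))
    print(lines_mentioning_player(["PlayerOne > Enemy42", "Foo > Bar"], "playerone"))
```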

jason9075 commented 1 year ago

I think OpenCV template matching can achieve this. Use template matching to distinguish loading screens from gameplay screens.

Huskydog9988 commented 1 year ago

I feel like object detection (with YOLO, for example) would be a more dynamic option for excluding menus.
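
Whichever detector ends up flagging menu frames (OCR, template matching, or an object detector), the per-frame flags still have to be turned into time ranges to keep. A minimal sketch of that step, assuming one boolean per frame and a constant frame rate; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def gameplay_intervals(is_menu: Sequence[bool], fps: float) -> List[Tuple[float, float]]:
    """Turn per-frame menu flags into (start, end) second ranges of gameplay to keep."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    intervals: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current gameplay run
    for i, menu in enumerate(is_menu):
        if not menu and start is None:
            start = i                                  # a gameplay run begins
        elif menu and start is not None:
            intervals.append((start / fps, i / fps))   # the run ends at this menu frame
            start = None
    if start is not None:                              # video ended during gameplay
        intervals.append((start / fps, len(is_menu) / fps))
    return intervals


if __name__ == "__main__":
    flags = [True, True, False, False, False, True, False]
    print(gameplay_intervals(flags, fps=1.0))
```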

jaredb1011 commented 1 year ago

Yeah, like if there are logos or symbols within menus that are not dependent on a player's region/language, I think that'd be the most reliable way to detect it.

Joe-TheBro commented 1 year ago

Haitch, idk if you have done any work on this, but I would love to coordinate. I'm gonna start trying some stuff.
