waldo-vision / models

Repository for model development and training
https://waldo.vision
Mozilla Public License 2.0

Clip Segmentation and Trimming #3

Open jaredb1011 opened 1 year ago

jaredb1011 commented 1 year ago

Description:

Develop code that takes a gameplay video as input, segments it into smaller clips of no longer than 30 seconds, and trims irrelevant sections such as menus, intros, and outros. The code should be designed in a modular fashion to accommodate game-specific features, allowing it to work with various games.
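As a rough illustration of the segmentation step, the sketch below splits already-detected gameplay intervals into clips of at most 30 seconds. The function name `split_into_clips` and the interval representation are assumptions made for this example, not part of any existing code in the repository.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)

MAX_CLIP_SECONDS = 30.0  # upper bound taken from the issue description


def split_into_clips(gameplay: List[Interval],
                     max_len: float = MAX_CLIP_SECONDS) -> List[Interval]:
    """Split each gameplay interval into consecutive clips of at most max_len seconds."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    clips: List[Interval] = []
    for start, end in gameplay:
        cursor = start
        while cursor < end:
            clip_end = min(cursor + max_len, end)
            clips.append((cursor, clip_end))
            cursor = clip_end
    return clips


if __name__ == "__main__":
    # Hypothetical input: two gameplay stretches with a trimmed menu in between.
    print(split_into_clips([(12.0, 80.0), (95.0, 110.0)]))
```

Keeping this step independent of how the gameplay intervals were detected means game-specific detectors can be swapped in without touching the splitting logic.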

Requirements:

Acceptance Criteria:

Notes:

Since game-specific features may vary, it is suggested to create a basic solution first and then incrementally add support for different games as needed. Consider using machine learning techniques, such as computer vision or deep learning, to detect and trim irrelevant sections with higher accuracy. For better compatibility, consider using open-source libraries and tools for video processing, such as OpenCV, FFmpeg, or similar.
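If FFmpeg ends up being used for the actual cutting, one possible approach is to build the command line in Python and run it with `subprocess`. The helper name `ffmpeg_cut_command` is hypothetical; the flags used (`-ss`, `-i`, `-t`, `-c copy`) are standard FFmpeg options. This is a sketch only, and stream copy is an assumption chosen for speed.

```python
from typing import List


def ffmpeg_cut_command(src: str, start: float, end: float, dst: str) -> List[str]:
    """Build an ffmpeg argument list that copies the span [start, end) of src into dst."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",       # seek to the clip start before opening the input
        "-i", src,
        "-t", f"{end - start:.3f}",  # clip duration in seconds
        "-c", "copy",                # stream copy: no re-encoding
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; printing instead of running keeps the sketch side-effect free.
    print(" ".join(ffmpeg_cut_command("match.mp4", 12.0, 42.0, "clip_000.mp4")))
```

To execute it, the list can be passed to `subprocess.run(cmd, check=True)`. With stream copy, cut points generally land on keyframes, so re-encoding may be needed if frame-accurate clip boundaries matter.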

jaredb1011 commented 1 year ago

Note: we had a previous attempt at clip segmentation for the first prototype: https://github.com/waldo-vision/aimbot-detection-prototype/tree/main/clip_creator

Why-Ay-Es-Haitch commented 1 year ago

I think a good starting point would be to use a text-detection model to detect and read text in the frames. Text that we know occurs in option menus, end screens, and start screens can be detected, and those frames tagged as not part of the clip. We can then adjust the banned text through a config file and add known menu text for games like Overwatch.

This is a fairly hard problem, and we want to prevent scope creep, so we probably shouldn't over-engineer a solution at the moment.

I'd be happy to work on this.
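A minimal sketch of the banned-text idea, assuming the OCR step has already produced a list of text lines for a frame. The config layout, the example phrases, and the function name `is_menu_frame` are all illustrative assumptions; real menu text would have to be collected per game.

```python
from typing import Dict, Iterable, List

# Hypothetical config: per-game phrases expected on menus or end screens.
# In practice this could be loaded from a JSON or YAML file.
BANNED_TEXT: Dict[str, List[str]] = {
    "overwatch": ["play of the game", "options", "leave game"],
}


def is_menu_frame(ocr_lines: Iterable[str], game: str,
                  config: Dict[str, List[str]] = BANNED_TEXT) -> bool:
    """Return True if any OCR'd line contains a banned phrase for the given game."""
    phrases = [p.casefold() for p in config.get(game, [])]
    return any(p in line.casefold() for line in ocr_lines for p in phrases)


if __name__ == "__main__":
    print(is_menu_frame(["LEAVE GAME", "Resume"], "overwatch"))  # menu text present
    print(is_menu_frame(["Payload 42m"], "overwatch"))           # ordinary HUD text
```

Substring matching is the simplest option; OCR output is often noisy, so fuzzy matching might be needed in practice.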

Tailen commented 1 year ago

This library looks suitable for the OCR part: https://github.com/PaddlePaddle/PaddleOCR

One thing we have to consider is that the submitted videos won't all be in English, so we might have to collect UI text in all supported languages. Another idea is to use OCR to build a rough sample dataset, then train an image classifier on it and hope that it generalizes to all languages.

I'm happy to work on this as well.

mattolson93 commented 1 year ago

OCR might also be helpful because kills are recorded in the kill feed, and we could match the player's username against it to detect kills. We hand-coded a solution last year, but it was resolution-specific.
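One way to avoid a resolution-specific solution is to store the kill-feed region as fractions of the frame size and convert to pixels per video. The region values below are made up for illustration, and `roi_to_pixels` is a hypothetical helper, not the code from last year's attempt.

```python
from typing import Tuple

# Hypothetical kill-feed region as fractions of the frame: (left, top, right, bottom).
KILL_FEED_ROI = (0.70, 0.05, 0.98, 0.30)


def roi_to_pixels(roi: Tuple[float, float, float, float],
                  width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a fractional region to pixel coordinates for a given frame size."""
    left, top, right, bottom = roi
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


if __name__ == "__main__":
    print(roi_to_pixels(KILL_FEED_ROI, 1920, 1080))
    print(roi_to_pixels(KILL_FEED_ROI, 1280, 720))
```

The cropped region can then be passed to the OCR step. This assumes the HUD scales with resolution, which may not hold for every game or aspect ratio.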

jason9075 commented 1 year ago

I think OpenCV template matching can achieve this. Use template matching to distinguish loading screens from gameplay.
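To show what template matching computes, here is a dependency-free sketch of zero-mean normalized cross-correlation on tiny grayscale grids. It is a stand-in for illustration only: with OpenCV, `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` would do the sliding comparison far more efficiently. The function names and the 0.9 threshold are assumptions.

```python
from math import sqrt
from typing import List, Sequence

Gray = Sequence[Sequence[float]]  # 2D grayscale image as rows of pixel values


def ncc(patch: Gray, template: Gray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a: List[float] = [v for row in patch for v in row]
    b: List[float] = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat patch has no structure to correlate with, so score it as 0.
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom


def best_match(frame: Gray, template: Gray) -> float:
    """Slide the template over the frame and return the highest correlation score."""
    th, tw = len(template), len(template[0])
    best = -1.0
    for y in range(len(frame) - th + 1):
        for x in range(len(frame[0]) - tw + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            best = max(best, ncc(patch, template))
    return best


def is_loading_screen(frame: Gray, template: Gray, threshold: float = 0.9) -> bool:
    """Flag the frame if the template (e.g. a loading icon) matches strongly anywhere."""
    return best_match(frame, template) >= threshold
```

Template matching is sensitive to scale, so a template cropped at one resolution would likely need resizing before it is compared against videos recorded at another.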

Huskydog9988 commented 1 year ago

I feel like object detection, with YOLO for example, would be a more dynamic option for excluding menus.
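Whichever detector is used, its per-frame output still has to be turned into time ranges to keep. The sketch below assumes a hypothetical detector that returns `(label, confidence)` pairs per frame; the class names in `MENU_LABELS` and the 0.5 confidence cutoff are invented for this example and do not come from any trained model.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[str, float]  # (class label, confidence), hypothetical detector output

MENU_LABELS = {"menu", "scoreboard", "loading_screen"}  # hypothetical class names


def frame_is_menu(detections: Sequence[Detection], min_conf: float = 0.5) -> bool:
    """A frame counts as menu if any menu-class detection is confident enough."""
    return any(label in MENU_LABELS and conf >= min_conf for label, conf in detections)


def gameplay_intervals(per_frame: Sequence[Sequence[Detection]], fps: float,
                       min_conf: float = 0.5) -> List[Tuple[float, float]]:
    """Convert per-frame detections into (start, end) second ranges of non-menu frames."""
    intervals: List[Tuple[float, float]] = []
    start = None  # index of the first frame in the current gameplay run
    for i, detections in enumerate(per_frame):
        menu = frame_is_menu(detections, min_conf)
        if not menu and start is None:
            start = i
        elif menu and start is not None:
            intervals.append((start / fps, i / fps))
            start = None
    if start is not None:
        intervals.append((start / fps, len(per_frame) / fps))
    return intervals
```

In practice some smoothing would probably be needed so that a single misclassified frame does not split a gameplay run in two.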

jaredb1011 commented 1 year ago

Yeah, if menus contain logos or symbols that don't depend on a player's region or language, I think detecting those would be the most reliable approach.

Joe-TheBro commented 1 year ago

Haitch, idk if you have done any work on this, but I would love to coordinate. I'm gonna start trying some stuff.