pillargg / pillar_algos

Finds best timestamps to cut at
https://docs.pillar.gg/pillar_algos/
GNU General Public License v3.0

ML Model Dev Roadmap #50

Open pomkos opened 2 years ago

pomkos commented 2 years ago

Goal

We want to predict, from the chat, which time segments in a stream have the potential to go viral.

Training Flow

Output: we now have a validated and trained model! Yay!

Post-Training TODO:

Production Flow:

Model (Pre-Production) Deployment

Maintenance + Monitoring

pomkos commented 2 years ago

@RusseII @chand1012 let us know what you think!

Steve and I came up with this plan today.

pomkos commented 2 years ago

OHIO GOZAIMUSU (good morning!)

Status

All features/algos are currently complete. This means that:

New Brain

Brain is being rewritten from the ground up in the new_brain branch. This means that:

Sample new dataset created with brain:

| vid_id | Start | End | algo1 output | algo2 output | algo3_0 output | algo3_5 output | algo3_6 output | algo4 output | CCC | Genre | View Count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 920260466 | 2021-02-19 19:54:30.649 | 2021-02-19 19:54:40.157 | 0.213 | 12.620951 | 120 | 210 | 0.885 | 0.123 | False | MMO | 69 |
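To feed a row like the sample above into a model, each column has to become a number. Below is a minimal sketch, assuming CCC is the target label (as in the "overlapping CCC or not" labeling), that segment duration is derived from Start/End, and that Genre is one-hot encoded over a fixed list. `GENRES`, `ALGO_COLUMNS` and `row_to_features` are hypothetical names, not part of brain.

```python
from datetime import datetime

# Hypothetical fixed genre vocabulary; the real list would come from the genre source.
GENRES = ["MMO", "Strategy", "Shooter"]

# Algorithm output columns, named as in the sample dataset.
ALGO_COLUMNS = [
    "algo1 output", "algo2 output", "algo3_0 output",
    "algo3_5 output", "algo3_6 output", "algo4 output",
]


def row_to_features(row: dict) -> tuple[list[float], int]:
    """Convert one dataset row into (feature vector, label).

    Assumption: CCC is the target label and Genre is one-hot encoded.
    """
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    start = datetime.strptime(row["Start"], fmt)
    end = datetime.strptime(row["End"], fmt)
    duration = (end - start).total_seconds()  # segment length in seconds

    features = [float(row[col]) for col in ALGO_COLUMNS]
    features.append(duration)
    features.append(float(row["View Count"]))
    # One-hot encode the genre; genres not in GENRES become all zeros.
    features.extend(1.0 if row["Genre"] == g else 0.0 for g in GENRES)

    label = 1 if row["CCC"] in (True, "True") else 0
    return features, label


# The sample row from the table above.
sample = {
    "vid_id": 920260466,
    "Start": "2021-02-19 19:54:30.649",
    "End": "2021-02-19 19:54:40.157",
    "algo1 output": 0.213, "algo2 output": 12.620951,
    "algo3_0 output": 120, "algo3_5 output": 210,
    "algo3_6 output": 0.885, "algo4 output": 0.123,
    "CCC": False, "Genre": "MMO", "View Count": 69,
}
x, y = row_to_features(sample)
```

vid_id is deliberately left out of the features, since an ID carries no signal about virality.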

New Feature Ideas

Our first attempt is unlikely to be very accurate. One likely cause is uninformative features, so here are some potentially useful new ones.

  1. Automated genre categories, probably from Steam. Start with general ones like "strategy", but we may need more specific ones like "Grand Strategy" or "Historical"
  2. Labeling each timestamp with a topic, perhaps using more advanced NLP, Brown clustering, Butter
  3. Interpretability, speed, F1 score as per this repo
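Since F1 score is on the list, here is a minimal pure-Python sketch of it for binary labels (1 = overlaps a CCC), using the standard definition F1 = 2 · precision · recall / (precision + recall). The function name is hypothetical; scikit-learn also provides `f1_score` in `sklearn.metrics`.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive class (1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 2 true positives, 1 false positive, 1 false negative.
print(f1_score([1, 1, 0, 1, 0], [1, 1, 1, 0, 0]))  # ≈ 0.667
```

F1 is a better fit than plain accuracy here because viral segments are presumably rare: a model that always predicts "not viral" would score high accuracy but an F1 of 0.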

To Do

Brain

The current to do list on brain:

Create Brain, which will run all algorithms to compute features for each desired timestamp
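A minimal sketch of the Brain idea: for each timestamp window, collect the chat messages inside it, run every algorithm on them, and store the outputs as one feature row. The message format, the half-open window convention and the two placeholder algorithms are all hypothetical stand-ins, not the real algo1 to algo4.

```python
from typing import Callable

# Hypothetical chat message representation: (seconds since stream start, text).
Message = tuple[float, str]
# Each "algorithm" maps the messages in one window to a single numeric feature.
Algorithm = Callable[[list[Message]], float]


def message_count(msgs: list[Message]) -> float:
    """Placeholder feature: number of chat messages in the window."""
    return float(len(msgs))


def mean_length(msgs: list[Message]) -> float:
    """Placeholder feature: average message length in characters."""
    return sum(len(text) for _, text in msgs) / len(msgs) if msgs else 0.0


def run_brain(
    chat: list[Message],
    windows: list[tuple[float, float]],
    algorithms: dict[str, Algorithm],
) -> list[dict[str, float]]:
    """Return one feature row per (start, end) window, treated as [start, end)."""
    rows = []
    for start, end in windows:
        in_window = [m for m in chat if start <= m[0] < end]
        row = {"start": start, "end": end}
        for name, algo in algorithms.items():
            row[name] = algo(in_window)
        rows.append(row)
    return rows


chat = [(1.0, "lol"), (2.5, "POG"), (12.0, "gg wp")]
rows = run_brain(
    chat,
    [(0.0, 10.0), (10.0, 20.0)],
    {"count": message_count, "mean_len": mean_length},
)
```

Keeping the algorithms in a name-to-function dict means adding a new feature is one more entry, and the keys become the column names of the dataset.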

Topic Selection

@gatesyp has too many ideas!!! These need to be investigated:

ML Options

I know we said supervised classification, but within that category the following might make sense:

  1. SGDClassifier (scikit-learn) <- our first attempt, super simple
  2. DecisionTreeClassifier (scikit-learn) <- second attempt, maybe
  3. Use Metaflow to select models
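scikit-learn's SGDClassifier fits a linear classifier with stochastic gradient descent. To show what that means without depending on scikit-learn, here is a minimal pure-Python sketch of per-sample SGD on the logistic loss. It is similar in spirit to SGDClassifier with a logistic loss but is not its implementation (no regularization, constant learning rate); `train_sgd_logistic`, `predict` and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd_logistic(
    X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 200, seed: int = 0
) -> tuple[list[float], float]:
    """Fit weights w and bias b by per-sample SGD on the logistic (log) loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(z) - y[i]  # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> int:
    """Class 1 if the linear score is positive (probability > 0.5)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Toy, already-scaled data: one feature, separable at 0.
X = [[-1.0], [-0.5], [0.5], [1.0]]
y = [0, 0, 1, 1]
w, b = train_sgd_logistic(X, y)
preds = [predict(w, b, x) for x in X]
```

With scikit-learn installed, `SGDClassifier().fit(X, y)` followed by `.predict(X)` would replace this. Either way, SGD is sensitive to feature scale, so raw columns like View Count should be standardized first.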
pomkos commented 2 years ago

@gatesyp, not sure how to receive the data (line 102 of new_brain).

pomkos commented 2 years ago

Updated the to-do list. We can now start coding brain to label each timestamp as overlapping a CCC or not.
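Labeling each timestamp as overlapping a CCC or not comes down to an interval-overlap check. A minimal sketch, assuming each timestamp window and each CCC is a (start, end) pair in seconds and that intervals which only touch at an endpoint do not count as overlapping; `overlaps` and `label_windows` are hypothetical names.

```python
def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if half-open intervals [a0, a1) and [b0, b1) share any time."""
    return a[0] < b[1] and b[0] < a[1]


def label_windows(
    windows: list[tuple[float, float]], cccs: list[tuple[float, float]]
) -> list[int]:
    """Label each window 1 if it overlaps any CCC interval, else 0."""
    return [1 if any(overlaps(w, c) for c in cccs) else 0 for w in windows]


windows = [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)]
cccs = [(15.0, 25.0)]
print(label_windows(windows, cccs))  # [0, 1, 1]
```

This checks every window against every CCC, which is fine for a stream's worth of data; with many CCCs, sorting them by start time and using `bisect` would avoid the full scan.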

pomkos commented 2 years ago

Updated the README to describe how the flow works, how to call it, and the required/optional parameters for brain.

Current Status

To Do