Open pomkos opened 2 years ago
@RusseII @chand1012 let us know what you think!
Steve and I came up with this plan today.
OHIO GOZAIMUSU
Currently we have all features/algos completed. What this means is that:
Each algorithm is exposed as a featureFinder class. To run an algorithm, instantiate the class, ff = algo1.featureFinder(), and then call its run function: new_feature = ff.run()
Brain is being rewritten from the ground up in the new_brain branch. What this means is that:
Sample new dataset created with brain:
vid_id | Start | End | algo1 output | algo2 output | algo3_0 output | algo3_5 output | algo3_6 output | algo4 output | CCC | Genre | View Count |
---|---|---|---|---|---|---|---|---|---|---|---|
920260466 | 2021-02-19 19:54:30.649 | 2021-02-19 19:54:40.157 | 0.213 | 12.620951 | 120 | 210 | 0.885 | 0.123 | False | MMO | 69 |
Our first attempt is unlikely to be very accurate. One possible cause is inaccurate features, so here are some potentially useful new ones.
The current to do list on brain:
Create Brain, which will run all algorithms in order to give features to each desired timestamp
3_6
Replace sort_by with select. We no longer need the sort_by parameter in any algo, since the goal is to label each chunk, not to return top clips. We do need to select features of interest.
Remove finalizer from each algo; it is now part of data_handler and used at brain level.
NoneType
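To make the Brain item above concrete, here is a minimal sketch of running every algo in order and keeping only the selected features for each chunk. It is not the code in the new_brain branch: FakeAlgo, build_dataset, and the dictionary-based outputs are hypothetical stand-ins, and the only thing taken from this thread is that each algo is run through a run() call and that features are chosen with select.

```python
from typing import Dict, List, Optional, Tuple

# A chunk is a (start_seconds, end_seconds) window of the stream.
Chunk = Tuple[float, float]


class FakeAlgo:
    """Hypothetical stand-in for an algo's featureFinder class."""

    def __init__(self, scores: Dict[Chunk, float]):
        self.scores = scores

    def run(self) -> Dict[Chunk, float]:
        # A real featureFinder would compute this from the chat data.
        return self.scores


def build_dataset(
    chunks: List[Chunk],
    algos: Dict[str, FakeAlgo],
    select: List[str],
) -> List[Dict[str, Optional[float]]]:
    """Run every algo in order, then keep only the selected feature columns."""
    outputs = {name: algo.run() for name, algo in algos.items()}
    rows = []
    for start, end in chunks:
        row: Dict[str, Optional[float]] = {"start": start, "end": end}
        for name in select:
            # A chunk that an algo did not score gets None instead of raising.
            row[f"{name} output"] = outputs[name].get((start, end))
        rows.append(row)
    return rows


chunks = [(0.0, 10.0), (10.0, 20.0)]
algos = {
    "algo1": FakeAlgo({(0.0, 10.0): 0.213}),
    "algo2": FakeAlgo({(0.0, 10.0): 12.62, (10.0, 20.0): 3.1}),
}
rows = build_dataset(chunks, algos, select=["algo1", "algo2"])
```

Each row carries one column per selected algo, which mirrors the layout of the sample dataset. No sorting happens anywhere, since every chunk is kept and labeled.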
@gatesyp has too many ideas!!! These need to be investigated:
I know we said supervised categorization ML, but within that category the following might make sense:
SGDClassifier. Link <- our first attempt, super simple
DecisionTreeClassifier. Link <- second attempt maybe
Updated todo. We can now start coding brain to label each timestamp as overlapping CCC or not.
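A rough sketch of the overlap labeling, assuming each chunk and each CCC is available as a start/end pair in the timestamp format of the sample dataset. The helper names parse, overlaps, and label_chunks are made up for illustration, and the clip times are invented.

```python
from datetime import datetime

# Timestamp format used in the Start/End columns of the sample dataset.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, FMT)


def overlaps(chunk_start, chunk_end, clip_start, clip_end) -> bool:
    """True if the two intervals share any time; touching endpoints do not count."""
    return chunk_start < clip_end and clip_start < chunk_end


def label_chunks(chunks, clips):
    """Return one True/False label per chunk: does it overlap any clip?"""
    return [
        any(overlaps(start, end, c_start, c_end) for c_start, c_end in clips)
        for start, end in chunks
    ]


chunks = [
    (parse("2021-02-19 19:54:30.649"), parse("2021-02-19 19:54:40.157")),
    (parse("2021-02-19 19:54:40.157"), parse("2021-02-19 19:54:50.000")),
]
# Invented clip interval, for illustration only.
clips = [(parse("2021-02-19 19:54:45.000"), parse("2021-02-19 19:55:15.000"))]

labels = label_chunks(chunks, clips)  # [False, True]
```

The check is quadratic in the worst case (every chunk against every clip), which is probably fine for a single stream. Sorting both lists by start time would allow a single pass if it ever becomes slow.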
Updated readme to describe how the flow works, how to call it, and the required/optional parameters for brain.
cccLabeler() class: accept dataframe instead of string
Goal
We want to predict, from the chat, which time segments in a stream have the potential to go viral.
Training Flow
Output: we now have a validated and trained model! Yay!
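As a rough illustration of the fit-and-validate part of this flow, here is a minimal sketch using the two candidate classifiers mentioned earlier in the thread. It assumes scikit-learn is installed. The feature rows and CCC labels are made up, and the scaling step and the 75/25 split are illustrative choices, not something decided in this thread.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Made-up rows of two algo outputs per chunk, with made-up CCC labels.
X = [
    [0.10, 10], [0.20, 12], [0.15, 11], [0.12, 9],
    [0.90, 200], [0.80, 210], [0.85, 190], [0.88, 205],
]
y = [False, False, False, False, True, True, True, True]

# Hold out a quarter of the chunks for validation, keeping both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # SGD is sensitive to feature scale, so standardize the features first.
    "sgd": make_pipeline(StandardScaler(), SGDClassifier(random_state=0)),
    "tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out chunks.
    scores[name] = model.score(X_test, y_test)
```

With real data, accuracy alone may be misleading if only a few chunks overlap a CCC, so precision and recall on the True class would be worth checking as well.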
Post-Training TODO:
Production Flow:
Model (Pre-Production) Deployment
Maintenance + Monitoring