snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Slice based Learning #1650

Closed nsankar closed 2 years ago

nsankar commented 3 years ago

Hi,

Thanks for developing this great library. We need to label anomalies for a system that produces multivariate data under both normal and abnormal conditions. When we tried Snorkel, we found that as the data distribution changes, the conditions in our labeling functions (rules applied to our data attributes to flag anomalies) have to change as well. This means we can't rely on a fixed set of labeling functions, because they become stale as the data distribution changes ...
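To make the staleness problem concrete, here is a minimal sketch in plain Python. It does not use Snorkel's API. The sensor readings, the `lf_fixed_threshold` rule, and its threshold of 80.0 are all hypothetical values chosen for illustration.

```python
ANOMALY, NORMAL = 1, 0

def lf_fixed_threshold(reading, threshold=80.0):
    """Hypothetical rule: flag any reading above a hard-coded threshold."""
    return ANOMALY if reading > threshold else NORMAL

# Made-up sensor readings. On day 1 the baseline sits near 70 with one spike.
day1 = [70.0, 71.5, 69.8, 70.4, 72.1, 95.0]
# On day 2 the whole baseline has drifted to around 90, again with one spike.
day2 = [90.0, 91.2, 89.7, 90.5, 92.3, 115.0]

labels_day1 = [lf_fixed_threshold(x) for x in day1]
labels_day2 = [lf_fixed_threshold(x) for x in day2]

# Day 1: only the spike is flagged. Day 2: every reading is flagged,
# so the rule no longer separates normal from abnormal readings.
print(labels_day1)
print(labels_day2)
```

After the shift, the same rule labels every reading as an anomaly, which is the sense in which a rule tuned on the first day's distribution goes stale.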

Can we use slice-based learning (slicing functions) in this scenario, where the data attributes are continuous numeric values from sensors? The key question for whether this is applicable is the following.

Let's say we have a base training set named train_df_1, for which we have a few slicing functions that we call sf1, and we have built a model. The next day we have a new training set, train_df_2, which differs from train_df_1, so we have to define new slicing functions, sf2, for this incremental data.

After this, if we combine train_df_1 and train_df_2, along with their distinct slicing functions sf1 and sf2, to build an updated model, would that be correct? And would the conditions the model learned for the base data and the incremental data using sf1 and sf2 still hold?

Would greatly appreciate your inputs. Thank you.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

vincentschen commented 3 years ago

Hi @nsankar - I would need a little more information to provide specific guidance, but the approach you outlined generally makes sense! Applying slicing functions tells the model to give "more attention" to these subsets of the data. You'll need to retrain your model whenever your training dataset is updated, but the model should re-learn a representation based on the specific slices of interest.
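As a rough illustration of the combined setup described above, the following plain-Python sketch applies the union of both sets of slicing functions to the concatenated data. It is not Snorkel's API: the column names, thresholds, and the helper `apply_slicing_functions` are hypothetical. In Snorkel, each function would instead be wrapped with the `@slicing_function()` decorator from `snorkel.slicing`.

```python
def sf1_high_temperature(row):
    """Hypothetical slice defined on day 1: unusually hot readings."""
    return row["temperature"] > 80.0

def sf2_low_pressure(row):
    """Hypothetical slice defined on day 2: unusually low pressure."""
    return row["pressure"] < 1.0

# Made-up stand-ins for train_df_1 and train_df_2 (lists of row dicts).
train_df_1 = [
    {"temperature": 85.0, "pressure": 1.2},
    {"temperature": 70.0, "pressure": 1.1},
]
train_df_2 = [
    {"temperature": 72.0, "pressure": 0.8},
    {"temperature": 90.0, "pressure": 0.9},
]

# Combine the data, and use the union of both sets of slicing functions.
combined = train_df_1 + train_df_2
slicing_functions = [sf1_high_temperature, sf2_low_pressure]

def apply_slicing_functions(rows, sfs):
    """Return one row per example, one 0/1 membership column per slice."""
    return [[int(sf(row)) for sf in sfs] for row in rows]

slice_matrix = apply_slicing_functions(combined, slicing_functions)
for row, membership in zip(combined, slice_matrix):
    print(row, membership)
```

Every slicing function is evaluated on every row of the combined set, so an example from train_df_2 can fall into a slice originally written for train_df_1 and vice versa. The model is then retrained on the combined data with this full membership matrix.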
