snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.81k stars, 857 forks

Dynamically generate labeling functions #1562

Closed: uyaseen closed this issue 4 years ago

uyaseen commented 4 years ago

I have a situation where an algorithm generates potential heuristics (labeling functions) that change over time (essentially at every iteration), and I want to feed these intermediate heuristics to Snorkel as labeling functions. Since the heuristics are generated dynamically, I cannot hand-code them as labeling functions. Is there a way to achieve this? All of the generated heuristics use the same logic to decide whether they match a data sample.

Wirg commented 4 years ago

Hi uyaseen,

I am not a member of the snorkel-team, but I might be able to help.

What do you mean by "Since the heuristics are generated dynamically I cannot code them as labeling functions"? How do you generate your heuristics (without the labelling function part)? Could you provide a minimal example?

uyaseen commented 4 years ago

Hi Wirg,

Thanks for offering the help.

> What do you mean by "Since the heuristics are generated dynamically I cannot code them as labeling functions"? How do you generate your heuristics (without the labelling function part)? Could you provide a minimal example?

Unfortunately, I cannot share the exact details, as this is part of a potential research project; I can only share them once it is published.

Concretely, my algorithm generates a mix of noisy and non-noisy heuristics at every iteration, and the set of heuristics grows over time. I want to plug them into Snorkel as labeling functions. For example, in the context of the spam tutorial, I get keywords that could be signals for spam/non-spam. I don't know beforehand how many of these signals I will get, so I want to add them programmatically as labeling functions instead of writing each one by hand:

```python
from snorkel.labeling import labeling_function

ABSTAIN, SPAM = -1, 1  # label values as in the spam tutorial

@labeling_function()
def check(x):
    return SPAM if "check" in x.text.lower() else ABSTAIN

@labeling_function()
def check_out(x):
    return SPAM if "check out" in x.text.lower() else ABSTAIN
```

I hope it is clear now.

Wirg commented 4 years ago

@uyaseen

From what I understand, you could create labeling functions dynamically like this:

```python
from snorkel.labeling import labeling_function

ABSTAIN, SPAM = -1, 1  # label values as in the spam tutorial

def check_if_match_spam_string_LF_factory(spam_string):
    # Each call creates a new LF whose closure captures its own `spam_string`
    @labeling_function(name=f"check_if_match_{spam_string}")
    def _check(x):
        return SPAM if spam_string in x.text.lower() else ABSTAIN
    return _check

lfs = [check_if_match_spam_string_LF_factory(s) for s in ["check", "check out"]]
```

This way, you can create LFs dynamically from parameters.
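The factory function matters because Python closures capture variables, not values. A minimal pure-Python sketch (plain functions on strings stand in for Snorkel LFs; the keyword list and the `ABSTAIN`/`SPAM` values are assumptions mirroring the spam tutorial) shows what goes wrong if the functions are defined directly in a loop:

```python
ABSTAIN, SPAM = -1, 1  # assumed label values, as in the spam tutorial

keywords = ["check", "subscribe"]

# Pitfall: each lambda looks up `kw` when it is *called*, so once the
# comprehension has finished, every function sees the last keyword.
broken = [lambda text: SPAM if kw in text.lower() else ABSTAIN for kw in keywords]

def make_keyword_fn(keyword):
    # `keyword` is a parameter, so each call gets its own binding
    def _check(text):
        return SPAM if keyword in text.lower() else ABSTAIN
    return _check

fixed = [make_keyword_fn(kw) for kw in keywords]

sample = "Check out my channel"
print([f(sample) for f in broken])  # [-1, -1]  (both test "subscribe")
print([f(sample) for f in fixed])   # [1, -1]
```

Passing the keyword through a function parameter is what gives each generated LF its own heuristic.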

By the way, the decorator pattern (@do_something) is syntactic sugar for wrapping a function and reassigning the result to the same name, so you can wrap a function at any time:

```python
@labeling_function()
def check(x):
    return SPAM if "check" in x.text.lower() else ABSTAIN

# is equivalent to

def check(x):
    return SPAM if "check" in x.text.lower() else ABSTAIN

check = labeling_function()(check)
```
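Because `labeling_function` is used as a decorator *factory* (note the `()` in `@labeling_function()`), the manual form needs two calls: one to build the decorator and one to apply it. A pure-Python sketch with a hypothetical `toy_labeling_function` stand-in (so it runs without Snorkel installed) shows that both forms produce the same kind of wrapped object:

```python
from functools import wraps

ABSTAIN, SPAM = -1, 1  # assumed label values

def toy_labeling_function(name=None):
    """Hypothetical stand-in for a decorator factory like labeling_function():
    calling it returns the actual decorator."""
    def decorator(f):
        @wraps(f)
        def wrapper(text):
            return f(text)
        wrapper.lf_name = name or f.__name__  # record a name, like an LF would
        return wrapper
    return decorator

@toy_labeling_function()
def check(text):
    return SPAM if "check" in text.lower() else ABSTAIN

def check_manual(text):
    return SPAM if "check" in text.lower() else ABSTAIN

# Same thing without the @ syntax: call the factory, then apply the decorator.
check_manual = toy_labeling_function(name="check_manual")(check_manual)

print(check.lf_name, check("Check it"))             # check 1
print(check_manual.lf_name, check_manual("hello"))  # check_manual -1
```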

You could then rebuild the applier with the new set of functions at each iteration.
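The per-iteration rebuild can be sketched as follows. This assumes the heuristic generator yields a growing list of keywords each round (`keyword_batches` is made-up data), and `apply_lfs` is a hypothetical stand-in for an applier that produces a label matrix with one row per data point and one column per LF:

```python
ABSTAIN, SPAM = -1, 1  # assumed label values

def make_keyword_lf(keyword):
    def _lf(text):
        return SPAM if keyword in text.lower() else ABSTAIN
    _lf.__name__ = f"keyword_{keyword}"  # readable name per generated LF
    return _lf

def apply_lfs(lfs, texts):
    """Hypothetical stand-in for an applier: rows are data points,
    columns are LFs."""
    return [[lf(t) for lf in lfs] for t in texts]

texts = ["Check out my channel", "Great song", "Subscribe at http://example.com"]

# Hypothetical output of the heuristic-generating algorithm, one batch per iteration.
keyword_batches = [["check"], ["check", "subscribe"], ["check", "subscribe", "http"]]

shapes = []
for keywords in keyword_batches:
    lfs = [make_keyword_lf(kw) for kw in keywords]  # rebuild the LF list
    L = apply_lfs(lfs, texts)                       # re-apply all current LFs
    shapes.append((len(L), len(L[0])))
    # In Snorkel, this is where you would build a new applier, e.g.
    # PandasLFApplier(lfs=lfs), apply it, and refit the label model on L.

print(shapes)  # [(3, 1), (3, 2), (3, 3)]
print(L)       # [[1, -1, -1], [-1, -1, -1], [-1, 1, 1]]
```

Since the number of columns changes whenever the heuristic set grows, anything trained on the label matrix (such as the label model) has to be refit each iteration as well.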

uyaseen commented 4 years ago

@Wirg I will give it a try. Many thanks!

henryre commented 4 years ago

Thanks for the question @uyaseen and the ideas here @Wirg. Just a quick reminder that in this type of setting you can also create LFs with the object initialization syntax (instantiating `LabelingFunction` directly) rather than the decorator syntax. Here's an example in our tutorials:

https://github.com/snorkel-team/snorkel-tutorials/blob/1079507fdf03921132f2dcc682030fcceb7f898b/crowdsourcing/crowdsourcing_tutorial.py#L109
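Snorkel's `LabelingFunction` takes a name, a function `f`, and an optional `resources` dict whose entries are passed to `f` as keyword arguments; the crowdsourcing tutorial linked above appears to use this to build one LF per crowd worker. Below is a pure-Python sketch of the same idea, with a hypothetical `SimpleLF` class standing in for `LabelingFunction` so it runs without Snorkel installed. The keyword list and label values are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

ABSTAIN, SPAM = -1, 1  # assumed label values

@dataclass
class SimpleLF:
    """Hypothetical stand-in for LabelingFunction: `resources` are
    forwarded to `f` as keyword arguments on every call."""
    name: str
    f: Callable[..., int]
    resources: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, x):
        return self.f(x, **self.resources)

# One generic function; the keyword arrives as a resource, not hard-coded.
def keyword_lookup(x, keyword):
    return SPAM if keyword in x.lower() else ABSTAIN

def make_keyword_lf(keyword):
    return SimpleLF(name=f"keyword_{keyword.replace(' ', '_')}",
                    f=keyword_lookup, resources={"keyword": keyword})

lfs = [make_keyword_lf(kw) for kw in ["check out", "subscribe"]]
print([lf.name for lf in lfs])                  # ['keyword_check_out', 'keyword_subscribe']
print([lf("Please subscribe!") for lf in lfs])  # [-1, 1]
```

With Snorkel itself the analogous call would be `LabelingFunction(name=..., f=keyword_lookup, resources={"keyword": keyword})`, where `x` is a data point (e.g. read `x.text`) rather than a plain string. Keeping each LF's parameters in `resources` instead of a closure also makes them easy to inspect.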