snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Operator names not unique: 2 operators with name check #1548

Closed linonetwo closed 4 years ago

linonetwo commented 4 years ago

Issue description

I'm trying to load rules from a config file and generate labeling functions on the fly, but I get: ValueError: Operator names not unique: 2 operators with name check

Code example/repro steps

ruleMatch is my own function that parses a rule, builds a parser from it, and matches the text against it.

from typing import List
from snorkel.labeling import labeling_function, PandasLFApplier, LFAnalysis

def getLabelingFunctions(rules: List[str]):
  labelingFunctions = []
  for rule in rules:
    @labeling_function()
    def check(x):
      return SPAM if ruleMatch(rule, x.text.lower()) else ABSTAIN
    labelingFunctions.append(check)
  return labelingFunctions

applier = PandasLFApplier(lfs=getLabelingFunctions(["$ * check $ *", "$ * check out $ *"]))

Traceback (most recent call last):
  File "/xxx/checkRuleCoverage.py", line 27, in <module>
    applier = PandasLFApplier(lfs=getLabelingFunctions(["$ * check $ *", "$ * check out $ *"]))
  File "/opt/anaconda3/envs/TensorFSARNN/lib/python3.6/site-packages/snorkel/labeling/apply/core.py", line 38, in __init__
    check_unique_names(self._lf_names)
  File "/opt/anaconda3/envs/TensorFSARNN/lib/python3.6/site-packages/snorkel/utils/data_operators.py", line 9, in check_unique_names
    raise ValueError(f"Operator names not unique: {ct} operators with name {k}")
ValueError: Operator names not unique: 2 operators with name check
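
The error follows from how the name is chosen: when no `name=` is passed, `labeling_function` falls back to the wrapped function's `__name__`, so every function defined as `check` inside the loop ends up named "check". The sketch below reproduces that in plain Python without importing Snorkel. `make_checks` and `find_duplicate_names` are hypothetical stand-ins written for illustration (substring matching replaces `ruleMatch`); they are not the library's source.

```python
from collections import Counter
from typing import Callable, List


def make_checks(rules: List[str]) -> List[Callable[[str], bool]]:
    """Define one function per rule inside a loop, as in the report above."""
    checks = []
    for rule in rules:
        def check(text: str) -> bool:
            # Substring test is an assumed stand-in for ruleMatch
            return rule in text
        checks.append(check)
    return checks


def find_duplicate_names(names: List[str]) -> List[str]:
    """Return every name that occurs more than once (hypothetical helper)."""
    return [name for name, count in Counter(names).items() if count > 1]


checks = make_checks(["check", "check out"])
names = [f.__name__ for f in checks]       # both functions share one __name__
duplicates = find_duplicate_names(names)   # the shared name is reported once
```

Any check that requires unique names will reject this list, which is why an explicit `name=` per rule is needed.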

Expected behavior

Dynamically generated labeling functions should work.

System info

linonetwo commented 4 years ago

I know I can pass name=:

    @labeling_function(name=rule)
    def check(x):
      return SPAM if ruleMatch(rule, x.text.lower()) else ABSTAIN
    labelingFunctions.append(check)
linonetwo commented 4 years ago

If I do so, there is no error, but every row ends up being checked against the last rule: the earlier labeling functions behave as if they were overwritten by the last one, so the results are identical, either all [True, True] or all [False, False].
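
This symptom can be reproduced without Snorkel. A function defined in a loop looks up the loop variable when it is called, not when it is defined, so after the loop finishes every function sees the final value. The following is a minimal sketch under that assumption; `build_late_bound` and the sample rules and text are invented for illustration.

```python
from typing import Callable, List


def build_late_bound(rules: List[str]) -> List[Callable[[str], bool]]:
    functions = []
    for rule in rules:
        def contains_rule(text: str) -> bool:
            # `rule` is resolved at call time, after the loop has ended
            return rule in text
        functions.append(contains_rule)
    return functions


late = build_late_bound(["check out", "subscribe"])
text = "please check out my channel"

# Both functions test against "subscribe", the last value of `rule`
results = [f(text) for f in late]
captured = [f.__closure__[0].cell_contents for f in late]
```

Here `captured` holds "subscribe" twice, so the first function never tests "check out" even though it was created for that rule.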

linonetwo commented 4 years ago

Oh, it is due to the closure capturing the loop variable. This can be solved by using an IIFE:

from pandas import Series

def getLabelingFunctions(rules: List[str]):
    labelingFunctions = []
    for rule in rules:
        # use an IIFE to avoid the classic stale-closure problem:
        # each call binds its own `rule`
        def IIFE(rule: str):
            @labeling_function(name=rule)
            def checkRuleCoversText(row: Series):
                # make sure the pandas row contains a value; it is sometimes None
                if len(row.values) == 0:
                    return False
                text = row.values[0]
                if text is None:
                    return False
                # now use the rule to check the text
                return bool(dfaFromRule(rule).execute(text))

            labelingFunctions.append(checkRuleCoversText)

        IIFE(rule)
    return labelingFunctions
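
The IIFE works because each call creates a fresh scope for `rule`. Two more conventional Python spellings of the same idea are a named factory function and a default-argument binding. The sketch below shows both in plain Python; `make_rule_check`, `build_checks_default_arg`, and the substring match are assumptions for illustration and stand in for the rule parser used in this thread.

```python
from typing import Callable, List


def make_rule_check(rule: str) -> Callable[[str], bool]:
    """Factory: each call has its own `rule`, so nothing is shared."""
    def contains_rule(text: str) -> bool:
        return rule in text
    # Give each function a distinct name derived from its rule
    contains_rule.__name__ = "contains_" + rule.replace(" ", "_")
    return contains_rule


def build_checks_default_arg(rules: List[str]) -> List[Callable[..., bool]]:
    functions = []
    for rule in rules:
        # The default value is evaluated at definition time, freezing `rule`
        def contains_rule(text: str, rule: str = rule) -> bool:
            return rule in text
        functions.append(contains_rule)
    return functions


rules = ["check out", "subscribe"]
text = "please check out my channel"

factory_checks = [make_rule_check(rule) for rule in rules]
factory_results = [f(text) for f in factory_checks]
default_results = [f(text) for f in build_checks_default_arg(rules)]
factory_names = [f.__name__ for f in factory_checks]
```

Both variants evaluate each rule independently, and the factory also yields unique function names. Snorkel additionally provides a `LabelingFunction` class whose `resources` argument passes per-function values as keyword arguments, which may be a tidier fit for rules loaded from a config file; it is not exercised here.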