snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.81k stars 857 forks

Q&A - Error in applier.apply(df=df_train) method #1590

Closed dmorgan07 closed 4 years ago

dmorgan07 commented 4 years ago

Hello everyone, I'm quite new to Python and Snorkel, so it would be helpful if someone could tell me how to solve this issue. I'm working on a problem similar to the spam detection problem in the Snorkel documentation, and I'm following the same steps. I have a training set of about 1000 samples (a pandas DataFrame with a single column 'data' containing the text to be classified), and I've created a set of regex-based LFs which work fine on their own. An example of the LFs I used:

    @labeling_function()
    def PIIemailPass(data):
        return LEAK if re.search(r'[a-zA-Z0-9.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+[*:]+[\w+]', data) else ABSTAIN

But when I run the following lines after creating my LFs:

    applier = PandasLFApplier(lfs)
    L_train = applier.apply(df=df_train)

I get an error: ('expected string or bytes-like object', 'occurred at index 0'). Can someone help me with this problem?
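A likely cause, offered as a guess rather than a confirmed diagnosis: `PandasLFApplier` hands each labeling function a whole DataFrame row (a pandas Series), not the string in the 'data' column, so `re.search` receives a non-string and raises this error. The sketch below shows the LF body reading the text out of the row first. It is kept free of Snorkel and pandas so it runs on its own: plain dicts stand in for rows, the value of `LEAK` is an assumption (the post does not show it), and the name `pii_email_pass` and the non-string guard are illustrative additions.

```python
import re

ABSTAIN = -1  # Snorkel's conventional abstain value
LEAK = 1      # assumed value; the post does not show how LEAK is defined

# Pattern copied verbatim from the post.
EMAIL_PASS = re.compile(r'[a-zA-Z0-9.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+[*:]+[\w+]')


def pii_email_pass(x):
    """x is one row; pull the text out of its 'data' field before matching."""
    text = x["data"]
    if not isinstance(text, str):  # defensive: e.g. NaN from an empty cell
        return ABSTAIN
    return LEAK if EMAIL_PASS.search(text) else ABSTAIN


# Dicts stand in for DataFrame rows so the sketch runs without pandas.
rows = [
    {"data": "user@example.com:hunter2"},
    {"data": "hello world"},
    {"data": float("nan")},
]
labels = [pii_email_pass(r) for r in rows]
```

In the real pipeline the same body would sit under `@labeling_function()`, with `x` being the row that the applier passes in. Bracket access (`x["data"]`) is used here instead of `x.data` to avoid any clash between the column name and attributes of the row object.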