snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.81k stars 857 forks

Pandas Panel package deprecated in newer versions #1632

Closed NaRuecker closed 3 years ago

NaRuecker commented 3 years ago

I updated my pandas version to 1.2.2 and that caused an old snorkel script to fail with the following error:

ImportError: cannot import name 'Panel' from 'pandas' (C:\ProgramData\Anaconda3\lib\site-packages\pandas\__init__.py)

The deprecation was also announced in pandas version 0.25.1.

Updating snorkel to 0.9.6 did not help either, so I'm assuming there is no workaround yet.

Below is a code sample:

import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function

ABSTAIN = -1  # Snorkel's convention for abstaining; 0 would collide with NonDog
NonDog = 0
Dog = 1

@labeling_function()
def dog(x):
    return Dog if "dog" in x.text.lower() else ABSTAIN

lfs = [dog]

applier = PandasLFApplier(lfs)

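def AddLabels(df, applier):
    # Copy the description into a 'Text' column.
    df['Text'] = df['Description']
    # Apply every labeling function; the result is a (rows x LFs) label matrix.
    L_CoreData = applier.apply(df=df)
    print(L_CoreData.shape)
    COLUMNS = ['dog']

    # labels_df is only used for the shape check; it is not joined onto df.
    labels_df = pd.DataFrame(L_CoreData, columns=COLUMNS)
    print(labels_df.shape)
    print(df.shape)
    df = df.reset_index()
    return df

labeled_data = AddLabels(data, applier)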

This results in the error described above.

bhancock8 commented 3 years ago

Hi Nadine, I was able to get your code above to work after (a) changing the field name from "Text" to "text" to match its usage in the @labeling_function (in the future, a small sample payload would be helpful), and (b) updating my numpy version. I'm running on Linux with snorkel==0.9.6, numpy==1.16.5, pandas==1.2.3. PR #1633 should address this permanently, but in the meantime, see if updating your environment as described above unblocks you.
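To compare your environment against the versions listed above, something like the following might help. It is only a rough sketch using the standard library's importlib.metadata (Python 3.8+). The names REPORTED_PINS, installed_version, and find_mismatches are made up for illustration and are not part of snorkel or pandas.

```python
from importlib import metadata

# Versions quoted in the comment above; treat them as one known-good
# combination, not as the only one that works.
REPORTED_PINS = {"snorkel": "0.9.6", "numpy": "1.16.5", "pandas": "1.2.3"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for each package that differs from its pin."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for package, (wanted, found) in find_mismatches(REPORTED_PINS).items():
        print(f"{package}: wanted {wanted}, found {found or 'not installed'}")
```

An empty result means the environment matches the three versions above exactly. A mismatch is not necessarily a problem, only a hint about where to look first.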