Question in 02_classification.ipynb

nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book

https://transformersbook.com/

Apache License 2.0


Question in 02_classification.ipynb #131

Open pangniuniu95 opened 10 months ago

pangniuniu95 commented 10 months ago

Information

The question or comment is about chapter:

[ ] Text Classification

Question or comment

def label_int2str(row):
    return emotions["train"].features["label"].int2str(row)

but I received the error: 'Value' object has no attribute 'int2str'.

I printed the label feature with emotions["train"].features["label"] and got Value(dtype='int64', id=None).

Krishna2709 commented 9 months ago

Hey @pangniuniu95,

You may have done this already, but can you double-check that you ran emotions.set_format(type='pandas')? I think your emotions dataset is not set to pandas format, which would lead to the above error.

If you're still getting the error, can you share the relevant part of your script, including the function call?

jespernwulff commented 6 months ago

I'm getting the same error when I run 01_introduction.ipynb on Colab. I've double-checked that I ran emotions.set_format(type='pandas').

AttributeError: 'Value' object has no attribute 'int2str'

jespernwulff commented 6 months ago

It seems that I can get around this by manually defining the labels:


# Manually define the mapping from int to str
label_mapping = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise"
}

# Function to convert label integer to string
def label_int2str(label):
    return label_mapping[label]

# Apply the function to the label column
df["label_name"] = df["label"].apply(label_int2str)

print(df.head())
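
The manual mapping above can be combined with the original approach. What follows is a minimal sketch, assuming the label feature either exposes an int2str method (as a ClassLabel feature does) or does not (as with the Value feature reported earlier in this thread). The helper name make_label_int2str and the two stand-in classes are hypothetical and exist only so the sketch runs without downloading the dataset; the label names are the ones from the manual mapping.

```python
# Label names taken from the manual mapping in this thread.
EMOTION_NAMES = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def make_label_int2str(label_feature, fallback_names=EMOTION_NAMES):
    """Return a function mapping an integer label to its string name.

    Uses the feature's own int2str when it has one; otherwise falls back
    to a fixed list of names.
    """
    if hasattr(label_feature, "int2str"):
        return label_feature.int2str

    def int2str(label):
        return fallback_names[int(label)]

    return int2str


# Hypothetical stand-ins so the sketch runs without the datasets library.
class FakeValue:
    """Mimics a plain integer feature with no int2str method."""


class FakeClassLabel:
    """Mimics a feature that knows its own label names."""

    def __init__(self, names):
        self.names = names

    def int2str(self, label):
        return self.names[int(label)]


label_int2str = make_label_int2str(FakeValue())
print(label_int2str(1))  # joy
```

In the notebook, the helper would presumably be called with emotions["train"].features["label"], and the returned function passed to df["label"].apply(...).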

jespernwulff commented 6 months ago

There is no issue if I run a separate notebook with:

!pip install datasets==2.8.0
import pandas as pd
from datasets import load_dataset

# Load the emotions dataset
emotions = load_dataset("emotion")

emotions.set_format(type="pandas")
df = emotions["train"][:]

def label_int2str(row):
    return emotions["train"].features["label"].int2str(row)

df["label_name"] = df["label"].apply(label_int2str)
df.head()

So could the cause be the version of the datasets library used in the notebook? When I run setup_chapter(), it reports "Using datasets v1.16.1".
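
One way to test this hypothesis is to check which version is installed in the failing environment. The sketch below uses importlib.metadata from the standard library; version_tuple and installed_version are hypothetical helper names. The thread only establishes that 2.8.0 worked in a separate notebook and that 1.16.1 was reported in the failing one, so 2.8.0 is used here as a reference point, not as the known release where the behavior changed.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def version_tuple(version_string):
    """Turn '1.16.1' into (1, 16, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("datasets")
if found is None:
    print("datasets is not installed")
elif version_tuple(found) < version_tuple("2.8.0"):
    print(f"datasets {found} is older than 2.8.0, the version that worked above")
else:
    print(f"datasets {found} is at least 2.8.0")
```

Note that this reports the version installed on disk. If the library was upgraded with pip inside a running kernel, the already-imported module may still be the old one until the runtime is restarted; printing datasets.__version__ shows what the kernel actually loaded.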