Closed EngineerKhan closed 9 months ago
There is no `"text"` column in `amazon_reviews_multi`, hence the `KeyError`. You can get the column names by running `dataset.column_names`.
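The mismatch is easy to see with a plain dictionary standing in for a batched `map` call: with `batched=True`, the mapped function receives each batch as a dict of column name → list of values, so indexing a column that does not exist raises `KeyError`. A minimal sketch (the column names below are invented for illustration; they are not the real `amazon_reviews_multi` schema):

```python
# With batched=True, datasets passes each batch as a dict of
# column name -> list of values. This stand-in batch mimics a
# dataset whose text column is named "review_body", not "text".
batch = {
    "review_body": ["Great product", "Arrived broken"],
    "stars": [5, 1],
}

# Indexing a missing column fails exactly like in the report.
try:
    batch["text"]
except KeyError as err:
    print(f"KeyError: {err}")  # prints: KeyError: 'text'

# Check the available columns first (dataset.column_names plays
# the same role on a real Dataset), then index one that exists.
print(sorted(batch.keys()))
texts = batch["review_body"]
```

The same check on the real dataset is `dataset.column_names`, after which the lambda passed to `map` should index one of the listed columns.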
Describe the bug
The map() documentation reads:
ds = ds.map(lambda x: tokenizer(x['text'], truncation=True, padding=True), batched=True)
I have been trying to reproduce it in my code as:
tokenizedDataset = dataset.map(lambda x: tokenizer(x['text']), batched=True)
But it doesn't work, as it throws the error:
KeyError: 'text'
Can you please guide me on how to fix it?
Steps to reproduce the bug
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("amazon_reviews_multi")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
Expected behavior
As shown in the documentation, it should run without errors and map the tokenization over the whole dataset.
Environment info
Python 3.10.2