huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

load_dataset #7275

Open santiagobp99 opened 2 weeks ago

santiagobp99 commented 2 weeks ago

Describe the bug

I am running two operations from a Hugging Face tutorial (Fine-tune a language model). I have to define everything inside the mapped functions, including some library imports, because names defined outside the function that is mapped over the dataset elements are not recognized inside it:

https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb#scrollTo=iaAJy5Hu3l_B

- `lm_datasets = tokenized_datasets.map(group_texts, batched=True, batch_size=batch_size, num_proc=4)`
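For reference, here is a minimal sketch of a mapped function that depends on nothing defined outside its own body. It is modeled on the tutorial's `group_texts`, with the assumption (not confirmed in this report) that the missing name is a global such as `block_size`. The `block_size` parameter and its default of 128 are illustrative choices, not taken from the report.

```python
def group_texts(examples, block_size=128):
    """Concatenate a batch of tokenized texts and split it into fixed-size blocks."""
    # Imported inside the function so the function carries its own dependencies.
    from itertools import chain

    # Flatten each column (e.g. input_ids, attention_mask) into one long list.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Drop the remainder that does not fill a whole block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i:i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal language modeling the labels are a copy of the inputs.
    result["labels"] = [block.copy() for block in result["input_ids"]]
    return result


# Small in-memory batch standing in for one batch of tokenized_datasets.
batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6, 7]],
    "attention_mask": [[1, 1, 1], [1, 1, 1, 1]],
}
print(group_texts(batch, block_size=2))
```

With `datasets`, the extra argument can be passed through the `fn_kwargs` parameter of `map`, for example `tokenized_datasets.map(group_texts, batched=True, batch_size=batch_size, num_proc=4, fn_kwargs={"block_size": 128})`, so the function does not rely on a notebook-level global.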

Steps to reproduce the bug

Currently I handle all the imports inside the function.

Expected behavior

The code should work as expected in the notebook, but currently it does not.

https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb#scrollTo=iaAJy5Hu3l_B

Environment info

print(transformers.__version__)

4.46.1