nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book
https://transformersbook.com/
Apache License 2.0

Chapter 6 notebook cannot be run even with datasets 2.0.0: load_dataset error, possibly related to Google virus scan #62

Closed jeromemassot closed 2 years ago

jeromemassot commented 2 years ago

Information

The problem arises in Chapter 6.

Describe the bug

The datasets module cannot load the cnn_dailymail dataset correctly because of datasets bug #3787. The bug should have been fixed in the datasets v2.0.0 release, but the notebook environment uses v1.16.1. After upgrading the datasets library to v2.0.0, the bug persists.

To Reproduce

Steps to reproduce the behavior:

  1. Run setup_chapter() to check that the datasets version is 1.16.1
  2. Upgrade the datasets version to 2.0.0
  3. Run dataset = load_dataset("cnn_dailymail", version="3.0.0")
  4. See FileNotFoundError: [WinError 3]
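
For anyone retracing steps 1 and 2, a minimal sketch of checking which datasets version is actually installed might look like the following. The helper name installed_version is hypothetical and not part of the notebook; it relies only on the standard library's importlib.metadata.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of the book's notebooks.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# In the scenario described above, this would be expected to report 1.16.1
# before the upgrade and 2.0.0 after it.
print("datasets:", installed_version("datasets"))
```

If the upgrade is done from inside a running notebook, the kernel usually needs a restart before the newly installed version is the one that gets imported.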

Expected behavior

The dataset should load without errors, since bug #3787 is reported as closed and the problem as solved.

gcmsrc commented 2 years ago

Please use the ccdv/cnn_dailymail dataset instead, unless you are running the notebook in Colab. Please also see this thread.
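
A rough sketch of how this suggestion could be wired into the notebook is shown below. The helper names pick_cnn_dailymail_id and load_cnn_dailymail are hypothetical, the Colab check via sys.modules is a common convention rather than anything from this thread, and the version="3.0.0" argument is simply carried over from the original call on the assumption that the ccdv copy accepts it the same way.

```python
import sys


def pick_cnn_dailymail_id(in_colab: bool) -> str:
    """Choose the dataset ID: the original one in Colab, the ccdv copy elsewhere.

    Hypothetical helper that mirrors the advice given in this thread.
    """
    return "cnn_dailymail" if in_colab else "ccdv/cnn_dailymail"


def load_cnn_dailymail():
    """Load CNN/DailyMail with the ID that suits the current environment."""
    # Imported lazily so the helper above can be used without datasets installed.
    from datasets import load_dataset

    # Assumption: "google.colab" is present in sys.modules when running in Colab.
    in_colab = "google.colab" in sys.modules
    # Assumption: version="3.0.0" works for the ccdv copy as in the original call.
    return load_dataset(pick_cnn_dailymail_id(in_colab), version="3.0.0")
```

In the notebook, the original line would then become dataset = load_cnn_dailymail().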

jeromemassot commented 2 years ago

Hi @gcmsrc, I confirm that your fix works. I am closing the issue. Thanks.