pytorch / tutorials

PyTorch tutorials.
https://pytorch.org/tutorials/
BSD 3-Clause "New" or "Revised" License

[BUG] - Training Transformer models using Distributed Data Parallel and Pipeline Parallelism Tutorial broken #2916

Open loganthomas opened 5 months ago

loganthomas commented 5 months ago

Add Link

@pritamdamania87 for awareness

Describe the bug

Related to #2895 and #2910

The tutorial's training process loads the WikiText-2 dataset through torchtext, which is deprecated and no longer supported. In #2895, the proposed solution was to use Hugging Face's version of WikiText-2 instead.
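For reference, here is a minimal sketch of what the replacement proposed in #2895 might look like. It is not the tutorial's actual fix. The helper names (`basic_tokenize`, `build_vocab`, `load_wikitext2_lines`) are hypothetical, the tokenizer is only a rough stand-in for torchtext's `basic_english`, and the choice of the `wikitext-2-v1` config (rather than `wikitext-2-raw-v1`) is an assumption.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Sequence


def basic_tokenize(line: str) -> List[str]:
    """Rough stand-in for torchtext's 'basic_english' tokenizer (not identical)."""
    # Keep the literal "<unk>" marker whole; split other punctuation into single tokens.
    return re.findall(r"<unk>|[a-z0-9]+|[^\sa-z0-9]", line.lower())


def build_vocab(
    token_lists: Iterable[List[str]], specials: Sequence[str] = ("<unk>",)
) -> Dict[str, int]:
    """Map tokens to indices: specials first, then by descending frequency."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ordered if tok not in specials]
    return {tok: idx for idx, tok in enumerate(itos)}


def load_wikitext2_lines(split: str = "train") -> Iterator[str]:
    """Yield non-empty lines of WikiText-2 from the Hugging Face Hub."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the "wikitext-2-v1" config; "wikitext-2-raw-v1" is the raw variant.
    dataset = load_dataset("wikitext", "wikitext-2-v1", split=split)
    for row in dataset:
        if row["text"].strip():
            yield row["text"]
```

With `datasets` installed and network access, `vocab = build_vocab(map(basic_tokenize, load_wikitext2_lines("train")))` would stand in for the failing `WikiText2` / `build_vocab_from_iterator` calls. Unknown tokens can then be looked up with `vocab.get(token, vocab["<unk>"])`.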

/usr/local/lib/python3.10/dist-packages/torchtext/datasets/__init__.py:4: UserWarning: 
/!\ IMPORTANT WARNING ABOUT TORCHTEXT STATUS /!\ 
Torchtext is deprecated and the last released version will be 0.18 (this one). You can silence this warning by calling the following at the beginnign of your scripts: `import torchtext; torchtext.disable_torchtext_deprecation_warning()`
  warnings.warn(torchtext._TORCHTEXT_DEPRECATION_MSG)
[The same UserWarning is repeated for torchtext/data/__init__.py:4, torchtext/vocab/__init__.py:4, and torchtext/utils.py:4]
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-6-95aeeff963a5> in <cell line: 9>()
      7 from torchtext.vocab import build_vocab_from_iterator
      8 
----> 9 train_iter = WikiText2(split='train')
     10 tokenizer = get_tokenizer('basic_english')
     11 vocab = build_vocab_from_iterator(map(tokenizer, train_iter), specials=["<unk>"])

2 frames
/usr/local/lib/python3.10/dist-packages/torchtext/datasets/wikitext2.py in WikiText2(root, split)
     67     """
     68     if not is_module_available("torchdata"):
---> 69         raise ModuleNotFoundError(
     70             "Package `torchdata` not found. Please install following instructions at https://github.com/pytorch/data"
     71         )

ModuleNotFoundError: Package `torchdata` not found. Please install following instructions at https://github.com/pytorch/data

---------------------------------------------------------------------------
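The traceback shows `WikiText2` raising `ModuleNotFoundError` because `torchdata` cannot be imported. A small pre-flight check along these lines would surface the missing dependency before the dataset cell runs. The helper name `missing_packages` is hypothetical, and this thread does not confirm that installing `torchdata` alone is enough to make the tutorial work.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The packages the failing tutorial cell depends on, per the traceback above.
missing = missing_packages(["torchtext", "torchdata"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("torchtext and torchdata are both importable.")
```

Running this in the Colab runtime from the report would be expected to list `torchdata` as missing, which matches the error message.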

Describe your environment

Colab tutorial notebook

loganthomas commented 5 months ago

@pritamdamania87 how would you like to proceed?