The training process uses the WikiText-2 dataset from torchtext, which is no longer supported. In #2895, the proposed solution was to use Hugging Face's version of WikiText-2.
/usr/local/lib/python3.10/dist-packages/torchtext/datasets/__init__.py:4: UserWarning:
/!\ IMPORTANT WARNING ABOUT TORCHTEXT STATUS /!\
Torchtext is deprecated and the last released version will be 0.18 (this one). You can silence this warning by calling the following at the beginnign of your scripts: `import torchtext; torchtext.disable_torchtext_deprecation_warning()`
warnings.warn(torchtext._TORCHTEXT_DEPRECATION_MSG)
/usr/local/lib/python3.10/dist-packages/torchtext/data/__init__.py:4: UserWarning:
/!\ IMPORTANT WARNING ABOUT TORCHTEXT STATUS /!\
Torchtext is deprecated and the last released version will be 0.18 (this one). You can silence this warning by calling the following at the beginnign of your scripts: `import torchtext; torchtext.disable_torchtext_deprecation_warning()`
warnings.warn(torchtext._TORCHTEXT_DEPRECATION_MSG)
/usr/local/lib/python3.10/dist-packages/torchtext/vocab/__init__.py:4: UserWarning:
/!\ IMPORTANT WARNING ABOUT TORCHTEXT STATUS /!\
Torchtext is deprecated and the last released version will be 0.18 (this one). You can silence this warning by calling the following at the beginnign of your scripts: `import torchtext; torchtext.disable_torchtext_deprecation_warning()`
warnings.warn(torchtext._TORCHTEXT_DEPRECATION_MSG)
/usr/local/lib/python3.10/dist-packages/torchtext/utils.py:4: UserWarning:
/!\ IMPORTANT WARNING ABOUT TORCHTEXT STATUS /!\
Torchtext is deprecated and the last released version will be 0.18 (this one). You can silence this warning by calling the following at the beginnign of your scripts: `import torchtext; torchtext.disable_torchtext_deprecation_warning()`
warnings.warn(torchtext._TORCHTEXT_DEPRECATION_MSG)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-6-95aeeff963a5> in <cell line: 9>()
7 from torchtext.vocab import build_vocab_from_iterator
8
----> 9 train_iter = WikiText2(split='train')
10 tokenizer = get_tokenizer('basic_english')
11 vocab = build_vocab_from_iterator(map(tokenizer, train_iter), specials=["<unk>"])
2 frames
/usr/local/lib/python3.10/dist-packages/torchtext/datasets/wikitext2.py in WikiText2(root, split)
67 """
68 if not is_module_available("torchdata"):
---> 69 raise ModuleNotFoundError(
70 "Package `torchdata` not found. Please install following instructions at https://github.com/pytorch/data"
71 )
ModuleNotFoundError: Package `torchdata` not found. Please install following instructions at https://github.com/pytorch/data
---------------------------------------------------------------------------
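For reference, here is a rough sketch of what the Hugging Face route suggested in #2895 could look like for the failing cell (`WikiText2(split='train')` followed by `build_vocab_from_iterator`). This is not the tutorial's code and not necessarily what #2895 settled on. The helper names (`basic_tokenize`, `build_vocab`, `encode`, `load_wikitext2_lines`) are made up for illustration, the `wikitext-2-raw-v1` config is an assumption about which variant is wanted, and the regex tokenizer only approximates torchtext's `basic_english`.

```python
import re
from collections import Counter


def basic_tokenize(line):
    """Lowercase, then split into word and punctuation tokens.

    Rough stand-in for torchtext's 'basic_english' tokenizer, not an exact replica.
    """
    return re.findall(r"\w+|[^\w\s]", line.lower())


def build_vocab(lines, specials=("<unk>",), min_freq=1):
    """Map tokens to integer ids: specials first, then tokens by descending frequency."""
    counter = Counter()
    for line in lines:
        counter.update(basic_tokenize(line))
    # Sort by frequency, breaking ties alphabetically so ids are deterministic.
    tokens = sorted(
        (tok for tok, count in counter.items() if count >= min_freq),
        key=lambda tok: (-counter[tok], tok),
    )
    return {tok: idx for idx, tok in enumerate(list(specials) + tokens)}


def encode(line, vocab, unk="<unk>"):
    """Convert a line to token ids, falling back to the unknown-token id."""
    return [vocab.get(tok, vocab[unk]) for tok in basic_tokenize(line)]


def load_wikitext2_lines(split="train"):
    """Yield non-empty lines of WikiText-2 from the Hugging Face hub.

    Assumes `pip install datasets` and network access. The import is lazy so
    the helpers above work without the package installed.
    """
    from datasets import load_dataset

    dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split=split)
    return (row["text"] for row in dataset if row["text"].strip())
```

In the notebook, the failing lines would then be replaced by something like `vocab = build_vocab(load_wikitext2_lines("train"))`, with `encode(line, vocab)` producing the id lists that get turned into tensors. Neither `torchtext` nor `torchdata` is needed for this path.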
Add Link
@pritamdamania87 for awareness
Describe the bug
Related to #2895 and #2910
The training process uses the WikiText-2 dataset from torchtext, which is no longer supported. In #2895, the proposed solution was to use Hugging Face's version of WikiText-2.
Describe your environment
Colab tutorial notebook