pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

Remove torchdata dependency from package and from CI #2241

Closed NicolasHug closed 6 months ago

NicolasHug commented 6 months ago

This PR removes the dependency on torchdata from the package and from CI.

The files that still have references to torchdata after this PR are:

```
(base) ➜ text git:(remove_torchdata) ✗ git grep -i -l torchdata
CONTRIBUTING_DATASETS.md
README.rst
docs/source/datasets.rst
examples/torcharrow/README.md
examples/tutorials/sst2_classification_non_distributed.py
examples/tutorials/t5_demo.py
test/torchtext_unittest/datasets/common.py
test/torchtext_unittest/datasets/test_agnews.py
test/torchtext_unittest/datasets/test_amazonreviews.py
test/torchtext_unittest/datasets/test_cc100.py
test/torchtext_unittest/datasets/test_cnndm.py
test/torchtext_unittest/datasets/test_cola.py
test/torchtext_unittest/datasets/test_conll2000chunking.py
test/torchtext_unittest/datasets/test_dbpedia.py
test/torchtext_unittest/datasets/test_enwik9.py
test/torchtext_unittest/datasets/test_imdb.py
test/torchtext_unittest/datasets/test_iwslt2016.py
test/torchtext_unittest/datasets/test_iwslt2017.py
test/torchtext_unittest/datasets/test_mnli.py
test/torchtext_unittest/datasets/test_mrpc.py
test/torchtext_unittest/datasets/test_multi30k.py
test/torchtext_unittest/datasets/test_penntreebank.py
test/torchtext_unittest/datasets/test_qnli.py
test/torchtext_unittest/datasets/test_qqp.py
test/torchtext_unittest/datasets/test_rte.py
test/torchtext_unittest/datasets/test_sogounews.py
test/torchtext_unittest/datasets/test_squads.py
test/torchtext_unittest/datasets/test_sst2.py
test/torchtext_unittest/datasets/test_stsb.py
test/torchtext_unittest/datasets/test_udpos.py
test/torchtext_unittest/datasets/test_wikitexts.py
test/torchtext_unittest/datasets/test_wnli.py
test/torchtext_unittest/datasets/test_yahooanswers.py
test/torchtext_unittest/datasets/test_yelpreviews.py
torchtext/datasets/ag_news.py
torchtext/datasets/amazonreviewfull.py
torchtext/datasets/amazonreviewpolarity.py
torchtext/datasets/cc100.py
torchtext/datasets/cnndm.py
torchtext/datasets/cola.py
torchtext/datasets/conll2000chunking.py
torchtext/datasets/dbpedia.py
torchtext/datasets/enwik9.py
torchtext/datasets/imdb.py
torchtext/datasets/iwslt2016.py
torchtext/datasets/iwslt2017.py
torchtext/datasets/mnli.py
torchtext/datasets/mrpc.py
torchtext/datasets/multi30k.py
torchtext/datasets/penntreebank.py
torchtext/datasets/qnli.py
torchtext/datasets/qqp.py
torchtext/datasets/rte.py
torchtext/datasets/sogounews.py
torchtext/datasets/squad1.py
torchtext/datasets/squad2.py
torchtext/datasets/sst2.py
torchtext/datasets/stsb.py
torchtext/datasets/udpos.py
torchtext/datasets/wikitext103.py
torchtext/datasets/wikitext2.py
torchtext/datasets/wnli.py
torchtext/datasets/yahooanswers.py
torchtext/datasets/yelpreviewfull.py
torchtext/datasets/yelpreviewpolarity.py
```

The torchtext.datasets namespace still relies on torchdata, but that's OK: users who still need it can install torchdata manually. Similarly, I did not update or remove the tests/datasets/* files; instead, I ignored all the dataset tests in pytest.ini.
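To illustrate what "install torchdata manually" can look like from the library side, here is a minimal sketch of an optional-dependency guard. The helper names `is_module_available` and `require_optional` are hypothetical and introduced only for this example; this is not necessarily how torchtext implements the check.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def require_optional(name: str) -> None:
    """Hypothetical guard: raise an actionable error if `name` is missing."""
    if not is_module_available(name):
        raise ModuleNotFoundError(
            f"This feature needs the optional package '{name}'. "
            f"Install it manually, e.g. `pip install {name}`."
        )


if __name__ == "__main__":
    # A dataset module could call the guard lazily, only when it is used,
    # so that importing the rest of the package never requires torchdata.
    try:
        require_optional("torchdata")
        print("torchdata is installed")
    except ModuleNotFoundError as err:
        print(err)
```

Calling the guard inside the dataset functions rather than at package import time keeps torchdata optional for users who never touch the datasets.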

I will be submitting follow-up PRs to also address the user-facing docs, raise proper warnings, etc.

pytorch-bot[bot] commented 6 months ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/text/2241

Note: Links to docs will display an error until the docs builds have been completed.

:x: 19 New Failures, 5 Unrelated Failures

As of commit 9e1377423452b704d695064f554fda480fbf5e67 with merge base 52c0d85c603bd8beff8c300661fcec09d673cec9:

NEW FAILURES - The following jobs have failed:

* [Bandit / build](https://hud.pytorch.org/pr/pytorch/text/2241#22976816243) ([gh](https://github.com/pytorch/text/actions/runs/8389823752/job/22976816243)) `Process completed with exit code 1.`
* [Build Linux Conda / pytorch/text / upload / conda-py3_8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22977083028) ([gh](https://github.com/pytorch/text/actions/runs/8389823859/job/22977083028)) `Unable to find any artifacts for the associated workflow`
* [Build Linux Wheels / pytorch/text / upload / manywheel-py3_8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22977006273) ([gh](https://github.com/pytorch/text/actions/runs/8389823877/job/22977006273)) `Unable to find any artifacts for the associated workflow`
* [Build M1 Wheels / pytorch/text / upload / wheel-py3_8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22976886897) ([gh](https://github.com/pytorch/text/actions/runs/8389823889/job/22976886897)) `Unable to find any artifacts for the associated workflow`
* [Build Windows Conda / pytorch/text / upload / conda-py3_8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22977441587) ([gh](https://github.com/pytorch/text/actions/runs/8389823865/job/22977441587)) `Unable to find any artifacts for the associated workflow`
* [Build Windows Wheels / pytorch/text / upload / wheel-py3_8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22977389376) ([gh](https://github.com/pytorch/text/actions/runs/8389823890/job/22977389376)) `Unable to find any artifacts for the associated workflow`
* [CodeQL / build](https://hud.pytorch.org/pr/pytorch/text/2241#22976816276) ([gh](https://github.com/pytorch/text/actions/runs/8389823757/job/22976816276)) `ModuleNotFoundError: No module named 'torch'`
* [cron / nightly / validate-binaries / linux-manywheel-3.8-rocm5.7 / linux-manywheel-3.8-rocm5.7](https://hud.pytorch.org/pr/pytorch/text/2241#22976826635) ([gh](https://github.com/pytorch/text/actions/runs/8389823894/job/22976826635)) `RuntimeError: Command docker exec -t 9cc9aee839ef2160e7a078c8721d6b56eddbe6f84613ec738bb0ca403017b592 /exec failed with exit code 1`
* [cron / nightly / validate-binaries / linux-manywheel-3.8-rocm6.0 / linux-manywheel-3.8-rocm6.0](https://hud.pytorch.org/pr/pytorch/text/2241#22976826796) ([gh](https://github.com/pytorch/text/actions/runs/8389823894/job/22976826796)) `RuntimeError: Command docker exec -t b4c59739469ca3cd8d7a1ee09148c751eb6eed08af365234a9366f2767a2fdae /exec failed with exit code 1`
* [cron / nightly / validate-binaries / macos-conda-3.8-cpu / macos-arm64-conda-3.8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22976828230) ([gh](https://github.com/pytorch/text/actions/runs/8389823894/job/22976828230)) `RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1`
* [cron / nightly / validate-binaries / macos-conda-3.8-cpu / macos-conda-3.8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22976828594) ([gh](https://github.com/pytorch/text/actions/runs/8389823894/job/22976828594)) `RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 1`
* [Integration Test / tests (3.8) / linux-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976816901) ([gh](https://github.com/pytorch/text/actions/runs/8389823831/job/22976816901)) `RuntimeError: Command docker exec -t 92bd2fbf6c43ea828181aece0404e537bd15a9398ed866bb8626f491aa164c2f /exec failed with exit code 1`
* [Unit-tests on Linux CPU / tests (3.10) / linux-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976817128) ([gh](https://github.com/pytorch/text/actions/runs/8389823821/job/22976817128)) `RuntimeError: Command docker exec -t 537a58035017a0bb75f9972a544337e74bfd93e344eb5006bbbac84bbf985d77 /exec failed with exit code 1`
* [Unit-tests on Linux CPU / tests (3.8) / linux-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976817535) ([gh](https://github.com/pytorch/text/actions/runs/8389823821/job/22976817535)) `RuntimeError: Command docker exec -t 04f356f207df4729f1a6c0e19076f86b668e60ac3ca594331a1d2c6d3288ab31 /exec failed with exit code 1`
* [Unit-tests on Linux CPU / tests (3.9) / linux-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976817820) ([gh](https://github.com/pytorch/text/actions/runs/8389823821/job/22976817820)) `RuntimeError: Command docker exec -t 5cd692a9adfbaed8e7289af58540821d2f156ba487f0889e73407872c3a6da46 /exec failed with exit code 1`
* [Unit-tests on Linux GPU / tests (3.8, 11.7) / linux-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976816751) ([gh](https://github.com/pytorch/text/actions/runs/8389823813/job/22976816751)) `RuntimeError: Command docker exec -t f748028d3a04c4df1a10a38807c2f8264246395cb2e1e5d2c4588c413a5b1cf3 /exec failed with exit code 1`
* [Unit-tests on Windows CPU / tests (3.10) / windows-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976816922) ([gh](https://github.com/pytorch/text/actions/runs/8389823846/job/22976816922)) `ImportError: DLL load failed while importing pywintypes: The specified module could not be found.`
* [Unit-tests on Windows CPU / tests (3.8) / windows-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976817322) ([gh](https://github.com/pytorch/text/actions/runs/8389823846/job/22976817322)) `ImportError: DLL load failed while importing pywintypes: The specified module could not be found.`
* [Unit-tests on Windows CPU / tests (3.9) / windows-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976817679) ([gh](https://github.com/pytorch/text/actions/runs/8389823846/job/22976817679)) `ImportError: DLL load failed while importing pywintypes: The specified module could not be found.`

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

* [Build M1 Conda / pytorch/text / upload / conda-py3_8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22976947807) ([gh](https://github.com/pytorch/text/actions/runs/8389823911/job/22976947807))
* [cron / nightly / validate-binaries / windows-conda-3.8-cpu / windows-conda-3.8-cpu](https://hud.pytorch.org/pr/pytorch/text/2241#22976825732) ([gh](https://github.com/pytorch/text/actions/runs/8389823894/job/22976825732)) `Process completed with exit code 1.`
* [Unit-tests on Macos CPU / tests (3.10) / macos-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976817315) ([gh](https://github.com/pytorch/text/actions/runs/8389823830/job/22976817315)) `thinc/backends/numpy_ops.cpp:5948:3: error: no member named 'use_tracing' in '_PyCFrame'`
* [Unit-tests on Macos CPU / tests (3.8) / macos-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976817566) ([gh](https://github.com/pytorch/text/actions/runs/8389823830/job/22976817566)) `thinc/backends/numpy_ops.cpp:5948:3: error: no member named 'use_tracing' in '_PyCFrame'`
* [Unit-tests on Macos CPU / tests (3.9) / macos-job](https://hud.pytorch.org/pr/pytorch/text/2241#22976817830) ([gh](https://github.com/pytorch/text/actions/runs/8389823830/job/22976817830)) `thinc/backends/numpy_ops.cpp:5948:3: error: no member named 'use_tracing' in '_PyCFrame'`

This comment was automatically generated by Dr. CI and updates every 15 minutes.