pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch
https://pytorch.org/text
BSD 3-Clause "New" or "Revised" License

Cannot download IWSLT dataset #1091

Closed garyhlai closed 3 years ago

garyhlai commented 3 years ago

Running

train_data, valid_data, test_data = IWSLT.splits(exts=('.de', '.en'), fields=(SRC, TRG))

on Google Colab leads to this error:


TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f9467921c18>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
MaxRetryError: HTTPSConnectionPool(host='wit3.fbk.eu', port=443): Max retries exceeded with url: /archive/2016-01//texts/de/en/de-en.tgz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f9467921c18>: Failed to establish a new connection: [Errno 110] Connection timed out',))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    514                 raise SSLError(e, request=request)
    515 
--> 516             raise ConnectionError(e, request=request)
    517 
    518         except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='wit3.fbk.eu', port=443): Max retries exceeded with url: /archive/2016-01//texts/de/en/de-en.tgz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f9467921c18>: Failed to establish a new connection: [Errno 110] Connection timed out',))
zhangguanheng66 commented 3 years ago

Looks like a failure of the third-party url

garyhlai commented 3 years ago

Any plan to switch to a new url, since this is currently making the torchtext IWSLT dataset unusable?

zhangguanheng66 commented 3 years ago

We could accept a PR if you know a new url.

christopherhesse commented 3 years ago

Is pytorch unable to host datasets on first party URLs?

zhangguanheng66 commented 3 years ago

Unfortunately, no.

garyhlai commented 3 years ago

For those who need it, the dataset is up again on wit3.fbk.eu. Specifically, it is hosted at:

https://drive.google.com/file/d/1l5y6Giag9aRPwGtuZHswh3w5v3qEz8D8/view

However, downloading a Google Drive file from a Python program usually requires the Google Drive API, which involves inconvenient authorization with credentials.json etc. After some research, the best way I found to download the IWSLT dataset is with these commands:

pip install gdown
gdown https://drive.google.com/uc?id=1l5y6Giag9aRPwGtuZHswh3w5v3qEz8D8

Perhaps a warning message could be added to instruct users to run these commands before they proceed with IWSLT.splits. @zhangguanheng66 what do you think of this? Let me know and I'll tweak the current code to implement this.
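
For reference, the "view" link and the "uc?id=" link above differ only in form; both carry the same file id. A minimal sketch of converting one into the other, assuming the URL shapes shown in this thread (the helper name `drive_direct_url` is hypothetical, not part of torchtext or gdown):

```python
from urllib.parse import urlparse


def drive_direct_url(view_url):
    """Turn '.../file/d/<id>/view' into 'https://drive.google.com/uc?id=<id>'.

    Hypothetical helper: it only assumes the two URL shapes seen in this thread.
    """
    parts = urlparse(view_url).path.strip("/").split("/")
    # Expected path shape: file/d/<id>/view
    if len(parts) < 3 or parts[:2] != ["file", "d"]:
        raise ValueError(f"not a Google Drive 'view' link: {view_url!r}")
    return f"https://drive.google.com/uc?id={parts[2]}"


view = "https://drive.google.com/file/d/1l5y6Giag9aRPwGtuZHswh3w5v3qEz8D8/view"
print(drive_direct_url(view))
```

The resulting link is the form passed to gdown in the commands above.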

zhangguanheng66 commented 3 years ago

@ghlai9665 thanks for the new url. Can you submit a PR and update the IWSLT dataset url with the new one? The new link downloads all the files, so we will need to pick out the specific languages from the local copy. Let me know if you need any help there.

I used the download_from_url function and it works:

url = 'https://drive.google.com/uc?id=1l5y6Giag9aRPwGtuZHswh3w5v3qEz8D8'
torchtext.utils.download_from_url(url)

Remember to add a CI test in test/data/test_builtin_datasets.py (similar to test_multi30k). We have an experimental IWSLT dataset (here).
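
A rough sketch of the "fetch specific languages from the local copy" step, using only the standard library. It assumes (this is an assumption, not confirmed in this thread) that the full archive stores each language pair as a nested member whose name ends in texts/<src>/<tgt>/<src>-<tgt>.tgz, mirroring the path in the old wit3.fbk.eu url. The function name `extract_language_pair` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_language_pair(archive_path, src, tgt, out_dir):
    """Copy the nested '<src>-<tgt>.tgz' member out of the full archive.

    Assumption: the member name ends in 'texts/<src>/<tgt>/<src>-<tgt>.tgz'.
    """
    suffix = f"texts/{src}/{tgt}/{src}-{tgt}.tgz"
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                target = out_dir / Path(member.name).name
                # Read the member's bytes and write them out ourselves, so
                # nothing else in the archive is unpacked.
                with archive.extractfile(member) as source:
                    target.write_bytes(source.read())
                return target
    raise FileNotFoundError(f"no member ending in {suffix!r} in {archive_path}")
```

For example, `extract_language_pair(path_to_download, "de", "en", ".data/iwslt")` would leave only de-en.tgz on disk, which could then be unpacked as before.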

garyhlai commented 3 years ago

Just saw this. Working on it!

garyhlai commented 3 years ago

I have made the changes and am writing the test now; however, running py.test at the root directory gave me the following error. Any guidance @zhangguanheng66 ?

============================ test session starts =============================
platform darwin -- Python 3.7.9, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/garylai/OffiCloud/torchtext, configfile: pytest.ini, testpaths: test/
plugins: pythonpath-0.7.3, cov-2.10.1
collected 5 items / 15 errors                                                

=================================== ERRORS ===================================
____________________ ERROR collecting test/test_build.py _____________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/test_build.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/test_build.py:7: in <module>
    import torchtext.data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
____________________ ERROR collecting test/test_utils.py _____________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/test_utils.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/test_utils.py:4: in <module>
    from torchtext import utils
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
____________________ ERROR collecting test/test_vocab.py _____________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/test_vocab.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/test_vocab.py:9: in <module>
    from torchtext import vocab
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
__________________ ERROR collecting test/data/test_batch.py __________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/data/test_batch.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/data/test_batch.py:2: in <module>
    import torchtext.data as data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
____________ ERROR collecting test/data/test_builtin_datasets.py _____________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/data/test_builtin_datasets.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/data/test_builtin_datasets.py:4: in <module>
    import torchtext.data as data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
_________________ ERROR collecting test/data/test_dataset.py _________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/data/test_dataset.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/data/test_dataset.py:2: in <module>
    import torchtext.data as data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
__________________ ERROR collecting test/data/test_field.py __________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/data/test_field.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/data/test_field.py:6: in <module>
    import torchtext.data as data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
_______________ ERROR collecting test/data/test_functional.py ________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/data/test_functional.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/data/test_functional.py:9: in <module>
    import torchtext.data as data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
________________ ERROR collecting test/data/test_pipeline.py _________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/data/test_pipeline.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/data/test_pipeline.py:2: in <module>
    import torchtext.data as data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
_________________ ERROR collecting test/data/test_subword.py _________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/data/test_subword.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/data/test_subword.py:5: in <module>
    from torchtext import data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
__________________ ERROR collecting test/data/test_utils.py __________________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/data/test_utils.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/data/test_utils.py:3: in <module>
    import torchtext.data as data
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
___________ ERROR collecting test/experimental/test_transforms.py ____________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/experimental/test_transforms.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/experimental/test_transforms.py:4: in <module>
    from torchtext.experimental.transforms import (
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
_____________ ERROR collecting test/experimental/test_vectors.py _____________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/experimental/test_vectors.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/experimental/test_vectors.py:7: in <module>
    from torchtext.experimental.vectors import (
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
______________ ERROR collecting test/experimental/test_vocab.py ______________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/experimental/test_vocab.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/experimental/test_vocab.py:8: in <module>
    from torchtext.experimental.vocab import (
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
___________ ERROR collecting test/experimental/test_with_asset.py ____________
ImportError while importing test module '/Users/garylai/OffiCloud/torchtext/test/experimental/test_with_asset.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.pyenv/versions/3.7.9/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test/experimental/test_with_asset.py:2: in <module>
    import torchtext
torchtext/__init__.py:6: in <module>
    from . import experimental
torchtext/experimental/__init__.py:2: in <module>
    from . import transforms
torchtext/experimental/transforms.py:4: in <module>
    from torchtext._torchtext import RegexTokenizer as RegexTokenizerPybind
E   ModuleNotFoundError: No module named 'torchtext._torchtext'
========================== short test summary info ===========================
ERROR test/test_build.py
ERROR test/test_utils.py
ERROR test/test_vocab.py
ERROR test/data/test_batch.py
ERROR test/data/test_builtin_datasets.py
ERROR test/data/test_dataset.py
ERROR test/data/test_field.py
ERROR test/data/test_functional.py
ERROR test/data/test_pipeline.py
ERROR test/data/test_subword.py
ERROR test/data/test_utils.py
ERROR test/experimental/test_transforms.py
ERROR test/experimental/test_vectors.py
ERROR test/experimental/test_vocab.py
ERROR test/experimental/test_with_asset.py
!!!!!!!!!!!!!!!!!! Interrupted: 15 errors during collection !!!!!!!!!!!!!!!!!!
============================= 15 errors in 4.59s =============================
zhangguanheng66 commented 3 years ago

Could you try running it from somewhere other than the root directory? From the root directory, Python imports the local torchtext folder instead of the installed library.
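
One way to check which copy Python would pick up is to ask the import system where it finds the package. This is a small sketch using only the standard library; the helper name `imported_from` is hypothetical.

```python
import importlib.util
from pathlib import Path


def imported_from(name, directory):
    """Return True if importing `name` would load it from under `directory`."""
    # find_spec locates a top-level module without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        # Not installed, or no file on disk (built-in, namespace package).
        return False
    origin = Path(spec.origin).resolve()
    return Path(directory).resolve() in origin.parents


# True here would mean the checkout in the current directory shadows any
# installed copy of the package.
print(imported_from("torchtext", Path.cwd()))
```

If this prints True when run from the repository, the tests are importing the source tree rather than the installed package.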

garyhlai commented 3 years ago

Which directory should I try it from? I tried to run it from the /test directory and got the same error.

zhangguanheng66 commented 3 years ago

It should work under the /test directory. Anyway, you could submit a PR and it will automatically run the CI tests.

zhangguanheng66 commented 3 years ago

Fixed by https://github.com/pytorch/text/pull/1115