A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License
`v2.1.2+cu118` and `v2.1.1+cu118` fail on torchdata import with `ImportError: libssl.so.3: cannot open shared object file: No such file or directory`; `v2.1.0+cu118` does not #1220
We are seeing an error with torch 2.1.1+cu118 and torch 2.1.2+cu118 that does not occur with torch 2.1.0+cu118.
The error looks like this:
Traceback (most recent call last):
from ludwig.api import LudwigModel
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/api.py", line 41, in <module>
from ludwig.backend import Backend, initialize_backend, provision_preprocessing_workers
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/backend/__init__.py", line 22, in <module>
from ludwig.backend.base import Backend, LocalBackend
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/backend/base.py", line 34, in <module>
from ludwig.data.cache.manager import CacheManager
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/data/cache/manager.py", line 8, in <module>
from ludwig.data.dataset.base import DatasetManager
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/data/dataset/base.py", line 24, in <module>
from ludwig.distributed import DistributedStrategy
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/distributed/__init__.py", line 3, in <module>
from ludwig.distributed.base import DistributedStrategy, LocalStrategy
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/distributed/base.py", line 11, in <module>
from ludwig.modules.optimization_modules import create_optimizer
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/modules/optimization_modules.py", line 21, in <module>
from ludwig.utils.torch_utils import LudwigModule
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/utils/torch_utils.py", line 14, in <module>
from ludwig.utils.strings_utils import SpecialSymbol
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/utils/strings_utils.py", line 33, in <module>
from ludwig.utils.tokenizers import get_tokenizer_from_registry
File "/home/ray/anaconda3/lib/python3.8/site-packages/ludwig/utils/tokenizers.py", line 21, in <module>
import torchtext
File "/home/ray/anaconda3/lib/python3.8/site-packages/torchtext/__init__.py", line 12, in <module>
from . import data, datasets, prototype, functional, models, nn, transforms, utils, vocab, experimental
File "/home/ray/anaconda3/lib/python3.8/site-packages/torchtext/datasets/__init__.py", line 3, in <module>
from .ag_news import AG_NEWS
File "/home/ray/anaconda3/lib/python3.8/site-packages/torchtext/datasets/ag_news.py", line 5, in <module>
from torchdata.datapipes.iter import FileOpener, IterableWrapper
File "/home/ray/anaconda3/lib/python3.8/site-packages/torchdata/__init__.py", line 7, in <module>
from torchdata import _extension # noqa: F401
File "/home/ray/anaconda3/lib/python3.8/site-packages/torchdata/_extension.py", line 34, in <module>
_init_extension()
File "/home/ray/anaconda3/lib/python3.8/site-packages/torchdata/_extension.py", line 31, in _init_extension
from torchdata import _torchdata as _torchdata
ImportError: libssl.so.3: cannot open shared object file: No such file or directory
The import fails inside torchdata's compiled extension (`torchdata._torchdata`), and torchdata seems to install with urllib3>2.0.
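To narrow down the `libssl.so.3` failure, it may help to check what the dynamic loader in the affected environment can actually open. The following is a minimal diagnostic sketch using only the standard library; the helper names `can_load` and `openssl_report` are made up for illustration, and the `libssl.so.1.1` comparison is an assumption about which other OpenSSL soname might be present.

```python
import ctypes
import ctypes.util
import ssl


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


def openssl_report():
    """Collect what this interpreter can see about OpenSSL on the system."""
    return {
        # OpenSSL version that Python's own ssl module was built against.
        "python_ssl": ssl.OPENSSL_VERSION,
        # Library the loader resolves for "ssl", if any (may be None).
        "find_library_ssl": ctypes.util.find_library("ssl"),
        # The exact soname named in the ImportError.
        "libssl.so.3_loadable": can_load("libssl.so.3"),
        # An older OpenSSL soname, checked for comparison (assumption).
        "libssl.so.1.1_loadable": can_load("libssl.so.1.1"),
    }


if __name__ == "__main__":
    for key, value in openssl_report().items():
        print(f"{key}: {value}")
```

If `libssl.so.3_loadable` comes back `False`, the environment has no OpenSSL 3 library on the loader path, which would be consistent with the ImportError raised from `torchdata._torchdata`.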
When we pin urllib3==1.26.16 to try to mitigate the libssl.so error, we get a different error:
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1382, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/home/ray/anaconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/ray/anaconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 28, in <module>
from ..integrations.deepspeed import is_deepspeed_zero3_enabled
File "/home/ray/anaconda3/lib/python3.8/site-packages/transformers/integrations/deepspeed.py", line 49, in <module>
from accelerate.utils.deepspeed import HfDeepSpeedConfig as DeepSpeedConfig
File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/__init__.py", line 3, in <module>
from .accelerator import Accelerator
File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/accelerator.py", line 35, in <module>
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in <module>
from .utils import (
File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/utils/__init__.py", line 153, in <module>
from .launch import (
File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/utils/launch.py", line 24, in <module>
from ..commands.config.config_args import SageMakerConfig
File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/commands/config/__init__.py", line 19, in <module>
from .config import config_command_parser
File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/commands/config/config.py", line 25, in <module>
from .sagemaker import get_sagemaker_input
File "/home/ray/anaconda3/lib/python3.8/site-packages/accelerate/commands/config/sagemaker.py", line 35, in <module>
import boto3 # noqa: F401
File "/home/ray/anaconda3/lib/python3.8/site-packages/boto3/__init__.py", line 17, in <module>
from boto3.session import Session
File "/home/ray/anaconda3/lib/python3.8/site-packages/boto3/session.py", line 17, in <module>
import botocore.session
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/session.py", line 26, in <module>
import botocore.client
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/client.py", line 15, in <module>
from botocore import waiter, xform_name
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/waiter.py", line 18, in <module>
from botocore.docs.docstring import WaiterDocstring
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/__init__.py", line 15, in <module>
from botocore.docs.service import ServiceDocumenter
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/service.py", line 14, in <module>
from botocore.docs.client import ClientDocumenter, ClientExceptionsDocumenter
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/client.py", line 14, in <module>
from botocore.docs.example import ResponseExampleDocumenter
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/example.py", line 13, in <module>
from botocore.docs.shape import ShapeDocumenter
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/docs/shape.py", line 19, in <module>
from botocore.utils import is_json_value_header
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/utils.py", line 34, in <module>
import botocore.httpsession
File "/home/ray/anaconda3/lib/python3.8/site-packages/botocore/httpsession.py", line 21, in <module>
from urllib3.util.ssl_ import (
ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_' (/home/ray/anaconda3/lib/python3.8/site-packages/urllib3/util/ssl_.py)
This suggests a different incompatibility (perhaps from deepspeed?).
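Since the second traceback ends with botocore failing to import `DEFAULT_CIPHERS` from `urllib3.util.ssl_`, one quick check is to see which urllib3 the interpreter actually imports and whether that name exists in it. This is a rough sketch; `urllib3_ssl_info` is a hypothetical helper, and it returns `None` if urllib3 is not importable.

```python
import importlib


def urllib3_ssl_info():
    """Describe the urllib3 that this interpreter imports.

    Returns None if urllib3 cannot be imported at all.
    """
    try:
        urllib3 = importlib.import_module("urllib3")
        ssl_module = importlib.import_module("urllib3.util.ssl_")
    except ImportError:
        return None
    return {
        # Version string of the imported package, if it exposes one.
        "version": getattr(urllib3, "__version__", None),
        # File the module was loaded from, to confirm which install is used.
        "path": getattr(ssl_module, "__file__", None),
        # The name botocore tries to import in the traceback.
        "has_DEFAULT_CIPHERS": hasattr(ssl_module, "DEFAULT_CIPHERS"),
    }


if __name__ == "__main__":
    print(urllib3_ssl_info())
```

Comparing the reported version and path against the intended pin can show whether the pin took effect in the environment that raised the error.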
Anyway, it seems that torch 2.1.0+cu118 either doesn’t require the newest version of torchdata or works with urllib3==1.26.16, which appears to mitigate our issues.
However, the errors with 2.1.1+cu118 and 2.1.2+cu118 seemed weird to me, so I'm raising them here in case anyone has any helpful tidbits!
Versions
2.1.0+cu118 (works), 2.1.1+cu118 (broken), 2.1.2+cu118 (broken)
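For comparing a working and a broken environment, a small script that prints the installed versions of the packages appearing in the tracebacks may be useful. This is a sketch using `importlib.metadata` from the standard library (Python 3.8+); the package list and the `installed_versions` helper are illustrative choices, not part of any of these libraries.

```python
from importlib import metadata

# Packages that appear in the two tracebacks.
PACKAGES = ["torch", "torchdata", "torchtext", "urllib3", "botocore", "boto3"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Running it once under 2.1.0+cu118 and once under 2.1.1+cu118 or 2.1.2+cu118 and diffing the output should show which of these dependencies changed between the setups.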