intel / intel-extension-for-pytorch

A Python package for extending the official PyTorch that can easily obtain a performance boost on Intel platforms
Apache License 2.0

OSError: [WinError 126] The specified module could not be found. Error loading "\.venv\Lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies #665

Closed Tisha-linkenite closed 3 months ago

Tisha-linkenite commented 4 months ago

Describe the bug

I was working on fine-tuning an LLM; executing the piece of code below raised the following OSError.

Error -

OSError: [WinError 126] The specified module could not be found. Error loading "\.venv\Lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies.

In short, loading fbgemm.dll (or one of its dependencies) from the torch library's lib folder fails.
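For what it's worth, one way to narrow this down is to try loading the DLL directly with ctypes. This is a diagnostic sketch only; the relative path is an assumption matching the venv layout in the traceback and should be adjusted as needed:

import ctypes
import os

# Assumed path: adjust to your own virtual environment location.
dll_path = os.path.join(".venv", "Lib", "site-packages", "torch", "lib", "fbgemm.dll")

try:
    # WinDLL raises the same WinError 126 when a transitive dependency is missing.
    ctypes.WinDLL(dll_path)
    print("fbgemm.dll loaded fine; the failure is elsewhere.")
except OSError as e:
    print(f"Load failed: {e}")

If the load fails even though the file exists on disk, a missing dependency DLL (for example, a Visual C++ runtime component) is the likelier culprit.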

Stack Trace -

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[3], line 3
      1 import os
      2 from datasets import load_dataset, load_metric, Features, Value
----> 3 from transformers import AutoTokenizer, AutoModelForQuestionAnswering, TrainingArguments, Trainer
      5 # Load your dataset
      6 output_file = "qa_dataset.json"

\.venv\Lib\site-packages\transformers\__init__.py:26
     23 from typing import TYPE_CHECKING
     25 # Check the dependencies satisfy the minimal versions required.
---> 26 from . import dependency_versions_check
     27 from .utils import (
     28     OptionalDependencyNotAvailable,
     29     _LazyModule,
   (...)
     48     logging,
     49 )
     52 logger = logging.get_logger(__name__)  # pylint: disable=invalid-name

\.venv\Lib\site-packages\transformers\dependency_versions_check.py:16
      1 # Copyright 2020 The HuggingFace Team. All rights reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
     15 from .dependency_versions_table import deps
---> 16 from .utils.versions import require_version, require_version_core
     19 # define which module versions we always want to check at run time
     20 # (usually the ones defined in `install_requires` in setup.py)
     21 #
     22 # order specific notes:
     23 # - tqdm must be checked before tokenizers
     25 pkgs_to_check_at_runtime = [
     26     "python",
     27     "tqdm",
   (...)
     37     "pyyaml",
     38 ]

\.venv\Lib\site-packages\transformers\utils\__init__.py:33
     24 from .constants import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD, IMAGENET_STANDARD_MEAN, IMAGENET_STANDARD_STD
     25 from .doc import (
     26     add_code_sample_docstrings,
     27     add_end_docstrings,
   (...)
     31     replace_return_docstrings,
     32 )
---> 33 from .generic import (
     34     ContextManagers,
     35     ExplicitEnum,
     36     ModelOutput,
     37     PaddingStrategy,
     38     TensorType,
     39     add_model_info_to_auto_map,
     40     cached_property,
     41     can_return_loss,
     42     expand_dims,
     43     find_labels,
     44     flatten_dict,
     45     infer_framework,
     46     is_jax_tensor,
     47     is_numpy_array,
     48     is_tensor,
     49     is_tf_symbolic_tensor,
     50     is_tf_tensor,
     51     is_torch_device,
     52     is_torch_dtype,
     53     is_torch_tensor,
     54     reshape,
     55     squeeze,
     56     strtobool,
     57     tensor_size,
     58     to_numpy,
     59     to_py_obj,
     60     transpose,
     61     working_or_temp_dir,
     62 )
     63 from .hub import (
     64     CLOUDFRONT_DISTRIB_PREFIX,
     65     HF_MODULES_CACHE,
   (...)
     91     try_to_load_from_cache,
     92 )
     93 from .import_utils import (
     94     ACCELERATE_MIN_VERSION,
     95     ENV_VARS_TRUE_AND_AUTO_VALUES,
   (...)
    210     torch_only_method,
    211 )

\.venv\Lib\site-packages\transformers\utils\generic.py:461
    457         return tuple(self[k] for k in self.keys())
    460 if is_torch_available():
--> 461     import torch.utils._pytree as _torch_pytree
    463     def _model_output_flatten(output: ModelOutput) -> Tuple[List[Any], "_torch_pytree.Context"]:
    464         return list(output.values()), list(output.keys())

\.venv\Lib\site-packages\torch\__init__.py:143
    141                 err = ctypes.WinError(ctypes.get_last_error())
    142                 err.strerror += f' Error loading "{dll}" or one of its dependencies.'
--> 143                 raise err
    145     kernel32.SetErrorMode(prev_error_mode)
    148 def _preload_cuda_deps(lib_folder, lib_name):

OSError: [WinError 126] The specified module could not be found. Error loading "c:\Users\User\Desktop\Linkenite\MarketingAI MVP\.venv\Lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies.

Code -

import os
from datasets import load_dataset, load_metric, Features, Value
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, TrainingArguments, Trainer

# Load your dataset
output_file = "qa_dataset.json"

data_files = {"train": output_file}
# defining the features parameter based on the schema/structure of the QA dataset in json format
features = Features({
    "question": Value(dtype="string"),
    "answer": Value(dtype="string")
})

print(features)

dataset = load_dataset("json", data_files=data_files, features=features)

# Loading the pre-trained tokenizer and model
model_name = "bert-base-uncased"  # You can change this to another model if needed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Tokenize the inputs
def preprocess_function(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=384,
        truncation="only_second",
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    sample_map = inputs.pop("overflow_to_sample_mapping")
    answers = examples["answers"]
    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        sample_index = sample_map[i]
        answer = answers[sample_index]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        context_start = sequence_ids.index(1)
        context_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(1)

        if not (offset[context_start][0] <= start_char and offset[context_end][1] >= end_char):
            start_positions.append(0)
            end_positions.append(0)
        else:
            start_pos = [o[0] for o in offset].index(start_char)
            end_pos = [o[1] for o in offset].index(end_char)
            start_positions.append(start_pos)
            end_positions.append(end_pos)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs

tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=dataset["train"].column_names)

Versions

System Information

OS: Windows
OS Version: 10.0.22631
Python Version: 3.12.4 (tags/v3.12.4:8e8a4ba, Jun 6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)]
Environment: Virtual Environment (.venv)

Package Information

langchain_core: 0.2.9
langchain: 0.2.5
langchain_community: 0.2.5
langsmith: 0.1.77
langchain_chroma: 0.1.1
langchain_openai: 0.1.8
langchain_text_splitters: 0.2.1


ZailiWang commented 4 months ago

Hi Tisha, may I ask what your target device for the LLM fine-tuning workload is? Is it an Intel Arc GPU? From this line in your traceback info:

    148 def _preload_cuda_deps(lib_folder, lib_name):

I guess you installed a CUDA build of PyTorch.
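A quick sketch to confirm which build is actually installed (output varies by wheel):

import torch

# A "+cpu"/"+cu121"-style suffix in the version string, or a non-None
# torch.version.cuda, tells you which wheel this venv really has.
print(torch.__version__)
print(torch.version.cuda)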

Also, have you installed IPEX? I ask because I don't see import intel_extension_for_pytorch as ipex anywhere in your code.

Please check the installation guide for how to utilize an Intel GPU in an IPEX environment.
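For reference, typical IPEX usage looks roughly like the sketch below (a toy stand-in model, assuming the XPU builds of torch and intel_extension_for_pytorch are installed):

import torch
import intel_extension_for_pytorch as ipex

# Toy model moved to the Intel GPU ("xpu") device.
model = torch.nn.Linear(4, 2).to("xpu")
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# ipex.optimize returns the optimized model (and optimizer, when one is passed).
model, optimizer = ipex.optimize(model, optimizer=optimizer)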

Tisha-linkenite commented 4 months ago

Hi Zaili,

The target device for the LLM fine-tuning workload is an Intel(R) UHD GPU. I haven't installed IPEX; I'm not sure which version is compatible with torch 2.3.1.

ZailiWang commented 4 months ago

Hi Tisha, UHD devices are not officially supported by IPEX. You can give it a try for your workload, but it would be hard for us to support you if you run into crash, performance, or accuracy issues.

If needed, please refer to the link in my last post for IPEX installation. Note that you need to install torch with the command in the installation tutorial, not from the official PyTorch site. The latest IPEX release is 2.1.30+xpu, which works with a patched PyTorch 2.1 build.

Tisha-linkenite commented 4 months ago

Hi Zaili, when I try to install version 2.1.30+xpu, I am facing the error below:

ERROR: Could not find a version that satisfies the requirement intel-extension-for-pytorch==2.1.30+xpu (from versions: none)
ERROR: No matching distribution found for intel-extension-for-pytorch==2.1.30+xpu

Can I try the command below instead?

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

ZailiWang commented 4 months ago

I'm not sure you followed the guide. The command for Windows should be:

python -m pip install torch==2.1.0.post2 torchvision==0.16.0.post2 torchaudio==2.1.0.post2 intel-extension-for-pytorch==2.1.30.post0 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

Please also be aware that the GPU driver and the oneAPI Base Toolkit need to be installed beforehand.
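After installation, a sanity check along these lines (adapted from the verification sample in the IPEX installation docs; it assumes the XPU build imports cleanly) confirms the whole stack:

import torch
import intel_extension_for_pytorch as ipex

# Both imports and version prints should succeed without a DLL error.
print(torch.__version__)
print(ipex.__version__)
# Enumerate the XPU devices the runtime can see; an empty list usually
# means the GPU driver or the oneAPI runtime is missing.
for i in range(torch.xpu.device_count()):
    print(f"[{i}]: {torch.xpu.get_device_properties(i)}")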