```
$ text-generation-launcher --env
2024-02-04T20:58:57.916637Z  INFO text_generation_launcher: Runtime environment:
Target: aarch64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: 0da00be52c9e591f8890ab07eea05cc15b9b127b
Docker label: N/A
nvidia-smi:
Sun Feb  4 22:58:57 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.2.0                Driver Version: N/A          CUDA Version: 12.2      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A             N/A  |                  N/A |
| N/A  N/A   N/A              N/A /  N/A  |        Not Supported | N/A              N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
|  No running processes found                                                            |
+---------------------------------------------------------------------------------------+

2024-02-04T20:58:57.916731Z  INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "0.0.0.0", port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, env: true }
2024-02-04T20:58:57.917054Z  INFO download: text_generation_launcher: Starting download process.
2024-02-04T20:59:03.931296Z ERROR download: text_generation_launcher: Download encountered an error:
venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run python -m bitsandbytes
  warn(msg)
venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /home/sniffski/.wasmedge/lib:/usr/local/cuda/lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)

Traceback (most recent call last) [rich locals panels trimmed]:
  server/text_generation_server/cli.py:124, in download_weights
    from text_generation_server import utils
  server/text_generation_server/utils/__init__.py:4, in <module>
    from text_generation_server.utils.peft import download_and_unload_peft
  server/text_generation_server/utils/peft.py:7, in <module>
    from peft import AutoPeftModelForCausalLM, AutoPeftModelForSeq2SeqLM
  venv/lib/python3.10/site-packages/peft/__init__.py:22, in <module>
    from .auto import (AutoPeftModel, AutoPeftModelForCausalLM, ...)
  venv/lib/python3.10/site-packages/peft/auto.py:30, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  venv/lib/python3.10/site-packages/peft/mapping.py:20, in <module>
    from .peft_model import (PeftModel, PeftModelForCausalLM, ...)
  venv/lib/python3.10/site-packages/peft/peft_model.py:39, in <module>
    from .tuners import (AdaLoraModel, AdaptionPromptModel, IA3Model, ...)
  venv/lib/python3.10/site-packages/peft/tuners/__init__.py:21, in <module>
    from .lora import LoraConfig, LoraModel
  venv/lib/python3.10/site-packages/peft/tuners/lora.py:42, in <module>
    import bitsandbytes as bnb
  venv/lib/python3.10/site-packages/bitsandbytes/__init__.py:6, in <module>
    from . import cuda_setup, utils, research
  venv/lib/python3.10/site-packages/bitsandbytes/research/__init__.py:1, in <module>
    from . import nn
  venv/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py:1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  venv/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py:8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  venv/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py:6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:20, in <module>
    raise RuntimeError('''
RuntimeError: CUDA Setup failed despite GPU being available. Please run the following command
to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to
add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from
python -m bitsandbytes and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Error: DownloadError
```
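Reading the traceback, the failure happens entirely at import time, before any quantization is requested: `cli.py` imports `text_generation_server.utils`, whose `__init__.py` pulls in `utils/peft.py`, which imports `peft`; `peft.tuners.lora` then runs `import bitsandbytes as bnb` because `is_bnb_available()` is true, and `bitsandbytes/cextension.py` raises since it cannot locate a usable CUDA runtime library on this aarch64 setup. If that reading is right, the same error should reproduce in the same venv with no TGI involved:

```python
# Minimal isolation test: run inside the same venv, with TGI out of the
# picture. If this prints the same "CUDA Setup failed despite GPU being
# available" message, the problem is bitsandbytes' CUDA detection on
# Jetson/aarch64 rather than anything in text-generation-inference.
try:
    import bitsandbytes  # noqa: F401
    print("bitsandbytes imported successfully")
except RuntimeError as err:
    print(f"bitsandbytes failed at import time: {err}")
```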
System Info
As far as I understand, bitsandbytes is used for quantization. Is it possible to disable it completely?
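Since bitsandbytes (as far as I can tell) only matters for quantized models, treating it as an optional dependency would be enough for my case. Here is a rough sketch of what I mean; the `HAS_BNB` flag and `require_bnb` helper are hypothetical names of mine, not an existing API:

```python
# Hypothetical optional-dependency guard (my own sketch, not existing TGI
# or bitsandbytes code): swallow the import-time failure, remember it, and
# raise a clear error only on a code path that genuinely needs bitsandbytes.
try:
    import bitsandbytes as bnb  # raises RuntimeError on this Jetson setup
    HAS_BNB = True
except (ImportError, RuntimeError):
    bnb = None
    HAS_BNB = False


def require_bnb():
    """Fail loudly only when bitsandbytes-based quantization is requested."""
    if not HAS_BNB:
        raise RuntimeError(
            "bitsandbytes is unavailable on this platform; "
            "run without bitsandbytes quantization"
        )
    return bnb
```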
Information
Tasks
Reproduction
Steps to reproduce: compile on a Jetson AGX Orin 64GB Devkit and run:

```
text-generation-launcher --env
```
Expected behavior
There should be some way to disable the use of bitsandbytes entirely, so that models can be run without any quantization at all.
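To make that concrete, here is one shape the opt-out could take. This is only a sketch of a possible fix, not existing TGI code; the function name and signature are taken from the `utils/peft.py` frame in the traceback above, and the deferred import is my assumption:

```python
# Hypothetical patch sketch for server/text_generation_server/utils/peft.py:
# move the module-level `from peft import ...` into the function body, so
# that importing text_generation_server.utils no longer drags in
# peft -> bitsandbytes for plain, unquantized models.


def download_and_unload_peft(model_id, revision, trust_remote_code):
    # Deferred import: evaluated only when a PEFT adapter actually has to
    # be merged, so the bitsandbytes RuntimeError above can no longer break
    # `download-weights` for ordinary checkpoints like bloom-560m.
    from peft import AutoPeftModelForCausalLM, AutoPeftModelForSeq2SeqLM

    # ... the existing merge-and-save logic would go here unchanged ...
```

Either this deferred import or the guard sketched under "System Info" would let the download step finish on machines where bitsandbytes cannot initialize CUDA.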