```
$ text-generation-launcher --env
2024-02-04T20:58:57.916637Z  INFO text_generation_launcher: Runtime environment:
Target: aarch64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: 0da00be52c9e591f8890ab07eea05cc15b9b127b
Docker label: N/A
nvidia-smi:
Sun Feb  4 22:58:57 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.2.0                Driver Version: N/A          CUDA Version: 12.2      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A             N/A  |                  N/A |
| N/A  N/A   N/A              N/A /  N/A  |        Not Supported | N/A              N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
|  No running processes found                                                            |
+---------------------------------------------------------------------------------------+

2024-02-04T20:58:57.916731Z  INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "0.0.0.0", port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, env: true }
2024-02-04T20:58:57.917054Z  INFO download: text_generation_launcher: Starting download process.
2024-02-04T20:59:03.931296Z ERROR download: text_generation_launcher: Download encountered an error:
venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run python -m bitsandbytes
  warn(msg)
venv/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /home/sniffski/.wasmedge/lib:/usr/local/cuda/lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)

Traceback (most recent call last) [rich locals panels trimmed]:
  server/text_generation_server/cli.py:124, in download_weights
    from text_generation_server import utils
  server/text_generation_server/utils/__init__.py:4, in <module>
    from text_generation_server.utils.peft import download_and_unload_peft
  server/text_generation_server/utils/peft.py:7, in <module>
    from peft import AutoPeftModelForCausalLM, AutoPeftModelForSeq2SeqLM
  venv/lib/python3.10/site-packages/peft/__init__.py:22, in <module>
    from .auto import (AutoPeftModel, AutoPeftModelForCausalLM, ...)
  venv/lib/python3.10/site-packages/peft/auto.py:30, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  venv/lib/python3.10/site-packages/peft/mapping.py:20, in <module>
    from .peft_model import (PeftModel, PeftModelForCausalLM, ...)
  venv/lib/python3.10/site-packages/peft/peft_model.py:39, in <module>
    from .tuners import (AdaLoraModel, AdaptionPromptModel, IA3Model, ...)
  venv/lib/python3.10/site-packages/peft/tuners/__init__.py:21, in <module>
    from .lora import LoraConfig, LoraModel
  venv/lib/python3.10/site-packages/peft/tuners/lora.py:42, in <module>
    import bitsandbytes as bnb
  venv/lib/python3.10/site-packages/bitsandbytes/__init__.py:6, in <module>
    from . import cuda_setup, utils, research
  venv/lib/python3.10/site-packages/bitsandbytes/research/__init__.py:1, in <module>
    from . import nn
  venv/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py:1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  venv/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py:8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  venv/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py:6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:20, in <module>
    raise RuntimeError('''
RuntimeError: CUDA Setup failed despite GPU being available. Please run the following command
to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to
add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from
python -m bitsandbytes and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Error: DownloadError
```
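Reading the traceback, the failure happens entirely at import time, before any quantization is requested: `cli.py` imports `text_generation_server.utils`, whose `__init__.py` pulls in `utils/peft.py`, which imports `peft`; `peft.tuners.lora` then runs `import bitsandbytes as bnb` because `is_bnb_available()` is true, and `bitsandbytes/cextension.py` raises since it cannot locate a usable CUDA runtime library on this aarch64 setup. If that reading is right, the same error should reproduce in the same venv with no TGI involved:

```python
# Minimal isolation test: run inside the same venv, with TGI out of the
# picture. If this prints the same "CUDA Setup failed despite GPU being
# available" message, the problem is bitsandbytes' CUDA detection on
# Jetson/aarch64 rather than anything in text-generation-inference.
try:
    import bitsandbytes  # noqa: F401
    print("bitsandbytes imported successfully")
except RuntimeError as err:
    print(f"bitsandbytes failed at import time: {err}")
```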
System Info
As far as I understand, bitsandbytes is used for quantization. Is it possible to disable it completely?
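Since bitsandbytes (as far as I can tell) only matters for quantized models, treating it as an optional dependency would be enough for my case. Here is a rough sketch of what I mean; the `HAS_BNB` flag and `require_bnb` helper are hypothetical names of mine, not an existing API:

```python
# Hypothetical optional-dependency guard (my own sketch, not existing TGI
# or bitsandbytes code): swallow the import-time failure, remember it, and
# raise a clear error only on a code path that genuinely needs bitsandbytes.
try:
    import bitsandbytes as bnb  # raises RuntimeError on this Jetson setup
    HAS_BNB = True
except (ImportError, RuntimeError):
    bnb = None
    HAS_BNB = False


def require_bnb():
    """Fail loudly only when bitsandbytes-based quantization is requested."""
    if not HAS_BNB:
        raise RuntimeError(
            "bitsandbytes is unavailable on this platform; "
            "run without bitsandbytes quantization"
        )
    return bnb
```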
Information
Tasks
Reproduction
Steps to reproduce: compile on a Jetson AGX Orin 64GB Devkit and run:

```
text-generation-launcher --env
```
Expected behavior
There should be some way to disable the use of bitsandbytes entirely, so that models can be run without any quantization at all.
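To make that concrete, here is one shape the opt-out could take. This is only a sketch of a possible fix, not existing TGI code; the function name and signature are taken from the `utils/peft.py` frame in the traceback above, and the deferred import is my assumption:

```python
# Hypothetical patch sketch for server/text_generation_server/utils/peft.py:
# move the module-level `from peft import ...` into the function body, so
# that importing text_generation_server.utils no longer drags in
# peft -> bitsandbytes for plain, unquantized models.


def download_and_unload_peft(model_id, revision, trust_remote_code):
    # Deferred import: evaluated only when a PEFT adapter actually has to
    # be merged, so the bitsandbytes RuntimeError above can no longer break
    # `download-weights` for ordinary checkpoints like bloom-560m.
    from peft import AutoPeftModelForCausalLM, AutoPeftModelForSeq2SeqLM

    # ... the existing merge-and-save logic would go here unchanged ...
```

Either this deferred import or the guard sketched under "System Info" would let the download step finish on machines where bitsandbytes cannot initialize CUDA.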