Closed remos-06 closed 1 year ago
You are either not using play.sh or something is hijacking your dependencies. Oh wait, I see this is Colab; on Colab we don't support Pygmalion since it's banned there, so I can't test or replicate this without getting my account banned. I will double-check with a different model.
Can confirm this on Tiefighter. I think one of Google's changes broke it, but I'll have to find out why.
Two days ago I was using Pygmalion without any issues.
I don't know what you did, but it fixed my issue.
This is indeed fixed: flash-attn silently replaced their package, and I have replaced it now. I still do not recommend PygmalionAI on Colab because of the ban risk; you can use Tiefighter instead.
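For anyone maintaining a similar setup, here is a minimal sketch of the kind of guard that keeps a broken flash-attn wheel from taking model loading down with it. This is illustrative only, not KoboldAI's actual loader code; the import names come from the traceback below, and `HAS_FLASH_ATTN` is a made-up flag.

```python
# Sketch: treat flash-attn as optional so a wheel built against a different
# CUDA runtime (e.g. one that needs a missing libcudart.so.12) cannot
# break model loading.
try:
    from flash_attn import flash_attn_func, flash_attn_varlen_func  # noqa: F401
    HAS_FLASH_ATTN = True
except ImportError as err:
    # A CUDA runtime mismatch surfaces here as an ImportError while the
    # compiled flash_attn_2_cuda extension is being loaded.
    HAS_FLASH_ATTN = False
    print(f"flash-attn unavailable, using the standard attention path: {err}")
```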
WARNING | modeling.inference_models.generic_hf_torch.class:_load:180 - Gave up on lazy loading due to Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
libcudart.so.12: cannot open shared object file: No such file or directory
ERROR | modeling.inference_models.hf_torch:_get_model:430 - Lazyloader failed, falling back to stock HF load. You may run out of RAM here. Details:
ERROR | modeling.inference_models.hf_torch:_get_model:431 - Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
libcudart.so.12: cannot open shared object file: No such file or directory
ERROR | modeling.inference_models.hf_torch:_get_model:432 - Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1282, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 45, in
from flash_attn import flash_attn_func, flash_attn_varlen_func
File "/usr/local/lib/python3.10/dist-packages/flash_attn/init.py", line 3, in
from flash_attn.flash_attn_interface import (
File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 8, in
import flash_attn_2_cuda as flash_attn_cuda
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/content/KoboldAI-Client/modeling/inference_models/hf_torch.py", line 420, in _get_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/hf_bleeding_edge/__init__.py", line 50, in from_pretrained
    return AM.from_pretrained(path, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 387, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 739, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 753, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 697, in getattribute_from_module
    if hasattr(module, attr):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1272, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1284, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
libcudart.so.12: cannot open shared object file: No such file or directory
INFO | modeling.inference_models.hf_torch:_get_model:433 - Falling back to stock HF load...
WARNING | modeling.inference_models.hf_torch:_get_model:466 - Fell back to GPT2LMHeadModel due to Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
libcudart.so.12: cannot open shared object file: No such file or directory
Downloading (…)fetensors.index.json: 100% 29.9k/29.9k [00:00<00:00, 12.2MB/s]
[aria2] Downloading model:  82%|########2 | 21.4G/26.0G [02:02<00:44, 105MB/s]
Connection Attempt: 127.0.0.1
INFO | __main__:do_connect:2584 - Client connected! UI_2
[aria2] Downloading model: 100%|##########| 26.0G/26.0G [02:25<00:00, 179MB/s]
You are using a model of type llama to instantiate a model of type gpt2. This is not supported for all configurations of models and can yield errors.
Downloading shards:   0% 0/3 [00:00<?, ?it/s]
Downloading shards:  33% 1/3 [00:00<00:00, 5.09it/s]
Downloading shards:  67% 2/3 [00:00<00:00, 5.08it/s]
Downloading shards: 100% 3/3 [00:00<00:00, 4.92it/s]
I'm getting this error
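In case it helps narrow this down, here is a small diagnostic sketch (it assumes a Linux/Colab machine and is not part of KoboldAI) that checks which CUDA runtime library the environment can actually load:

```python
import ctypes

# Check which CUDA runtime the machine can dlopen; the flash-attn wheel in the
# traceback above wants libcudart.so.12, while the environment may only ship
# a CUDA 11.x runtime.
for name in ("libcudart.so.12", "libcudart.so.11.0"):
    try:
        ctypes.CDLL(name)
        print(f"{name}: OK")
    except OSError as err:
        print(f"{name}: not loadable ({err})")
```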