wenda-LLM / wenda

wenda (闻达): an LLM invocation platform. It targets efficient content generation for specific environments while accounting for the limited computing resources of individuals and small and medium-sized enterprises, as well as knowledge security and privacy concerns.
GNU Affero General Public License v3.0

chatglm2 keeps reporting CUDA out of memory #410

Closed. serfan closed this issue 1 year ago.

serfan commented 1 year ago

The old chatglm-6b-int4 model ran fine, but after switching to chatglm2-6b-int4 it keeps failing with CUDA out of memory, and updating to the latest wenda does not help either. The error is below. My laptop's 3060 only has 6 GB of VRAM; does that mean I have no chance to try chatglm2? :(

Exception in thread Thread-1 (load_model):
Traceback (most recent call last):
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "E:\ai\wenda\wenda\wenda.py", line 51, in load_model
    LLM.load_model()
  File "E:\ai\wenda\wenda\llms\llm_glm6b.py", line 71, in load_model
    model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True, device=device, revision="v1.1.0")
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\transformers\models\auto\auto_factory.py", line 479, in from_pretrained
    return model_class.from_pretrained(
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\transformers\modeling_utils.py", line 2675, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\serfa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 767, in __init__
    self.transformer = ChatGLMModel(config, empty_init=empty_init, device=device)
  File "C:\Users\serfa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 700, in __init__
    self.encoder = init_method(GLMTransformer, config, **init_kwargs)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\utils\init.py", line 52, in skip_init
    return module_cls(*args, **kwargs).to_empty(device=final_device)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 1024, in to_empty
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 1024, in <lambda>
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB (GPU 0; 6.00 GiB total capacity; 5.34 GiB already allocated; 0 bytes free; 5.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
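
As a first, low-effort thing to try, the allocator hint in the error message itself can be tested before touching any code. A minimal sketch, assuming wenda is started from a Python process where the variable can still be set before torch touches CUDA; the value 128 is only an illustrative starting point, and this only mitigates fragmentation rather than an outright shortage of VRAM:

    import os

    # Must be set before torch allocates any CUDA memory, otherwise it has no effect.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

    import torch  # imported after the variable is set so the caching allocator picks it up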

stephsix commented 1 year ago

+1, a 3080 with 10 GB hits the same error.

serfan commented 1 year ago

I found a solution on the official chatglm2 GitHub; anyone hitting the same problem can give it a try. Edit wenda\llms\llm_glm6b.py and find:

 if "chatglm2" in settings.llm.path:
        model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True, device=device, revision="v1.1.0")
    else:
        model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True, revision="v1.1.0")

Change it to:

if "chatglm2" in settings.llm.path:
        model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True,revision="v1.1.0").cuda()
    else:
        model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True, revision="v1.1.0")

Then save the file and restart chatglm; the model now loads normally.

Reference: https://github.com/THUDM/ChatGLM2-6B/issues/52#issuecomment-1608625913
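
As a quick sanity check after applying the patch, the quantized model can be loaded the same way in a standalone script and the allocator stats printed to confirm it fits in 6 GB. A minimal sketch, assuming a local chatglm2-6b-int4 directory; the path is a placeholder for whatever settings.llm.path points to:

    import torch
    from transformers import AutoModel

    model_path = r"model\chatglm2-6b-int4"  # hypothetical path, adjust to your setup
    # Load on the CPU first (no device= argument), then move the already-quantized weights over.
    model = AutoModel.from_pretrained(model_path, local_files_only=True,
                                      trust_remote_code=True).cuda()

    print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")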

cgisky1980 commented 1 year ago

How about submitting a PR for this?

serfan commented 1 year ago

PR submitted.