wenda-LLM / wenda

wenda (闻达): an LLM invocation platform. It targets efficient content generation for specific environments while accounting for the limited computing resources of individuals and small and medium-sized enterprises, as well as knowledge security and privacy concerns.
GNU Affero General Public License v3.0

chatglm2 keeps reporting CUDA out of memory #410

Closed. serfan closed this issue 1 year ago.

serfan commented 1 year ago

The old chatglm-6b-int4 model ran fine, but after switching to chatglm2-6b-int4 it keeps failing with CUDA out of memory, and updating to the latest wenda does not help either. The error is below. My laptop's 3060 only has 6 GB of VRAM; does that mean I have no chance to try chatglm2? :(

Exception in thread Thread-1 (load_model):
Traceback (most recent call last):
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "E:\ai\wenda\wenda\wenda.py", line 51, in load_model
    LLM.load_model()
  File "E:\ai\wenda\wenda\llms\llm_glm6b.py", line 71, in load_model
    model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True, device=device, revision="v1.1.0")
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\transformers\models\auto\auto_factory.py", line 479, in from_pretrained
    return model_class.from_pretrained(
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\transformers\modeling_utils.py", line 2675, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\serfa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 767, in __init__
    self.transformer = ChatGLMModel(config, empty_init=empty_init, device=device)
  File "C:\Users\serfa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 700, in __init__
    self.encoder = init_method(GLMTransformer, config, **init_kwargs)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\utils\init.py", line 52, in skip_init
    return module_cls(*args, **kwargs).to_empty(device=final_device)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 1024, in to_empty
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\nn\modules\module.py", line 1024, in <lambda>
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\ai\wenda\WPy64-31110\python-3.11.1.amd64\Lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB (GPU 0; 6.00 GiB total capacity; 5.34 GiB already allocated; 0 bytes free; 5.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
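
As a first, low-effort thing to try, the allocator hint in the error message itself can be tested before touching any code. A minimal sketch, assuming wenda is started from a Python process where the variable can still be set before torch touches CUDA; the value 128 is only an illustrative starting point, and this only mitigates fragmentation rather than an outright shortage of VRAM:

    import os

    # Must be set before torch allocates any CUDA memory, otherwise it has no effect.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

    import torch  # imported after the variable is set so the caching allocator picks it up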

stephsix commented 1 year ago

+1, a 3080 with 10 GB hits the same error.

serfan commented 1 year ago

I found a solution on the official chatglm2 GitHub; anyone hitting the same problem can give it a try. Edit wenda\llms\llm_glm6b.py and find:

 if "chatglm2" in settings.llm.path:
        model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True, device=device, revision="v1.1.0")
    else:
        model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True, revision="v1.1.0")

Change it to:

if "chatglm2" in settings.llm.path:
        model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True,revision="v1.1.0").cuda()
    else:
        model = AutoModel.from_pretrained(settings.llm.path, local_files_only=True, trust_remote_code=True, revision="v1.1.0")

Then save the file and restart chatglm; the model now loads normally.

Reference: https://github.com/THUDM/ChatGLM2-6B/issues/52#issuecomment-1608625913
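
As a quick sanity check after applying the patch, the quantized model can be loaded the same way in a standalone script and the allocator stats printed to confirm it fits in 6 GB. A minimal sketch, assuming a local chatglm2-6b-int4 directory; the path is a placeholder for whatever settings.llm.path points to:

    import torch
    from transformers import AutoModel

    model_path = r"model\chatglm2-6b-int4"  # hypothetical path, adjust to your setup
    # Load on the CPU first (no device= argument), then move the already-quantized weights over.
    model = AutoModel.from_pretrained(model_path, local_files_only=True,
                                      trust_remote_code=True).cuda()

    print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")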

cgisky1980 commented 1 year ago

How about submitting a PR for this?

serfan commented 1 year ago

PR submitted.