2024-06-19 08:57:48 - accelerate.utils.modeling.modeling.py - INFO - 1008 - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.28it/s]
2024-06-19 08:57:52 - root.big_modeling.py - WARNING - 435 - Some parameters are on the meta device device because they were offloaded to the cpu.
2024-06-19 08:57:52 - accelerate.big_modeling.big_modeling.py - WARNING - 452 - You shouldn't move a model that is dispatched using accelerate hooks.
Traceback (most recent call last):
  File "F:\Soft\TestHHH.py", line 13, in <module>
    model = LocalLLMModel(
  File "F:\CondaSpace\envs\tymodel\lib\site-packages\pylmkit\llms\_huggingface_llm.py", line 40, in __init__
    self.model = self.model.half().cuda()
  File "F:\CondaSpace\envs\tymodel\lib\site-packages\accelerate\big_modeling.py", line 455, in wrapper
    raise RuntimeError("You can't move a model that has some modules offloaded to cpu or disk.")
RuntimeError: You can't move a model that has some modules offloaded to cpu or disk.
Test environment: Windows 10, RTX 3080, CUDA 12.1, cuDNN 8.9.6, Python 3.10, torch 2.3.1+cu121, TensorFlow 2.16.1
Test code
Error message
The two .py files mentioned in the traceback look like they relate to whether CUDA can be used, but CUDA is in fact available here.
Expected result: how should this error be fixed so the code runs normally? (The same code runs fine on Linux.)
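The logs point at the actual cause: the checkpoint did not fully fit in the GPU's memory budget at load time, so accelerate offloaded some modules to CPU ("Some parameters are on the meta device ... offloaded to the cpu"), and a model dispatched that way must not be moved with `.half().cuda()` (line 40 of pylmkit's `_huggingface_llm.py`). A minimal workaround sketch, assuming the underlying model is a transformers model (transformers records the per-module placement in the `hf_device_map` attribute when accelerate dispatches it): only move the model when nothing was offloaded.

```python
def safe_to_cuda(model):
    """Move a model to the GPU only if accelerate has NOT offloaded any of it.

    When a model is loaded with device_map="auto" and does not fit in the
    allowed VRAM, accelerate offloads some modules to cpu/disk and records the
    placement in model.hf_device_map; calling .cuda() on such a model raises
    the RuntimeError seen in the traceback above.
    """
    device_map = getattr(model, "hf_device_map", None)
    if device_map and any(str(d) in ("cpu", "disk") for d in device_map.values()):
        # Already dispatched with offloading: leave placement to accelerate.
        return model
    # Safe to move: no module lives on cpu/disk.
    return model.half().cuda()
```

Alternatives worth trying (hedged, since it depends on whether pylmkit forwards loading kwargs to `from_pretrained`): pass `torch_dtype=torch.float16` together with `device_map={"": 0}` so the whole model is placed on GPU 0 with no offloading, or raise `max_memory` as the first log line suggests. If the Linux machine has more VRAM available at load time, that would also explain why the same code runs there without offloading.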