modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0
4.25k stars, 376 forks

Fine-tuning with a locally downloaded SD model still tries to download the model from the web #1616

Closed. jamesbondzhou closed this issue 2 months ago.

jamesbondzhou commented 3 months ago

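Hello, I ran into the following problem when using the code in examples/pytorch/multi_modal/notebook/text_to_image_synthesis.ipynb:

1. After downloading the dataset and the sd-v1.5 model to local disk, I ran the code from section 1.4.2: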

model_id = 'damo/multi-modal_efficient-diffusion-tuning-swift-base'
task = 'efficient-diffusion-tuning'
revision = 'v1.0.1'

# Use a local path instead of snapshot_download(model_id). The config JSON
# file that belongs to model_id was also placed in this directory.
model_dir = '/ssd/project/swift/stable-diffusion-v1-5'
cfg_dict = Config.from_file(os.path.join(model_dir, ModelFile.CONFIGURATION))
cfg_dict.model.inference = False
model = Model.from_pretrained(model_dir, cfg_dict=cfg_dict, revision=revision)
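For context, the traceback in this report shows `EfficientStableDiffusion._instantiate` passing `config.model.pretrained_model_name_or_path` on to the constructor, so it can help to check what that field holds in the local configuration file before loading. A minimal sketch, assuming the file is plain JSON named `configuration.json` with a top-level `model` mapping; `inspect_base_model_setting` is a hypothetical helper, not a modelscope API:

```python
import json
import os


def inspect_base_model_setting(model_dir, config_name="configuration.json"):
    """Return (value, is_local_dir) for model.pretrained_model_name_or_path.

    Assumption: the configuration file is plain JSON with a top-level
    "model" mapping. This helper is illustrative and not part of modelscope.
    """
    config_path = os.path.join(model_dir, config_name)
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    # The field may be absent, in which case value is None.
    value = config.get("model", {}).get("pretrained_model_name_or_path")
    is_local_dir = isinstance(value, str) and os.path.isdir(value)
    return value, is_local_dir
```

If `is_local_dir` comes back `False`, the value is a hub model ID (or missing), which would be consistent with a download being attempted later.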

2. Error

ConnectionError: EfficientStableDiffusion: HTTPSConnectionPool(host='www.modelscope.cn', port=443): Max retries exceeded with url: /api/v1/models//revisions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe43d8d0940>: Failed to establish a new connection: [Errno -2] Name or service not known'))

NewConnectionError                        Traceback (most recent call last)
File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
    704     conn,
    705     method,
    706     url,
    707     timeout=timeout_obj,
    708     body=body,
    709     headers=headers,
    710     chunked=chunked,
    711 )
    713 # If we're going to release the connection in finally:, then
    714 # the response doesn't need to know about the connection. Otherwise
    715 # it will also try to release it and we'll have a double-release
    716 # mess.

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/connectionpool.py:386, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    385 try:
--> 386     self._validate_conn(conn)
    387 except (SocketTimeout, BaseSSLError) as e:
    388     # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/connectionpool.py:1042, in HTTPSConnectionPool._validate_conn(self, conn)
   1041 if not getattr(conn, "sock", None):  # AppEngine might not have .sock
-> 1042     conn.connect()
   1044 if not conn.is_verified:

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/connection.py:358, in HTTPSConnection.connect(self)
    356 def connect(self):
    357     # Add certificate verification
--> 358     self.sock = conn = self._new_conn()
    359     hostname = self.host

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/connection.py:186, in HTTPConnection._new_conn(self)
    185 except SocketError as e:
--> 186     raise NewConnectionError(
    187         self, "Failed to establish a new connection: %s" % e
    188     )
    190 return conn

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fe43d8d0940>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/requests/adapters.py:667, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    666 try:
--> 667     resp = conn.urlopen(
    668         method=request.method,
    669         url=url,
    670         body=request.body,
    671         headers=request.headers,
    672         redirect=False,
    673         assert_same_host=False,
    674         preload_content=False,
    675         decode_content=False,
    676         retries=self.max_retries,
    677         timeout=timeout,
    678         chunked=chunked,
    679     )
    681 except (ProtocolError, OSError) as err:

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/connectionpool.py:815, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    812 log.warning(
    813     "Retrying (%r) after connection broken by '%r': %s", retries, err, url
    814 )
--> 815 return self.urlopen(
    816     method,
    817     url,
    818     body,
    819     headers,
    820     retries,
    821     redirect,
    822     assert_same_host,
    823     timeout=timeout,
    824     pool_timeout=pool_timeout,
    825     release_conn=release_conn,
    826     chunked=chunked,
    827     body_pos=body_pos,
    828     **response_kw
    829 )
    831 # Handle redirect?

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/connectionpool.py:815, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    812 log.warning(
    813     "Retrying (%r) after connection broken by '%r': %s", retries, err, url
    814 )
--> 815 return self.urlopen(
    816     method,
    817     url,
    818     body,
    819     headers,
    820     retries,
    821     redirect,
    822     assert_same_host,
    823     timeout=timeout,
    824     pool_timeout=pool_timeout,
    825     release_conn=release_conn,
    826     chunked=chunked,
    827     body_pos=body_pos,
    828     **response_kw
    829 )
    831 # Handle redirect?

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/connectionpool.py:787, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    785     e = ProtocolError("Connection aborted.", e)
--> 787 retries = retries.increment(
    788     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    789 )
    790 retries.sleep()

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/urllib3/util/retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    591 if new_retry.is_exhausted():
--> 592     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='www.modelscope.cn', port=443): Max retries exceeded with url: /api/v1/models//revisions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe43d8d0940>: Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/utils/registry.py:210, in build_from_cfg(cfg, registry, group_key, default_args)
    209 if hasattr(obj_cls, '_instantiate'):
--> 210     return obj_cls._instantiate(**args)
    211 else:

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/models/multi_modal/efficient_diffusion_tuning/efficient_stable_diffusion.py:318, in EfficientStableDiffusion._instantiate(cls, model_dir, **kwargs)
    316     config.model[k] = v
--> 318 model = EfficientStableDiffusion(
    319     model_dir,
    320     pretrained_model_name_or_path=config.model.
    321     pretrained_model_name_or_path,
    322     tuner_name=config.model.tuner_name,
    323     tuner_config=config.model.tuner_config,
    324     pretrained_tuner=config.model.get('pretrained_tuner', None),
    325     inference=config.model.get('inference', False))
    326 model.config = config

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/models/multi_modal/efficient_diffusion_tuning/efficient_stable_diffusion.py:63, in EfficientStableDiffusion.__init__(self, model_dir, *args, **kwargs)
     60 pretrained_model_name_or_path = kwargs.pop(
     61     'pretrained_model_name_or_path',
     62     'AI-ModelScope/stable-diffusion-v1-5')
---> 63 pretrained_model_name_or_path = snapshot_download(
     64     pretrained_model_name_or_path)
     65 tuner_config = kwargs.pop('tuner_config', None)

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/hub/snapshot_download.py:84, in snapshot_download(model_id, revision, cache_dir, user_agent, local_files_only, cookies, ignore_file_pattern, allow_file_pattern, local_dir, allow_patterns, ignore_patterns)
     39 """Download all files of a repo.
     40 Downloads a whole snapshot of a repo's files at the specified revision. This
     41 is useful when you want all files from a repo, because you don't know which
   (...)
     82     if some parameter value is invalid
     83 """
---> 84 return _snapshot_download(
     85     model_id,
     86     repo_type=REPO_TYPE_MODEL,
     87     revision=revision,
     88     cache_dir=cache_dir,
     89     user_agent=user_agent,
     90     local_files_only=local_files_only,
     91     cookies=cookies,
     92     ignore_file_pattern=ignore_file_pattern,
     93     allow_file_pattern=allow_file_pattern,
     94     local_dir=local_dir,
     95     ignore_patterns=ignore_patterns,
     96     allow_patterns=allow_patterns)

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/hub/snapshot_download.py:220, in _snapshot_download(repo_id, repo_type, revision, cache_dir, user_agent, local_files_only, cookies, ignore_file_pattern, allow_file_pattern, local_dir, allow_patterns, ignore_patterns)
    219 if repo_type == REPO_TYPE_MODEL:
--> 220     revision_detail = _api.get_valid_revision_detail(
    221         repo_id, revision=revision, cookies=cookies)
    222     revision = revision_detail['Revision']

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/hub/api.py:499, in HubApi.get_valid_revision_detail(self, model_id, revision, cookies)
    496 # for active development in library codes (non-release-branches), release_timestamp
    497 # is set to be a far-away-time-in-the-future, to ensure that we shall
    498 # get the master-HEAD version from model repo by default (when no revision is provided)
--> 499 all_branches_detail, all_tags_detail = self.get_model_branches_and_tags_details(
    500     model_id, use_cookies=False if cookies is None else cookies)
    501 all_branches = [x['Revision'] for x in all_branches_detail] if all_branches_detail else []

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/hub/api.py:582, in HubApi.get_model_branches_and_tags_details(self, model_id, use_cookies)
    581 path = f'{self.endpoint}/api/v1/models/{model_id}/revisions'
--> 582 r = self.session.get(path, cookies=cookies,
    583                      headers=self.builder_headers(self.headers))
    584 handle_http_response(r, logger, cookies, model_id)

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/requests/sessions.py:602, in Session.get(self, url, **kwargs)
    601 kwargs.setdefault("allow_redirects", True)
--> 602 return self.request("GET", url, **kwargs)

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/requests/sessions.py:589, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    588 send_kwargs.update(settings)
--> 589 resp = self.send(prep, **send_kwargs)
    591 return resp

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/requests/sessions.py:703, in Session.send(self, request, **kwargs)
    702 # Send the request
--> 703 r = adapter.send(request, **kwargs)
    705 # Total elapsed time of the request (approximately)

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/requests/adapters.py:700, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    698     raise SSLError(e, request=request)
--> 700 raise ConnectionError(e, request=request)
    702 except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='www.modelscope.cn', port=443): Max retries exceeded with url: /api/v1/models//revisions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe43d8d0940>: Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
Cell In[5], line 9
      7 cfg_dict.model.inference = False
      8 print(cfg_dict)
----> 9 model = Model.from_pretrained(model_dir, cfg_dict=cfg_dict, revision=revision)

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/models/base/base_model.py:183, in Model.from_pretrained(cls, model_name_or_path, revision, cfg_dict, device, **kwargs)
    181     model = build_backbone(model_cfg)
    182 else:
--> 183     model = build_model(model_cfg, task_name=task_name)
    185 # dynamically add pipeline info to model for pipeline inference
    186 if hasattr(cfg, 'pipeline'):

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/models/builder.py:35, in build_model(cfg, task_name, default_args)
     26 """ build model given model config dict
     27
     28 Args:
   (...)
     32     default_args (dict, optional): Default initialization arguments.
     33 """
     34 try:
---> 35     model = build_from_cfg(
     36         cfg, MODELS, group_key=task_name, default_args=default_args)
     37 except KeyError as e:
     38     # Handle subtask with a backbone model that hasn't been registered
     39     # All the subtask with a parent task should have a task model, otherwise it is not a
     40     # valid subtask
     41     parent_task, task_model_type = get_task_by_subtask_name(task_name)

File /data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/utils/registry.py:215, in build_from_cfg(cfg, registry, group_key, default_args)
    212     return obj_cls(**args)
    213 except Exception as e:
    214     # Normal TypeError does not print class name.
--> 215     raise type(e)(f'{obj_cls.__name__}: {e}')

ConnectionError: EfficientStableDiffusion: HTTPSConnectionPool(host='www.modelscope.cn', port=443): Max retries exceeded with url: /api/v1/models//revisions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe43d8d0940>: Failed to establish a new connection: [Errno -2] Name or service not known'))
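The innermost error, `[Errno -2] Name or service not known`, is what a failed hostname lookup typically looks like on Linux, which suggests the machine could not resolve `www.modelscope.cn` at all (for example, because it is offline). A minimal sketch for checking this separately from modelscope; `can_resolve` is a hypothetical helper, and the injectable `resolver` argument exists only so the function can be exercised without network access:

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on a name-resolution error.

    `resolver` defaults to socket.getaddrinfo; passing a stand-in makes
    the check testable offline. Illustrative helper, not a modelscope API.
    """
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False
```

Calling `can_resolve('www.modelscope.cn')` on the training machine would indicate whether any hub request can succeed from there.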

3. Initial diagnosis

Looking at the middle of the traceback, the error occurs because line 63 of EfficientStableDiffusion.__init__ still calls snapshot_download by default. However, when a local model is used, pretrained_model_name_or_path on line 60 has already been set to a local path, as shown here:

/data/anaconda3/envs/lora_train_new/lib/python3.10/site-packages/modelscope/models/multi_modal/efficient_diffusion_tuning/efficient_stable_diffusion.py, in EfficientStableDiffusion.__init__(self, model_dir, *args, **kwargs)
     [60] pretrained_model_name_or_path = kwargs.pop(
     [61]     'pretrained_model_name_or_path',
     [62]     'AI-ModelScope/stable-diffusion-v1-5')
---> [63] pretrained_model_name_or_path = snapshot_download(
     [64]     pretrained_model_name_or_path)
     [65] tuner_config = kwargs.pop('tuner_config', None)
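One way to picture the behavior described here is a guard that skips the download when the value is already a local directory. This is only a sketch of the idea, not modelscope's actual code or a proposed patch; `resolve_pretrained_path` and `download_fn` are hypothetical names, with `download_fn` standing in for `snapshot_download`:

```python
import os


def resolve_pretrained_path(name_or_path, download_fn):
    """Use a local directory as-is; otherwise fall back to downloading.

    `download_fn` is a stand-in for a hub download function that takes a
    model ID and returns a local directory. Hypothetical helper.
    """
    if os.path.isdir(name_or_path):
        # Already on disk: no network access is needed.
        return name_or_path
    # Not a local directory: treat the value as a hub model ID.
    return download_fn(name_or_path)
```

With a guard like this, a path such as `/ssd/project/swift/stable-diffusion-v1-5` would be returned unchanged, while an ID such as `AI-ModelScope/stable-diffusion-v1-5` would still go through the download function.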

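I'm not sure whether my understanding is correct, so could the maintainers please take a look? Also, the SD model I downloaded came from Hugging Face; it should be the same as AI-ModelScope/stable-diffusion-v1-5, right?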

Jintao-Huang commented 3 months ago

For SD fine-tuning, see this repo: https://github.com/modelscope/DiffSynth-Studio