(bee) D:\honeybee-main>torchrun --nproc_per_node=auto --standalone eval_tasks.py --ckpt_path checkpoints/13B-C-Abs-M576/last --config configs/tasks/sqa.yaml
NOTE: Redirects are currently not supported in Windows or MacOs.
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
Namespace(ckpt_path='checkpoints/13B-C-Abs-M576/last', result_dir='eval_results/', config=['configs/tasks/sqa.yaml'], load_results=False, dump_submission_file=False,
batch_size=None)
INFO 01/04 15:19:14 | Init (load model, tokenizer, processor) ...
Traceback (most recent call last):
File "D:\honeybee-main\eval_tasks.py", line 152, in
model, tokenizer, processor = init(args.ckpt_path, args.load_results)
File "D:\honeybee-main\eval_tasks.py", line 60, in init
model, tokenizer, processor = get_model(ckpt_path)
File "D:\honeybee-main\pipeline\interface.py", line 74, in get_model
model = load_model(pretrained_ckpt, use_bf16, load_in_8bit)
File "D:\honeybee-main\pipeline\interface.py", line 53, in load_model
model = HoneybeeForConditionalGeneration.from_pretrained(
File "D:\anaconda\envs\bee\lib\site-packages\transformers\modeling_utils.py", line 2305, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "D:\anaconda\envs\bee\lib\site-packages\transformers\configuration_utils.py", line 547, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, kwargs)
File "D:\anaconda\envs\bee\lib\site-packages\transformers\configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, kwargs)
File "D:\anaconda\envs\bee\lib\site-packages\transformers\configuration_utils.py", line 629, in _get_config_dict
resolved_config_file = cached_file(
File "D:\anaconda\envs\bee\lib\site-packages\transformers\utils\hub.py", line 388, in cached_file
raise EnvironmentError(
OSError: checkpoints/13B-C-Abs-M576/last does not appear to have a file named config.json. Checkout 'https://huggingface.co/checkpoints/13B-C-Abs-M576/last/None' for
available files.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 6364) of binary: D:\anaconda\envs\bee\python.exe
Traceback (most recent call last):
File "D:\anaconda\envs\bee\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\anaconda\envs\bee\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\anaconda\envs\bee\Scripts\torchrun.exe__main.py", line 7, in
File "D:\anaconda\envs\bee\lib\site-packages\torch\distributed\elastic\multiprocessing\errors__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "D:\anaconda\envs\bee\lib\site-packages\torch\distributed\run.py", line 794, in main
run(args)
File "D:\anaconda\envs\bee\lib\site-packages\torch\distributed\run.py", line 785, in run
elastic_launch(
File "D:\anaconda\envs\bee\lib\site-packages\torch\distributed\launcher\api.py", line 134, in call__
return launch_agent(self._config, self._entrypoint, list(args))
File "D:\anaconda\envs\bee\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
eval_tasks.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-01-04_15:19:16
host : SK-20230830XPTL
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 6364)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
I downloaded the checkpoint, but the directory only contains *pytorch_model.bin*, and I hit the error above when running evaluation. How can I resolve this issue?
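For reference, the traceback shows `from_pretrained()` treating the local path as a checkpoint directory and looking for `config.json` there, so `pytorch_model.bin` alone does not seem to be enough. Below is a minimal check of which files are present; the file names listed are only my assumption based on the error message, not the official Honeybee checkpoint layout.

```python
# Quick diagnostic: check which of the files from_pretrained() typically needs
# exist in the local checkpoint directory. The file names below are only an
# assumption drawn from the error message, not the official Honeybee layout.
import os

ckpt_dir = "checkpoints/13B-C-Abs-M576/last"

for name in ["config.json", "pytorch_model.bin", "tokenizer_config.json"]:
    path = os.path.join(ckpt_dir, name)
    print(f"{name}: {'found' if os.path.isfile(path) else 'MISSING'}")
```

Should the released checkpoint also ship a `config.json` (and tokenizer files), or do I need to obtain them separately?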