Closed: cxchen100 closed this issue 1 month ago.
I found some comments above fishy and irrelevant to this issue. I reported the case to GitHub and hid the comments.
Hi @cxchen100 ,
What 1) command and 2) yaml file did you use to run? Please use the issue template to report an issue
Sorry, yes. The command is the following, thank you:
accelerate launch examples/hf_transformers/text_classification.py \
  --config configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task cola \
  --run_log log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/
This command is from this Colab notebook: https://colab.research.google.com/github/yoshitomo-matsubara/torchdistill/blob/main/demo/glue_kd_and_submission.ipynb#scrollTo=bFHCWbIG1paE
Did you run accelerate config (!accelerate config if in a notebook) before you ran the script?
If so, what was your config?
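(For anyone unsure whether accelerate config was ever run: the interactive answers are saved to a YAML file under the Hugging Face cache. A small helper to locate it; the path below is Accelerate's usual default location, but treat it as an assumption for your setup:)

```python
from pathlib import Path

def find_accelerate_config(home: Path = Path.home()):
    """Return the default `accelerate config` file path if it exists, else None."""
    # Accelerate saves interactive `accelerate config` answers here by default
    # (assumed default location; override HF_HOME can move it).
    cfg = home / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"
    return cfg if cfg.exists() else None
```

If this returns None, accelerate config was likely never run in the environment.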
Restart the session, download the latest config files with !git clone https://github.com/yoshitomo-matsubara/torchdistill, and run the script again.
I found and resolved the issue in PR #484. Feel free to reopen this issue if that doesn't resolve your problem.
When I run the command, I get the following error. How can I solve this? Thank you:
Traceback (most recent call last):
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 301, in <module>
    main(argparser.parse_args())
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 269, in main
    train(teacher_model, student_model, dataset_dict, is_regression, dst_ckpt_dir_path, metric,
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 160, in train
    train_one_epoch(training_box, epoch, log_freq)
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 119, in train_one_epoch
    loss = training_box.forward_process(sample_batch, targets=None, supp_dict=None)
  File "/data/llm/torchdistill/torchdistill/core/distillation.py", line 424, in forward_process
    total_loss = self.criterion(io_dict, model_loss_dict, targets)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/llm/torchdistill/torchdistill/losses/high_level.py", line 82, in forward
    loss_dict[loss_name] = factor * criterion(student_io_dict, teacher_io_dict, targets)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/llm/torchdistill/torchdistill/losses/mid_level.py", line 175, in forward
    student_logits = student_io_dict[self.student_module_path][self.student_module_io]
KeyError: '.classifier'
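For context on the KeyError: torchdistill's loss modules read teacher/student activations from a dict populated by forward hooks and keyed by module path, so a path named in the YAML (here '.classifier') that was never hooked simply is not a key. A minimal sketch of that lookup mechanism (hypothetical dict contents and helper name, not torchdistill's actual code):

```python
# Sketch: activations captured by forward hooks live in a dict keyed by
# module path. If the config names a path that was never hooked
# (e.g. '.classifier' with a leading dot), the lookup raises KeyError.
student_io_dict = {
    "classifier": {"output": [0.1, 0.9]},  # hypothetical captured output
}

def get_student_logits(io_dict, module_path, module_io="output"):
    try:
        return io_dict[module_path][module_io]
    except KeyError as e:
        # Re-raise with the hooked paths listed, to make the mismatch obvious.
        raise KeyError(
            f"{module_path!r} not found; hooked paths are {list(io_dict)}"
        ) from e
```

This is presumably why the maintainer's fix in PR #484 plus re-cloning for the latest config files resolves it: the module paths in the YAML need to match what the script actually hooks.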
Traceback (most recent call last):
  File "/data/llm/miniconda3/envs/python_for_torchdistill/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/llm/miniconda3/envs/python_for_torchdistill/bin/python', 'examples/hf_transformers/text_classification.py', '--config', 'configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml', '--task', 'cola', '--run_log', 'log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt', '--private_output', 'leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/']' returned non-zero exit status 1.
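Note that this second traceback is not a separate bug: accelerate launch runs the training script as a subprocess and surfaces its non-zero exit status as a CalledProcessError, so the KeyError above is the real failure. A standalone reproduction of that wrapper behavior (not accelerate's actual code):

```python
import subprocess
import sys

# Run a child process that fails, the way the launcher runs the training
# script, then surface its non-zero exit status as CalledProcessError.
cmd = [sys.executable, "-c", "raise SystemExit(1)"]
process = subprocess.run(cmd)
try:
    process.check_returncode()  # raises CalledProcessError for exit status 1
except subprocess.CalledProcessError as e:
    print(f"Command {e.cmd} returned non-zero exit status {e.returncode}.")
```

So when debugging, read the inner traceback from the training script itself, not the launcher's.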
My environment is as follows:
accelerate 0.33.0
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
annotated-types 0.7.0
async-timeout 4.0.3
attrs 24.2.0
certifi 2024.7.4
charset-normalizer 3.3.2
Cython 3.0.11
datasets 2.21.0
deepspeed 0.15.0
dill 0.3.8
evaluate 0.4.2
filelock 3.15.4
frozenlist 1.4.1
fsspec 2024.6.1
hjson 3.1.0
huggingface-hub 0.24.6
idna 3.8
Jinja2 3.1.4
joblib 1.4.2
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.3
ninja 1.11.1.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.20
nvidia-nvtx-cu12 12.1.105
packaging 24.1
pandas 2.2.2
pillow 10.4.0
pip 24.2
protobuf 5.27.4
psutil 6.0.0
py-cpuinfo 9.0.0
pyarrow 17.0.0
pydantic 2.8.2
pydantic_core 2.20.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.2
regex 2024.7.24
requests 2.32.3
safetensors 0.4.4
scikit-learn 1.5.1
scipy 1.14.1
sentencepiece 0.2.0
setuptools 72.1.0
six 1.16.0
sympy 1.13.2
threadpoolctl 3.5.0
tokenizers 0.19.1
torch 2.4.0
torchvision 0.19.0
tqdm 4.66.5
transformers 4.44.2
triton 3.0.0
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
wheel 0.43.0
xxhash 3.5.0
yarl 1.9.4