yoshitomo-matsubara / torchdistill

A coding-free framework built on PyTorch for reproducible deep learning studies. 🏆 25 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc. are implemented so far. 🎁 Trained models, training logs, and configurations are available to ensure reproducibility and benchmarking.
https://yoshitomo-matsubara.net/torchdistill/
MIT License
1.37k stars 132 forks

get an error #481

Closed cxchen100 closed 1 month ago

cxchen100 commented 1 month ago

When I run the command, I get the error below. How can I solve this? Thank you:

Traceback (most recent call last):
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 301, in <module>
    main(argparser.parse_args())
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 269, in main
    train(teacher_model, student_model, dataset_dict, is_regression, dst_ckpt_dir_path, metric,
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 160, in train
    train_one_epoch(training_box, epoch, log_freq)
  File "/data/llm/torchdistill/examples/hf_transformers/text_classification.py", line 119, in train_one_epoch
    loss = training_box.forward_process(sample_batch, targets=None, supp_dict=None)
  File "/data/llm/torchdistill/torchdistill/core/distillation.py", line 424, in forward_process
    total_loss = self.criterion(io_dict, model_loss_dict, targets)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/llm/torchdistill/torchdistill/losses/high_level.py", line 82, in forward
    loss_dict[loss_name] = factor * criterion(student_io_dict, teacher_io_dict, targets)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/llm/torchdistill/torchdistill/losses/mid_level.py", line 175, in forward
    student_logits = student_io_dict[self.student_module_path][self.student_module_io]
KeyError: '.classifier'

Traceback (most recent call last):
  File "/data/llm/miniconda3/envs/python_for_torchdistill/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/data/llm/miniconda3/envs/python_for_torchdistill/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/llm/miniconda3/envs/python_for_torchdistill/bin/python', 'examples/hf_transformers/text_classification.py', '--config', 'configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml', '--task', 'cola', '--run_log', 'log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt', '--private_output', 'leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/']' returned non-zero exit status 1.

My environment is as follows:

accelerate 0.33.0
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiosignal 1.3.1
annotated-types 0.7.0
async-timeout 4.0.3
attrs 24.2.0
certifi 2024.7.4
charset-normalizer 3.3.2
Cython 3.0.11
datasets 2.21.0
deepspeed 0.15.0
dill 0.3.8
evaluate 0.4.2
filelock 3.15.4
frozenlist 1.4.1
fsspec 2024.6.1
hjson 3.1.0
huggingface-hub 0.24.6
idna 3.8
Jinja2 3.1.4
joblib 1.4.2
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.3
ninja 1.11.1.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.20
nvidia-nvtx-cu12 12.1.105
packaging 24.1
pandas 2.2.2
pillow 10.4.0
pip 24.2
protobuf 5.27.4
psutil 6.0.0
py-cpuinfo 9.0.0
pyarrow 17.0.0
pydantic 2.8.2
pydantic_core 2.20.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.2
regex 2024.7.24
requests 2.32.3
safetensors 0.4.4
scikit-learn 1.5.1
scipy 1.14.1
sentencepiece 0.2.0
setuptools 72.1.0
six 1.16.0
sympy 1.13.2
threadpoolctl 3.5.0
tokenizers 0.19.1
torch 2.4.0
torchvision 0.19.0
tqdm 4.66.5
transformers 4.44.2
triton 3.0.0
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
wheel 0.43.0
xxhash 3.5.0
yarl 1.9.4

yoshitomo-matsubara commented 1 month ago

I found some comments above fishy and irrelevant to this issue. I reported the case to GitHub and hid the comments.

yoshitomo-matsubara commented 1 month ago

Hi @cxchen100 ,

What 1) command and 2) YAML file did you use to run? Please use the issue template to report an issue.

cxchen100 commented 1 month ago

> Hi @cxchen100 ,
>
> What 1) command and 2) YAML file did you use to run? Please use the issue template to report an issue.

Sorry. Yes, the command is below. Thank you:

accelerate launch examples/hf_transformers/text_classification.py \
  --config configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task cola \
  --run_log log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

cxchen100 commented 1 month ago


The command is from this Colab notebook: https://colab.research.google.com/github/yoshitomo-matsubara/torchdistill/blob/main/demo/glue_kd_and_submission.ipynb#scrollTo=bFHCWbIG1paE

yoshitomo-matsubara commented 1 month ago

Did you run accelerate config (!accelerate config in a notebook) before you ran the script? If so, what was your config?
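(For reference, accelerate config writes its answers to a YAML file, typically ~/.cache/huggingface/accelerate/default_config.yaml. A minimal single-GPU, single-process setup looks roughly like the sketch below; the exact keys and values depend on your Accelerate version and your answers, so treat this as illustrative rather than authoritative.)

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
mixed_precision: 'no'
num_machines: 1
num_processes: 1
use_cpu: false
```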

yoshitomo-matsubara commented 1 month ago

Restart the session, download the latest config files with !git clone https://github.com/yoshitomo-matsubara/torchdistill, and run the script again.

I found and resolved the issue in PR #484. Feel free to reopen this issue if that doesn't resolve your problem.