xiehuanyi closed this issue 3 months ago
I ran the command provided in the README, but got an error:
torchrun --standalone --nproc_per_node=gpu src/gpt2_ft.py \
    --train_data ./data/e2e/train.jsonl \
    --valid_data ./data/e2e/valid.jsonl \
    --train_batch_size 1 \
    --grad_acc 1 \
    --noise_multiplier 0.6 \
    --max_grad_norm 1.0 \
    --valid_batch_size 4 \
    --seq_len 512 \
    --model_card gpt2.md \
    --init_checkpoint ./pretrained_checkpoints/gpt2-pytorch_model.bin \
    --clip 0.0 \
    --lr 0.0004 \
    --weight_decay 0.01 \
    --correct_bias \
    --adam_beta2 0.999 \
    --scheduler constant \
    --warmup_step 0 \
    --max_epoch 20 \
    --save_interval 1000 \
    --lora_dim 4 \
    --lora_alpha 32 \
    --lora_dropout 0.0 \
    --label_smooth 0.1 \
    --work_dir ./trained_models/GPT2_M/e2e \
    --random_seed 110

Traceback (most recent call last):
  File "src/gpt2_ft.py", line 23, in <module>
    from opacus.layers import DifferentiallyPrivateDistributedDataParallel as DPDDP
ImportError: cannot import name 'DifferentiallyPrivateDistributedDataParallel' from 'opacus.layers' (/opt/conda/lib/python3.7/site-packages/opacus/layers/__init__.py)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1077) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/run.py", line 756, in run
    )(*cmd_args)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
src/gpt2_ft.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-03_11:31:53
  host      : f7e96c4001945011ee0917a0c7f822b9ff09-task1-0.f7e96c4001945011ee0917a0c7f822b9ff09.ed9ed5aa2938366557d3851691b38794.svc.cluster.local
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1077)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Here are the packages I have installed:
# pip list
Package                  Version
------------------------ ----------
absl-py                  1.4.0
accelerate               0.20.3
aiohttp                  3.8.3
aiosignal                1.3.1
anyio                    3.6.2
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
async-timeout            4.0.2
asynctest                0.13.0
attrs                    21.4.0
Babel                    2.11.0
backcall                 0.2.0
beautifulsoup4           4.10.0
bleach                   5.0.0
blis                     0.7.9
brotlipy                 0.7.0
cachetools               5.3.1
catalogue                2.0.8
certifi                  2021.10.8
cffi                     1.14.6
chardet                  4.0.0
charset-normalizer       2.1.1
click                    8.1.3
conda                    4.10.3
conda-build              3.21.5
conda-package-handling   1.7.3
confection               0.1.0
cryptography             35.0.0
cymem                    2.0.7
datasets                 2.7.1
debugpy                  1.6.0
decorator                5.1.0
defusedxml               0.7.1
dill                     0.3.6
dnspython                2.1.0
elastic-transport        8.4.0
elasticsearch            8.5.3
entrypoints              0.4
evaluate                 0.4.0
faiss-gpu                1.7.2
fastjsonschema           2.15.3
filelock                 3.3.1
frozenlist               1.3.3
fsspec                   2022.11.0
glob2                    0.7
google-auth              2.21.0
google-auth-oauthlib     0.4.6
grpcio                   1.56.0
huggingface-hub          0.11.1
idna                     2.10
importlib-metadata       4.11.3
importlib-resources      5.7.1
ipykernel                6.13.0
ipython                  7.29.0
ipython-genutils         0.2.0
jedi                     0.18.0
jieba                    0.42.1
Jinja2                   3.1.1
json5                    0.9.6
jsonschema               4.4.0
jupyter-client           7.3.0
jupyter-core             4.10.0
jupyter-server           1.23.3
jupyterlab               3.0.0
jupyterlab-pygments      0.2.2
jupyterlab-server        2.16.5
langcodes                3.3.0
libarchive-c             2.9
loralib                  0.1.1
Markdown                 3.4.3
MarkupSafe               2.1.3
matplotlib-inline        0.1.2
mistune                  0.8.4
mkl-fft                  1.3.1
mkl-random               1.2.2
mkl-service              2.4.0
multidict                6.0.3
multiprocess             0.70.14
murmurhash               1.0.9
nbclassic                1.0.0
nbclient                 0.6.0
nbconvert                6.5.0
nbformat                 5.3.0
nest-asyncio             1.5.5
nlp                      0.4.0
notebook                 6.4.11
notebook-shim            0.2.3
numpy                    1.21.2
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
oauthlib                 3.2.2
olefile                  0.46
opacus                   1.4.0
opt-einsum               3.3.0
packaging                21.3
pandas                   1.3.5
pandocfilters            1.5.0
parso                    0.8.2
pathy                    0.10.2
peft                     0.3.0
pexpect                  4.8.0
pickleshare              0.7.5
Pillow                   8.4.0
pip                      21.0.1
pkginfo                  1.7.1
preshed                  3.0.8
progress                 1.6
prometheus-client        0.14.1
prompt-toolkit           3.0.20
protobuf                 3.20.3
psutil                   5.8.0
ptyprocess               0.7.0
pyarrow                  10.0.1
pyasn1                   0.5.0
pyasn1-modules           0.3.0
pycosat                  0.6.3
pycparser                2.20
pydantic                 1.10.10
Pygments                 2.10.0
pyOpenSSL                20.0.1
pyparsing                3.0.8
pyrsistent               0.18.1
PySocks                  1.7.1
python-dateutil          2.8.2
python-etcd              0.4.5
pytz                     2021.3
PyYAML                   6.0
pyzmq                    22.3.0
regex                    2022.10.31
requests                 2.28.1
requests-oauthlib        1.3.1
responses                0.18.0
rsa                      4.9
ruamel-yaml-conda        0.15.100
scipy                    1.7.3
Send2Trash               1.8.0
sentencepiece            0.1.99
setuptools               58.0.4
six                      1.16.0
smart-open               6.3.0
sniffio                  1.3.0
soupsieve                2.2.1
spacy                    3.5.4
spacy-legacy             3.0.12
spacy-loggers            1.0.4
srsly                    2.4.6
tensorboard              2.11.2
tensorboard-data-server  0.6.1
tensorboard-plugin-wit   1.8.1
terminado                0.13.3
thinc                    8.1.10
tinycss2                 1.1.1
tokenizers               0.13.2
torch                    1.13.1
torchelastic             0.2.0
torchtext                0.11.0
torchvision              0.11.1
tornado                  6.1
tqdm                     4.64.1
traitlets                5.1.0
transformers             4.25.1
typer                    0.9.0
typing-extensions        4.4.0
urllib3                  1.26.6
wasabi                   1.1.2
wcwidth                  0.2.5
webencodings             0.5.1
websocket-client         1.4.2
Werkzeug                 2.2.3
wheel                    0.36.2
xxhash                   3.1.0
yarl                     1.8.2
zipp                     3.8.0
Hi, I don't maintain this code. Please contact the authors of the paper for support: https://github.com/huseyinatahaninan/Differentially-Private-Fine-tuning-of-Language-Models/tree/main/Language-Generation-GPT-2
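For anyone hitting the same ImportError: the traceback shows that the installed opacus 1.4.0 does not expose `DifferentiallyPrivateDistributedDataParallel` from `opacus.layers`, which suggests the script was written against a different Opacus release. A minimal sketch for checking where, if anywhere, the installed version exposes the class is below. `find_symbol` is a hypothetical helper written for this illustration, and listing `opacus.distributed` as a candidate location is an assumption to verify against your installed version, not something confirmed in this thread.

```python
import importlib


def find_symbol(name, candidate_modules):
    """Return (module_name, obj) for the first candidate module that exposes
    `name`, or (None, None) if no candidate can be imported or has it.

    Hypothetical helper for illustration; not part of Opacus or the repo.
    """
    for module_name in candidate_modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or its parent package) is not installed; try the next one.
            continue
        if hasattr(module, name):
            return module_name, getattr(module, name)
    return None, None


if __name__ == "__main__":
    # "opacus.layers" is the location src/gpt2_ft.py imports from (per the
    # traceback). "opacus.distributed" is an ASSUMED alternative location.
    module_name, _ = find_symbol(
        "DifferentiallyPrivateDistributedDataParallel",
        ["opacus.layers", "opacus.distributed"],
    )
    if module_name is None:
        print("Class not found in any candidate module for this Opacus install.")
    else:
        print(f"Class is importable from: {module_name}")
```

If the class turns up under a different module, the two usual options are to adjust the import on line 23 of `src/gpt2_ft.py` or to install the Opacus version the original authors pinned, if their repository lists one. Note that a changed import path may come with other API changes, so the rest of the script may still need attention.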