microsoft / analysing_pii_leakage

The repository contains the code for analysing the leakage of personally identifiable information (PII) from the output of next-word-prediction language models.
MIT License
80 stars 19 forks

Error when running: deepspeed --num_gpus=8 fine_tune.py --config_path ../configs/fine-tune/echr-gpt2-small-dp8.yml #7

Closed Ethan-Chen-plus closed 10 months ago

Ethan-Chen-plus commented 1 year ago


Traceback (most recent call last):
  File "/root/analysing_pii_leakage_ms/examples/fine_tune.py", line 89, in <module>
    fine_tune(*parse_args())
  File "/root/analysing_pii_leakage_ms/examples/fine_tune.py", line 80, in fine_tune
    lm.fine_tune(train_dataset, eval_dataset, train_args, privacy_args)
  File "/root/analysing_pii_leakage/src/pii_leakage/models/language_model.py", line 285, in fine_tune
    return self._fine_tune_dp(train_dataset, eval_dataset, train_args, privacy_args)
  File "/root/analysing_pii_leakage/src/pii_leakage/models/language_model.py", line 268, in _fine_tune_dp
    trainer.train()
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    for step, inputs in enumerate(epoch_iterator):
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2343, in __getitem__
    return self._getitem(
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2327, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 463, in query_table
    _check_valid_index_key(key, size)
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 406, in _check_valid_index_key
    raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 11618 is out of bounds for size 0

[2023-06-26 22:59:34,485] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184079
[2023-06-26 22:59:34,923] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184080
[2023-06-26 22:59:34,937] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184081
[2023-06-26 22:59:34,938] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184082
[2023-06-26 22:59:34,949] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184083
[2023-06-26 22:59:34,958] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184084
[2023-06-26 22:59:34,968] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184085
[2023-06-26 22:59:34,978] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184086
[2023-06-26 22:59:34,988] [ERROR] [launch.py:434:sigkill_handler] ['/opt/micromamba/envs/py310/bin/python3.10', '-u', 'fine_tune.py', '--local_rank=7', '--config_path', '../configs/fine-tune/echr-gpt2-small-dp8.yml'] exits with return code = 1
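For context on the final exception: the DataLoader asked the Hugging Face dataset for example 11618, but the dataset it sees has size 0, i.e. the training split reaching each worker is empty. Below is a minimal pure-Python sketch of the bounds check that raises this error (the helper name `check_valid_index_key` is made up for illustration; the real check lives in `datasets.formatting.formatting._check_valid_index_key` and may differ in detail):

```python
def check_valid_index_key(key: int, size: int) -> None:
    # Sketch of the bounds check: an integer key is valid for a dataset of
    # `size` rows only if -size <= key < size (negative keys index from the
    # end, as in plain Python sequences). For an empty dataset (size == 0)
    # every key is out of bounds, which is what the traceback shows.
    if key >= size or key < -size:
        raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")

try:
    check_valid_index_key(11618, 0)
except IndexError as e:
    print(e)  # Invalid key: 11618 is out of bounds for size 0
```

So the `IndexError` is a symptom, not the root cause: something upstream (here, running the script under a launcher it was not written for) left the per-process dataset empty.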

s-zanella commented 10 months ago

Opacus doesn't work out of the box with DeepSpeed (or FSDP). Opacus does support DDP, but the model would still need to fit on each individual GPU. Furthermore, the code needs to be adapted to use DDP through dp_transformers. See the args.parallel_mode argument in https://github.com/microsoft/dp-transformers/blob/main/src/dp_transformers/dp_utils.py#L171.
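The reason DDP works for Opacus while parameter-sharding schemes (DeepSpeed ZeRO, FSDP) do not comes down to DP-SGD itself: each example's full gradient must be clipped individually before noise is added, so every worker needs a complete model replica. A minimal pure-Python sketch of that aggregation step (hypothetical function name and toy numbers, not the repository's or Opacus's actual code):

```python
import math
import random

def dp_sgd_step(per_example_grads, max_grad_norm=1.0, noise_multiplier=1.0, rng=None):
    """Toy sketch of DP-SGD gradient aggregation (what Opacus automates).

    Each per-example gradient is clipped to max_grad_norm *as a whole
    vector*, so the full gradient of one example must live on one worker.
    Plain DDP keeps a full replica per GPU, which is compatible; ZeRO/FSDP
    shard parameters (and thus gradients) across GPUs, which is not.
    """
    rng = rng or random.Random(0)
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, max_grad_norm / (norm + 1e-12))
        clipped.append([x * scale for x in g])
    n = len(per_example_grads)
    summed = [sum(col) for col in zip(*clipped)]
    # Gaussian noise calibrated to the clipping bound, then average.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_grad_norm) for s in summed]
    return [x / n for x in noisy]

# Two toy per-example gradients with norms 5.0 and 0.5.
update = dp_sgd_step([[3.0, 4.0], [0.3, 0.4]])
```

This is only meant to show why the fix is "adapt the script to DDP via dp_transformers" rather than "make DeepSpeed work": the per-sample clipping step has no natural analogue once gradients are sharded.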