Closed — Ethan-Chen-plus closed this issue 10 months ago
Opacus doesn't work out of the box with DeepSpeed (or FSDP). Opacus does support DDP, but the model would still need to fit on each individual GPU. Furthermore, the code needs to be adapted to use DDP through dp_transformers. See the args.parallel_model argument in https://github.com/microsoft/dp-transformers/blob/main/src/dp_transformers/dp_utils.py#L171.
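As a rough illustration of what "adapting the code" means: the trainer has to wrap the model in Opacus's DDP-compatible wrapper (opacus.distributed.DifferentiallyPrivateDistributedDataParallel) rather than DeepSpeed's engine or plain torch DDP. The sketch below is hypothetical — only the wrapper class name comes from Opacus, and the function, argument values, and selection logic are illustrative stand-ins, not dp_transformers' actual dp_utils.py code:

```python
# Hypothetical sketch of wrapper selection for DP distributed training.
# The function name, the "dpddp"/"ddp" values, and the logic are invented
# for illustration; they are NOT the dp_transformers API.

def pick_wrapper(parallel_model: str) -> str:
    """Return the name of the distributed wrapper a DP-aware trainer would use."""
    if parallel_model == "dpddp":
        # Opacus-compatible wrapper that keeps per-sample gradients intact:
        # opacus.distributed.DifferentiallyPrivateDistributedDataParallel
        return "DifferentiallyPrivateDistributedDataParallel"
    if parallel_model == "ddp":
        # Plain torch DDP averages gradients across ranks, which conflicts
        # with Opacus's per-sample gradient accounting.
        return "DistributedDataParallel"
    raise ValueError(f"unsupported parallel_model: {parallel_model!r}")

print(pick_wrapper("dpddp"))
```

The practical takeaway is that launching the same script under the DeepSpeed launcher (as the log below shows) does not perform this wrapping, so the DP training loop and the sharded data pipeline end up inconsistent.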
Traceback (most recent call last):
  File "/root/analysing_pii_leakage_ms/examples/fine_tune.py", line 89, in <module>
    fine_tune(*parse_args())
  File "/root/analysing_pii_leakage_ms/examples/fine_tune.py", line 80, in fine_tune
    lm.fine_tune(train_dataset, eval_dataset, train_args, privacy_args)
  File "/root/analysing_pii_leakage/src/pii_leakage/models/language_model.py", line 285, in fine_tune
    return self._fine_tune_dp(train_dataset, eval_dataset, train_args, privacy_args)
  File "/root/analysing_pii_leakage/src/pii_leakage/models/language_model.py", line 268, in _fine_tune_dp
    trainer.train()
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    for step, inputs in enumerate(epoch_iterator):
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2343, in __getitem__
    return self._getitem(
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2327, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 463, in query_table
    _check_valid_index_key(key, size)
  File "/opt/micromamba/envs/py310/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 406, in _check_valid_index_key
    raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 11618 is out of bounds for size 0
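The final IndexError means the dataset the DataLoader is iterating has length 0 on that rank, so any index fails the bounds check. A minimal, simplified stand-in for that check (the real function is datasets.formatting.formatting._check_valid_index_key; this sketch only mirrors its observable behavior) reproduces the exact message:

```python
# Simplified stand-in for datasets' index validation, to show why
# "size 0" rejects every key: negative keys wrap around, and anything
# outside [0, size) raises.
def check_valid_index_key(key: int, size: int) -> None:
    if key < 0:
        key += size  # support Python-style negative indexing
    if key < 0 or key >= size:
        raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")

try:
    check_valid_index_key(11618, 0)
except IndexError as err:
    print(err)  # -> Invalid key: 11618 is out of bounds for size 0
```

An index of 11618 against a size-0 dataset suggests the sampler was built from the full dataset while the dataset object itself ended up empty on the worker — consistent with running Opacus-based DP fine-tuning under a launcher (DeepSpeed) it does not support.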
[2023-06-26 22:59:34,485] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184079
[2023-06-26 22:59:34,923] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184080
[2023-06-26 22:59:34,937] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184081
[2023-06-26 22:59:34,938] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184082
[2023-06-26 22:59:34,949] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184083
[2023-06-26 22:59:34,958] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184084
[2023-06-26 22:59:34,968] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184085
[2023-06-26 22:59:34,978] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 184086
[2023-06-26 22:59:34,988] [ERROR] [launch.py:434:sigkill_handler] ['/opt/micromamba/envs/py310/bin/python3.10', '-u', 'fine_tune.py', '--local_rank=7', '--config_path', '../configs/fine-tune/echr-gpt2-small-dp8.yml'] exits with return code = 1