Running inference from a generated checkpoint fails with a position-id range error. Command:
USE_FAST=1 CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=1 run_generate.py --pp 1 --tp 1 --ckpt-path /fsx/phuc/testing/nanotron-ref/checkpoints/1550
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 337.62it/s]
Traceback (most recent call last):
File "/fsx/phuc/testing/nanotron-ref/nanotron/run_generate.py", line 208, in <module>
main()
File "/fsx/phuc/testing/nanotron-ref/nanotron/run_generate.py", line 188, in main
for output in outputs:
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/fsx/phuc/testing/nanotron-ref/nanotron/src/nanotron/generate/generation.py", line 234, in greedy_search_text
sharded_logits = model(
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/fsx/phuc/testing/nanotron-ref/nanotron/src/nanotron/models/fast/llama.py", line 687, in forward
return self.forward_with_hidden_states(input_ids=input_ids, input_mask=input_mask)[0]
File "/fsx/phuc/testing/nanotron-ref/nanotron/src/nanotron/models/fast/llama.py", line 703, in forward_with_hidden_states
hidden_encoder_states = encoder_block(**hidden_encoder_states)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/fsx/phuc/testing/nanotron-ref/nanotron/src/nanotron/core/parallel/pipeline_parallelism/block.py", line 150, in forward
output = self.pp_block(**new_kwargs)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/fsx/phuc/testing/nanotron-ref/nanotron/src/nanotron/models/fast/llama.py", line 553, in forward
output = self.attn(hidden_states=hidden_states, sequence_mask=sequence_mask)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/fsx/phuc/testing/nanotron-ref/nanotron/src/nanotron/models/fast/llama.py", line 387, in forward
query_states = self.rotary_embedding(query_states, position_ids=position_ids)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/fsx/phuc/testing/nanotron-ref/nanotron/src/nanotron/models/fast/llama.py", line 114, in forward
raise ValueError(
ValueError: Position ids must be in the range [0, 32), but got 32 and 32
[2024-01-09 04:35:30,428] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 945819) of binary: /admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/bin/python
Traceback (most recent call last):
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/admin/home/phuc_nguyen/miniconda3/envs/nanotron-ref/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
run_generate.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-01-09_04:35:30
host : ip-26-0-161-138.ec2.internal
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 945819)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Reproduce:
1. Train the tiny LLaMA example:
USE_FAST=1 CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --rdzv-backend=c10d --nproc_per_node=4 run_train.py --config-file examples/config_tiny_llama.yaml
2. Run generation from the resulting checkpoint:
USE_FAST=1 CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=1 run_generate.py --pp 1 --tp 1 --ckpt-path /fsx/phuc/testing/nanotron-ref/checkpoints/1550
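For context, the [0, 32) bound in the ValueError presumably comes from the rotary-embedding table being sized to the training sequence length (32 in the tiny config), so greedy decoding fails as soon as the prompt plus the generated tokens need position id 32. A minimal sketch of that failure mode (the names and the exact check are assumptions for illustration, not the actual nanotron code):

```python
# Hypothetical sketch: reproduce the bounds check that raises above, assuming
# the rotary embedding only covers positions [0, max_seq_len).
import torch

MAX_SEQ_LEN = 32  # matches the [0, 32) range reported in the ValueError


def check_position_ids(position_ids: torch.Tensor, max_seq_len: int = MAX_SEQ_LEN) -> None:
    """Mimic the rotary-embedding bounds check that raises in llama.py."""
    lo, hi = position_ids.min().item(), position_ids.max().item()
    if lo < 0 or hi >= max_seq_len:
        raise ValueError(
            f"Position ids must be in the range [0, {max_seq_len}), but got {lo} and {hi}"
        )


# Greedy decoding appends one token per step, so once the sequence already
# holds 32 tokens, the next step asks for position id 32 and trips the check.
next_position = torch.tensor([32])
check_position_ids(next_position)  # raises the same ValueError as in the log
```

If that reading is correct, generation would need to stop at the trained context length, or the rotary table would need to cover the extra positions requested at inference time.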