Open chintanckg opened 8 months ago
Gently bringing this to the notice of @dacorvo and @JingyaHuang
Thanks @chintanckg for raising the issue. Did the compilation of both the encoder and the decoder fail? (I saw the encoder failed but can't see the status of the decoder.) I will try to reproduce on my end and keep you posted.
I did not capture that; I will run it again and see if the decoder fails as well.
@JingyaHuang : The decoder fails as well. One time it failed abruptly; when I tried again, it failed with the logs below:
2024-02-15T16:33:27Z Running coloring_allocator_dram
2024-02-15T16:33:40Z coloring_allocator_dram finished after 12.772 seconds
2024-02-15T16:33:40Z Running address_rotation_dram
2024-02-15T16:33:44Z address_rotation_dram finished after 3.463 seconds
2024-02-15T16:33:44Z Running tensorcopy_accel
2024-02-15T16:33:44Z tensorcopy_accel finished after 0.329 seconds
2024-02-15T16:33:44Z Running peephole_opts
2024-02-15T16:33:45Z peephole_opts finished after 0.711 seconds
2024-02-15T16:33:45Z Running lower_kernel
2024-02-15T16:33:46Z lower_kernel finished after 0.355 seconds
2024-02-15T16:33:46Z Running build_fdeps
[F139] neuronx-cc terminated abnormally - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new
2024-02-15T16:33:50Z
2024-02-15T16:33:50Z Diagnostic information:
2024-02-15T16:33:50Z NeuronX Compiler version 2.12.68.0+4480452af
2024-02-15T16:33:50Z
2024-02-15T16:33:50Z Python version 3.8.10
2024-02-15T16:33:50Z HWM version 2.12.0.0-422c9037c
2024-02-15T16:33:50Z NumPy version 1.24.4
2024-02-15T16:33:50Z
2024-02-15T16:33:50Z Running on AMI ami-074c1cd522c5ede73
2024-02-15T16:33:50Z Running in region euw1-az2
2024-02-15T16:33:50Z
2024-02-15T16:33:50Z Diagnostic logs stored in /home/ubuntu/log-neuron-cc.txt
An error occured when trying to trace encoder with the error message: neuronx-cc failed with 70.
The export is failed and encoder neuron model won't be stored.
***** Compiling decoder *****
/home/ubuntu/.local/lib/python3.8/site-packages/torch_neuronx/xla_impl/hlo_conversion.py:143: UserWarning: Received an input tensor that was unused. Tensor will be ignored. (index=2, shape=torch.Size([1, 2048, 4096]), dtype=torch.float32)
warnings.warn(
/home/ubuntu/.local/lib/python3.8/site-packages/torch_neuronx/xla_impl/hlo_conversion.py:143: UserWarning: Received an input tensor that was unused. Tensor will be ignored. (index=4, shape=torch.Size([1]), dtype=torch.int64)
warnings.warn(
/home/ubuntu/.local/lib/python3.8/site-packages/torch_neuronx/xla_impl/hlo_conversion.py:143: UserWarning: Received an input tensor that was unused. Tensor will be ignored. (index=5, shape=torch.Size([1]), dtype=torch.float32)
@JingyaHuang -- A gentle reminder!
[Updated]
Hi @chintanckg, I can reproduce the error. The compilation could have been killed on inf2.xlarge or inf2.8xlarge (the one I used during my first experiment): since Flan-UL2 is large, you might not have enough memory for the compilation or enough disk space to write the model. (I also tested with inf2.24xlarge and the compilation passed. Note that you don't need an inf2 instance to compile a model for Neuron; a CPU-only instance is fine.)
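Since the failure mode described above is a compilation killed for lack of host RAM or disk space, a quick pre-flight check before launching the export can rule that out. This is an illustrative sketch; the thresholds are rough guesses, not official neuronx-cc requirements:

```python
import math
import os
import shutil

def preflight_check(min_ram_gib=64, min_disk_gib=100, path="."):
    """Rough pre-flight check before launching a Neuron export.

    Large models like Flan-UL2 need plenty of host RAM during
    compilation and disk space for the compiled artifacts. The
    default thresholds are illustrative guesses, not official
    requirements.
    """
    # Total physical RAM (Linux): page size * number of physical pages.
    ram_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    # Free space on the filesystem where the model will be written.
    disk_gib = shutil.disk_usage(path).free / 2**30
    print(f"host RAM: {ram_gib:.1f} GiB (want >= {min_ram_gib})")
    print(f"free disk at {path!r}: {disk_gib:.1f} GiB (want >= {min_disk_gib})")
    return ram_gib >= min_ram_gib and disk_gib >= min_disk_gib

if __name__ == "__main__":
    if not preflight_check():
        print("warning: compilation may be killed or fail to write the model")
```

Running this before `optimum-cli export neuron` makes an out-of-memory or out-of-disk kill much easier to diagnose than the abrupt `neuronx-cc` termination seen in the logs.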
But even if the compilation is successful, running a model this large on Neuron devices requires tensor parallelism (with a fixed tensor parallelism degree set during compilation). For that, we need to leverage neuronx_distributed (example for t5 here), which is not yet integrated for T5 inference but is on our roadmap.
Pasting some logs below from my experiment on inf2.24xlarge:
Thank you for the update @JingyaHuang.
Hi @chintanckg, I updated my previous comment with more details. Will check if we can prioritize the integration of TP for t5 as well.
@JingyaHuang -- Please advise on the query below:
In the latest announcement from Philipp, Inferentia2 instances can now host 70B-parameter models like Llama-70B. Does this mean they will inherently support 20B models like Flan-UL2? Please advise.
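As rough back-of-the-envelope arithmetic for why a ~20B-parameter model still needs tensor parallelism: the weight footprint alone can exceed a single accelerator chip's memory. The parameter count and per-chip memory figures below are assumptions for illustration, not official specifications:

```python
import math

def weight_footprint_gib(n_params, bytes_per_param):
    """Raw weight memory in GiB, ignoring KV cache, activations
    and runtime overhead."""
    return n_params * bytes_per_param / 2**30

# Assumed figures (verify against AWS documentation): Flan-UL2 has
# roughly 20e9 parameters; an Inferentia2 chip has 32 GiB of
# accelerator memory shared by its NeuronCores.
FLAN_UL2_PARAMS = 20e9
HBM_PER_CHIP_GIB = 32

for dtype, nbytes in [("fp32", 4), ("bf16", 2)]:
    gib = weight_footprint_gib(FLAN_UL2_PARAMS, nbytes)
    chips = math.ceil(gib / HBM_PER_CHIP_GIB)
    print(f"{dtype}: ~{gib:.0f} GiB of weights -> needs >= {chips} chip(s)")
```

Under these assumptions, even in bf16 the weights (~37 GiB) do not fit on one 32 GiB chip, so the model must be sharded across chips via tensor parallelism, which is the point made earlier in the thread about neuronx_distributed.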
@JingyaHuang -- A gentle reminder.
Hi @chintanckg I'm waiting for the feedback from the Annapurna team on this as well: https://github.com/aws-neuron/aws-neuron-sdk/issues/851
Thanks @JingyaHuang !
System Info
Who can help?
@JingyaHuang
Error Logs:
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction (minimal, reproducible, runnable)
optimum-cli export neuron -m google/flan-ul2 --batch_size 4 --sequence_length 2048 --auto_cast_type bf16 ./flan-ul2
Expected behavior
Successful Compilation of Flan-UL2 with 2048 context length