huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

optimum-neuron: optimum-cli compile fails in the newest TGI Neuron image #698

Open jimburtoft opened 4 days ago

jimburtoft commented 4 days ago

System Info

Output of optimum-cli env, run inside the TGI container:

!docker run -p 8080:80 \
-v $(pwd):/data \
--device=/dev/neuron0 \
--device=/dev/neuron1 \
--device=/dev/neuron2 \
--device=/dev/neuron3 \
--device=/dev/neuron4 \
--device=/dev/neuron5 \
-ti \
--entrypoint "optimum-cli" neuronx-tgi:latest \
env

Platform:

- Platform: Linux-5.15.0-1056-aws-x86_64-with-glibc2.35
- Python version: 3.10.12

Python packages:

- `optimum-neuron` version: 0.0.25.dev0
- `neuron-sdk` version: 2.20.0
- `optimum` version: 1.21.4
- `transformers` version: 4.43.2
- `huggingface_hub` version: 0.25.0
- `torch` version: 2.1.2+cu121
- `aws-neuronx-runtime-discovery` version: 2.9
- `libneuronxla` version: 2.0.4115.0
- `neuronx-cc` version: 2.15.128.0+56dc5a86
- `neuronx-distributed` version: NA
- `neuronx-hwm` version: NA
- `torch-neuronx` version: 2.1.2.2.3.0
- `torch-xla` version: 2.1.4
- `transformers-neuronx` version: 0.12.313

Neuron Driver:

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuronx-collectives/now 2.22.26.0-17a033bc8 amd64 [installed,local]
aws-neuronx-dkms/now 2.18.12.0 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.22.14.0-6e27b8d5b amd64 [installed,local]
aws-neuronx-tools/now 2.19.0.0 amd64 [installed,local]

Who can help?

@dacorvo

Reproduction (minimal, reproducible, runnable)

When I use the newly merged TGI image to compile a model with optimum-cli export neuron, compilation fails with the error below.

I haven't been able to test it outside the TGI container because I am having trouble upgrading my own image to Neuron SDK 2.20.

Build the image from the repository:

!git clone https://github.com/huggingface/optimum-neuron.git && cd optimum-neuron && make neuronx-tgi

Resulting images (docker images):

REPOSITORY    TAG           IMAGE ID       CREATED        SIZE
neuronx-tgi   0.0.25.dev0   165727a580ea   14 hours ago   11.6GB
neuronx-tgi   latest        165727a580ea   14 hours ago   11.6GB
nginx         alpine        c7b4f26a7d93   5 weeks ago    43.2MB

Then run the export inside the container:

!docker run -p 8080:80 \
-v $(pwd):/data \
--device=/dev/neuron0 \
--device=/dev/neuron1 \
--device=/dev/neuron2 \
--device=/dev/neuron3 \
--device=/dev/neuron4 \
--device=/dev/neuron5 \
-ti \
--entrypoint "optimum-cli" neuronx-tgi:latest \
export neuron --model NousResearch/Llama-2-7b-chat-hf \
--sequence_length 4096 \
--batch_size 4 \
--num_cores 8 \
/data/exportedmodel/
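The six repeated --device flags in the commands above can also be generated instead of typed out. The following is a minimal sketch in POSIX shell; neuron_device_args is a hypothetical helper (not part of optimum-neuron or Docker), and it assumes the devices are numbered consecutively from /dev/neuron0, which ls /dev/neuron* on the host can confirm.

```shell
#!/bin/sh
# Hypothetical helper: print one --device flag per Neuron device,
# /dev/neuron0 .. /dev/neuron<count-1>, space-separated on one line.
# The body runs in a subshell ( ... ) so its variables do not leak.
neuron_device_args() (
  count="$1"
  i=0
  out=""
  while [ "$i" -lt "$count" ]; do
    # Append a separating space only once out is non-empty
    out="${out:+$out }--device=/dev/neuron$i"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)
```

For example, $(neuron_device_args 6) in place of the six --device=/dev/neuronN lines expands to the same flags. The command substitution has to stay unquoted so that the shell splits it into separate arguments.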

Error:

Downloading shards: 100%|█████████████████████████| 2/2 [00:11<00:00,  5.81s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:00<00:00,  7.93it/s]
generation_config.json: 100%|██████████████████| 200/200 [00:00<00:00, 2.17MB/s]
2024-09-20 13:33:14.000539:  136  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-20 13:33:37.000915:  766  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-20 13:33:37.000945:  767  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-20 13:33:38.000478:  766  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-20 13:33:38.000848:  767  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
................
2024-09-20 13:36:05.000005:  766  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-20T13:35:59Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-20 13:36:05.000005:  766  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb after 0 retries.
2024-09-20 13:36:05.000006:  766  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
...
2024-09-20 13:37:22.000177:  767  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-20T13:37:16Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-20 13:37:22.000177:  767  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb after 0 retries.
2024-09-20 13:37:22.000178:  767  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 500, in compile
    self.build(num_exec_repetition)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 507, in build
    self.neff_bytes = compile_hlo_module(self.hlo_module, self.tag, num_exec_repetition)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 144, in compile_hlo_module
    neff_bytes = neuron_xla_compile(module_bytes, flags, input_format="hlo", platform_target="trn1",
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 210, in neuron_xla_compile
    neuron_xla_compile_impl(
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 269, in neuron_xla_compile_impl
    return compile_cache_entry(output, entry, execution_mode,
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 186, in compile_cache_entry
    raise (e)
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 165, in compile_cache_entry
    ret = call_neuron_compiler(
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 109, in call_neuron_compiler
    raise subprocess.CalledProcessError(res.returncode, cmd, stderr=error_info)
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 737, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 690, in main
    decoder_export(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 655, in decoder_export
    model = NeuronModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/optimum/modeling_base.py", line 420, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 331, in _from_transformers
    return cls._export(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 382, in _export
    return cls(new_config, checkpoint_dir, generation_config=generation_config)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling.py", line 1254, in __init__
    super().__init__(config, checkpoint_dir, compiled_dir=compiled_dir, generation_config=generation_config)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 215, in __init__
    neuronx_model.to_neuron()
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 85, in to_neuron
    self.compile()
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 64, in compile
    kernel.neff_bytes = neff_bytes_futures[hash_hlo(kernel.hlo_module)].result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
Traceback (most recent call last):
  File "/usr/local/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/neuronx.py", line 298, in run
    subprocess.run(full_command, shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model NousResearch/Llama-2-7b-chat-hf --sequence_length 4096 --batch_size 4 --num_cores 8 /data/exportedmodel/' returned non-zero exit status 1.
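Both compiler failures in this log report the same problem: the walrus_driver binary shipped inside neuronxcc cannot load libxml2.so.2. That appears to be why neuronx-cc returns exit status 70. One way to confirm which shared libraries are missing inside the container is to ask the dynamic loader through ldd. The sketch below is a hypothetical helper (missing_libs is not an existing tool), and its default path is copied from the error message above.

```shell
#!/bin/sh
# Path taken from the "error while loading shared libraries" message in the log.
WALRUS_DRIVER=/usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver

# Hypothetical helper: print the name of every shared library that the
# dynamic loader cannot resolve for the given binary (default: walrus_driver).
# Returns 2 if the binary does not exist.
missing_libs() (
  bin="${1:-$WALRUS_DRIVER}"
  if [ ! -e "$bin" ]; then
    echo "missing_libs: no such file: $bin" >&2
    exit 2
  fi
  # ldd reports unresolved dependencies as "libfoo.so.N => not found";
  # keep only the library name (first field) of those lines.
  ldd "$bin" 2>/dev/null | awk '/not found/ { print $1 }'
)
```

Run missing_libs with no argument inside the container, for example from the shell opened with docker run --entrypoint "/bin/bash" -it neuronx-tgi:latest. If the diagnosis is right, it should print libxml2.so.2, possibly alongside other missing libraries.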

Expected behavior

I expect the command to compile the model successfully.
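A possible workaround, until the image ships libxml2, is to layer the library on top of the locally built image. This is a sketch under stated assumptions, not an official fix. It assumes the image is Ubuntu-based and uses apt: the glibc 2.35 in the platform string suggests Ubuntu 22.04, whose libxml2 package provides libxml2.so.2. The names patch_dockerfile and neuronx-tgi:libxml2 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a Dockerfile that installs libxml2
# (assumed to provide libxml2.so.2) on top of the given base image.
# Assumes a Debian/Ubuntu base image where apt-get is available.
patch_dockerfile() (
  base="${1:-neuronx-tgi:latest}"
  cat <<EOF
FROM ${base}
RUN apt-get update && apt-get install -y --no-install-recommends libxml2 && rm -rf /var/lib/apt/lists/*
EOF
)
```

patch_dockerfile neuronx-tgi:latest | docker build -t neuronx-tgi:libxml2 - builds the patched image from a Dockerfile read on standard input. No build context is needed because nothing is copied in. The same optimum-cli export neuron command can then be rerun against neuronx-tgi:libxml2. If walrus_driver depends on other libraries that are also absent, they would have to be added the same way.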

jimburtoft commented 3 days ago

I tested from the command line on the HF DLAMI, and optimum-cli compiles without any problem. However, when I open a shell in the TGI image and run the export there:

docker run --privileged --entrypoint "/bin/bash" -it neuronx-tgi:latest
root@076ec64e0835:/# optimum-cli export neuron --model NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO \
                          --batch_size 1 --sequence_length 2048 \
                          --num_cores 8 \
                          NousResearch/Llama-2-7b-chat-hf

I get the same error as when calling it through docker run:

model-00019-of-00019.safetensors: 100%|████████████████████████████████████████████████████████████████▉| 4.22G/4.22G [00:31<00:00, 133MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████| 19/19 [03:39<00:00, 11.53s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 19/19 [00:02<00:00,  8.38it/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████| 120/120 [00:00<00:00, 1.49MB/s]
2024-09-21 16:11:36.000245:  139  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000155:  4812  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000227:  4813  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000325:  4814  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000425:  4815  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_77f3c929dd68c1f98ca9+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000527:  4816  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_77f3c929dd68c1f98ca9+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000595:  4817  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_3cbe1c7ce93617c15590+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_5ac5ea8b15390a9a05af+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_77f3c929dd68c1f98ca9+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:27.000693:  4812  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000717:  4818  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_5ac5ea8b15390a9a05af+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3cbe1c7ce93617c15590+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000798:  4819  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_5ac5ea8b15390a9a05af+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:27.000835:  4814  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_91b794a5c1f78db2b445+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3cbe1c7ce93617c15590+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:27.000909:  4813  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000931:  4820  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000964:  4821  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_91b794a5c1f78db2b445+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_a7b0021f88c35fba5c31+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_91b794a5c1f78db2b445+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000060:  4815  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_98ba4c9111d5882d2331+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_dbd562f4987f424d68b7+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_a7b0021f88c35fba5c31+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_98ba4c9111d5882d2331+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_cafd6eb1a22bbd8d668b+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_a7b0021f88c35fba5c31+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000232:  4816  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_dbd562f4987f424d68b7+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_98ba4c9111d5882d2331+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000263:  4817  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_cafd6eb1a22bbd8d668b+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3c1052e2b1f1e6b7ec77+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_dbd562f4987f424d68b7+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000334:  4819  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_cafd6eb1a22bbd8d668b+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000371:  4818  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_182ac3459c92e88f4e9d+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3c1052e2b1f1e6b7ec77+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_182ac3459c92e88f4e9d+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3c1052e2b1f1e6b7ec77+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000524:  4821  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_182ac3459c92e88f4e9d+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000570:  4820  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
......................................................................

2024-09-21 16:17:37.000815:  4816  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:37.000816:  4814  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:37.000816:  4814  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:37.000816:  4816  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:37.000816:  4814  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:37.000816:  4816  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:37.000862:  4813  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:37.000862:  4813  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:37.000862:  4813  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:37.000873:  4817  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:37.000873:  4817  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:37.000913:  4817  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:38.000036:  4815  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:38.000036:  4815  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:38.000036:  4815  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:38.000037:  4812  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:38.000037:  4812  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:38.000038:  4812  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:38.000265:  4818  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:33Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:38.000265:  4818  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:38.000266:  4818  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache

2024-09-21 16:17:45.000980:  4820  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:40Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:45.000981:  4820  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:45.000981:  4820  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:47.000129:  4819  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:41Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:17:47.000129:  4819  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:47.000130:  4819  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
.
2024-09-21 16:18:03.000204:  4821  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:58Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-21 16:18:03.000205:  4821  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:18:03.000205:  4821  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 500, in compile
    self.build(num_exec_repetition)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 507, in build
    self.neff_bytes = compile_hlo_module(self.hlo_module, self.tag, num_exec_repetition)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 144, in compile_hlo_module
    neff_bytes = neuron_xla_compile(module_bytes, flags, input_format="hlo", platform_target="trn1",
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 210, in neuron_xla_compile
    neuron_xla_compile_impl(
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 269, in neuron_xla_compile_impl
    return compile_cache_entry(output, entry, execution_mode,
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 186, in compile_cache_entry
    raise (e)
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 165, in compile_cache_entry
    ret = call_neuron_compiler(
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 109, in call_neuron_compiler
    raise subprocess.CalledProcessError(res.returncode, cmd, stderr=error_info)
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 737, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 690, in main
    decoder_export(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 655, in decoder_export
    model = NeuronModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/optimum/modeling_base.py", line 420, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 331, in _from_transformers
    return cls._export(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 382, in _export
    return cls(new_config, checkpoint_dir, generation_config=generation_config)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling.py", line 1254, in __init__
    super().__init__(config, checkpoint_dir, compiled_dir=compiled_dir, generation_config=generation_config)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 215, in __init__
    neuronx_model.to_neuron()
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 85, in to_neuron
    self.compile()
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 64, in compile
    kernel.neff_bytes = neff_bytes_futures[hash_hlo(kernel.hlo_module)].result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
Traceback (most recent call last):
  File "/usr/local/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/neuronx.py", line 298, in run
    subprocess.run(full_command, shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO --batch_size 1 --sequence_length 2048 --num_cores 8 NousResearch/Llama-2-7b-chat-hf' returned non-zero exit status 1.
root@076ec64e0835:/#
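The root cause is buried in the compiler worker output. Every `neuronx-cc` job failed because `walrus_driver` could not load `libxml2.so.2`, but the final `optimum-cli` traceback only reports exit status 1. To help triage logs like this one, here is a minimal Python sketch that scans captured log text for the dynamic loader's `error while loading shared libraries` message and counts the failing compile jobs. It is illustrative only. The helper name `find_missing_libraries` is hypothetical and not part of optimum-neuron, and the sketch assumes the log is already available as a string.

```python
import re
from collections import Counter

# Message printed by the glibc dynamic loader when a binary's shared-library
# dependency cannot be resolved (as in the walrus_driver lines of the log).
_MISSING_LIB = re.compile(
    r"error while loading shared libraries: (\S+): cannot open shared object file"
)
# Per-job failure summary printed by NEURON_CC_WRAPPER in the log above.
_FAILED_JOB = re.compile(r"Compilation failed for (\S+\.hlo_module\.pb)")


def find_missing_libraries(log_text: str) -> tuple[Counter, set[str]]:
    """Hypothetical helper: return (missing library -> count, failed HLO modules)."""
    missing = Counter(m.group(1) for m in _MISSING_LIB.finditer(log_text))
    failed = {m.group(1) for m in _FAILED_JOB.finditer(log_text)}
    return missing, failed


if __name__ == "__main__":
    # Two abbreviated lines in the same shape as the log above.
    sample = (
        "... walrus_driver: error while loading shared libraries: libxml2.so.2: "
        "cannot open shared object file: No such file or directory\n"
        "... ERROR ||NEURON_CC_WRAPPER||: Compilation failed for "
        "/tmp/w/model.MODULE_a+1.hlo_module.pb after 0 retries.\n"
    )
    missing, failed = find_missing_libraries(sample)
    for lib, count in missing.items():
        print(f"{lib}: reported {count} time(s)")
    print(f"{len(failed)} compile job(s) failed")
```

On the full log above, this sketch should report `libxml2.so.2` once per failed job. That points at a missing system package rather than a problem with the model or the export arguments.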
jimburtoft commented 3 days ago

However, I also notice this SyntaxWarning about str-format compiler_flags, which I DON'T see when I run it without TGI:

/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
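The quoting problem the warning describes can be reproduced with plain Python. Splitting a flag string on whitespace breaks a quoted argument apart, `shlex.split` honours the quotes, and a `list[str]` avoids re-parsing entirely. The sketch below assumes nothing about how libneuronxla builds its command line. The `--logfile` flag is hypothetical and chosen only because its value contains a space; it is not a documented `neuronx-cc` option.

```python
import shlex

# Hypothetical flag string: the quoted path deliberately contains a space.
flags_str = "--model-type=transformer --logfile='/tmp/my dir/log.txt'"

# Naive whitespace splitting breaks the quoted path into two arguments
# and leaves the literal quote characters in place.
naive = flags_str.split()

# shlex.split applies POSIX shell quoting rules: the quotes are removed
# and the path stays a single argument.
parsed = shlex.split(flags_str)

# Passing a list[str] from the start sidesteps string parsing altogether.
flags_list = ["--model-type=transformer", "--logfile=/tmp/my dir/log.txt"]

print(naive)
print(parsed)
print(parsed == flags_list)
```

The warning recommends the list form because each join/split round trip is another chance for this kind of mismatch.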
dacorvo commented 1 day ago

I can reproduce the issue: the neuronx-tgi docker image is missing the libxml2 package, which seems to be required (but not pulled automatically) by the neuronx compiler.
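Because the failure comes from the dynamic loader, a quick way to check the diagnosis inside a running container is to ask the loader to open the library directly. Below is a minimal sketch that uses only the standard-library `ctypes` module. The soname `libxml2.so.2` is taken from the log, and the helper name `can_load` is hypothetical. `ctypes.CDLL` goes through `dlopen`, so it uses the normal library search path, but it does not see any RPATH baked into `walrus_driver`. Treat a failure here as strong evidence, not absolute proof. The reported glibc 2.35 suggests an Ubuntu 22.04 base, and on Ubuntu the `libxml2` apt package provides `libxml2.so.2`. Check the image's Dockerfile before assuming that is the right fix.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Hypothetical helper: True if the dynamic loader can open `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when dlopen() cannot find or load the library.
        return False
    return True


if __name__ == "__main__":
    lib = "libxml2.so.2"
    status = "OK" if can_load(lib) else "NOT FOUND (neuronx-cc will likely fail)"
    print(f"{lib}: {status}")
```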