Open jimburtoft opened 4 days ago
I tested from the command line on the HF DLAMI, and optimum-cli compiles with no problem. However, when I open a shell in the TGI image:
docker run --privileged --entrypoint "/bin/bash" -it neuronx-tgi:latest
root@076ec64e0835:/# optimum-cli export neuron --model NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO \
--batch_size 1 --sequence_length 2048 \
--num_cores 8 \
NousResearch/Llama-2-7b-chat-hf
I get the same error as when I call it through a docker run command:
model-00019-of-00019.safetensors: 100%|████████████████████████████████████████████████████████████████▉| 4.22G/4.22G [00:31<00:00, 133MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████| 19/19 [03:39<00:00, 11.53s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 19/19 [00:02<00:00, 8.38it/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████| 120/120 [00:00<00:00, 1.49MB/s]
2024-09-21 16:11:36.000245: 139 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000155: 4812 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000227: 4813 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000325: 4814 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000425: 4815 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_77f3c929dd68c1f98ca9+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000527: 4816 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_77f3c929dd68c1f98ca9+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000595: 4817 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_3cbe1c7ce93617c15590+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_5ac5ea8b15390a9a05af+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_77f3c929dd68c1f98ca9+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:27.000693: 4812 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000717: 4818 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_5ac5ea8b15390a9a05af+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3cbe1c7ce93617c15590+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000798: 4819 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_5ac5ea8b15390a9a05af+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:27.000835: 4814 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_91b794a5c1f78db2b445+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3cbe1c7ce93617c15590+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:27.000909: 4813 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000931: 4820 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-21 16:15:27.000964: 4821 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_91b794a5c1f78db2b445+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_a7b0021f88c35fba5c31+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_91b794a5c1f78db2b445+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000060: 4815 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_98ba4c9111d5882d2331+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_dbd562f4987f424d68b7+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_a7b0021f88c35fba5c31+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_98ba4c9111d5882d2331+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_cafd6eb1a22bbd8d668b+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_a7b0021f88c35fba5c31+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000232: 4816 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_dbd562f4987f424d68b7+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_98ba4c9111d5882d2331+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000263: 4817 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_cafd6eb1a22bbd8d668b+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3c1052e2b1f1e6b7ec77+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_dbd562f4987f424d68b7+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000334: 4819 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_cafd6eb1a22bbd8d668b+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000371: 4818 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_182ac3459c92e88f4e9d+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3c1052e2b1f1e6b7ec77+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_182ac3459c92e88f4e9d+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_3c1052e2b1f1e6b7ec77+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000524: 4821 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_182ac3459c92e88f4e9d+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-21 16:15:28.000570: 4820 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
......................................................................
2024-09-21 16:17:37.000815: 4816 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:37.000816: 4814 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:37.000816: 4814 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/ea446253-b0f0-47cc-9202-6d77d8d66026/model.MODULE_5ac5ea8b15390a9a05af+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:37.000816: 4816 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/e555d4da-7536-45da-be1a-6ddf6c3e776d/model.MODULE_a7b0021f88c35fba5c31+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:37.000816: 4814 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:37.000816: 4816 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:37.000862: 4813 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:37.000862: 4813 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/d77f0fc5-8007-4c95-afea-9d15bcd4b476/model.MODULE_3cbe1c7ce93617c15590+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:37.000862: 4813 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:37.000873: 4817 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:37.000873: 4817 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/43f043cb-dac4-4a9f-b37c-058c0a389e2c/model.MODULE_98ba4c9111d5882d2331+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:37.000913: 4817 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:38.000036: 4815 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:38.000036: 4815 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/1d70a34a-1529-4451-9939-1b7126d7eb9c/model.MODULE_91b794a5c1f78db2b445+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:38.000036: 4815 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:38.000037: 4812 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:32Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:38.000037: 4812 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:38.000038: 4812 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:38.000265: 4818 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:33Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:38.000265: 4818 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/039aa644-a298-41de-a0fe-4e41b97a3877/model.MODULE_cafd6eb1a22bbd8d668b+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:38.000266: 4818 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:45.000980: 4820 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:40Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:45.000981: 4820 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/ab715a7d-e9bd-474e-8f13-a2aa4ba39f4b/model.MODULE_182ac3459c92e88f4e9d+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:45.000981: 4820 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-09-21 16:17:47.000129: 4819 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:41Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:17:47.000129: 4819 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/82bbf148-f202-4b28-9d5f-6f3f09fe72d0/model.MODULE_dbd562f4987f424d68b7+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:17:47.000130: 4819 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
.
2024-09-21 16:18:03.000204: 4821 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-21T16:17:58Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-21 16:18:03.000205: 4821 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/c08f6aff-4516-4aef-8f97-b146e7cd2832/model.MODULE_3c1052e2b1f1e6b7ec77+39f12043.hlo_module.pb after 0 retries.
2024-09-21 16:18:03.000205: 4821 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 500, in compile
self.build(num_exec_repetition)
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 507, in build
self.neff_bytes = compile_hlo_module(self.hlo_module, self.tag, num_exec_repetition)
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 144, in compile_hlo_module
neff_bytes = neuron_xla_compile(module_bytes, flags, input_format="hlo", platform_target="trn1",
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 210, in neuron_xla_compile
neuron_xla_compile_impl(
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 269, in neuron_xla_compile_impl
return compile_cache_entry(output, entry, execution_mode,
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 186, in compile_cache_entry
raise (e)
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 165, in compile_cache_entry
ret = call_neuron_compiler(
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 109, in call_neuron_compiler
raise subprocess.CalledProcessError(res.returncode, cmd, stderr=error_info)
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 737, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 690, in main
decoder_export(
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 655, in decoder_export
model = NeuronModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/optimum/modeling_base.py", line 420, in from_pretrained
return from_pretrained_method(
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 331, in _from_transformers
return cls._export(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 382, in _export
return cls(new_config, checkpoint_dir, generation_config=generation_config)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling.py", line 1254, in __init__
super().__init__(config, checkpoint_dir, compiled_dir=compiled_dir, generation_config=generation_config)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 215, in __init__
neuronx_model.to_neuron()
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 85, in to_neuron
self.compile()
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 64, in compile
kernel.neff_bytes = neff_bytes_futures[hash_hlo(kernel.hlo_module)].result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/f2398503-7b61-4505-98ae-2b69377013c0/model.MODULE_77f3c929dd68c1f98ca9+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
Traceback (most recent call last):
File "/usr/local/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 208, in main
service.run()
File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/neuronx.py", line 298, in run
subprocess.run(full_command, shell=True, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO --batch_size 1 --sequence_length 2048 --num_cores 8 NousResearch/Llama-2-7b-chat-hf' returned non-zero exit status 1.
root@076ec64e0835:/#
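Because the log is long and the same loader error repeats for every compiler worker, a small script can help pull out the underlying cause. The following is a minimal sketch, not part of optimum-neuron: the function name `missing_shared_libraries` is hypothetical, and the pattern is written only against the "error while loading shared libraries" message shown in the log above.

```python
import re

# Pattern for the dynamic loader message that appears in the compiler errors above.
MISSING_LIB = re.compile(
    r"error while loading shared libraries: (\S+): cannot open shared object file"
)


def missing_shared_libraries(log_text: str) -> list[str]:
    """Return the distinct library names reported as missing, in first-seen order."""
    seen: dict[str, None] = {}
    for match in MISSING_LIB.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


if __name__ == "__main__":
    # One error line copied (shortened) from the log above.
    sample = (
        "walrus_driver: error while loading shared libraries: "
        "libxml2.so.2: cannot open shared object file: No such file or directory\n"
    )
    print(missing_shared_libraries(sample))
```

Run against the full log, this should reduce the repeated ERROR lines to the set of libraries the loader could not open.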
However, I also see this SyntaxWarning about str format compiler_flags, which I DON'T see when I run it without using TGI:
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
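For context on what the warning refers to, here is a minimal sketch of the difference between a flags string and a `list[str]`. It uses only the standard library `shlex` module and the flags visible in the neuronx-cc command line in the log above; the `--note` flag is hypothetical and exists only to show the quoting problem. The compile failures in the log are reported as a missing libxml2.so.2, so this warning may well be unrelated to them.

```python
import shlex

# Flags copied from the neuronx-cc command line in the log above.
flags_str = "--model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35"

# shlex.split follows POSIX shell quoting rules, unlike a plain str.split().
flags_list = shlex.split(flags_str)
print(flags_list)

# A quoted value containing a space shows where naive splitting goes wrong.
# The --note flag is hypothetical, used only for illustration.
tricky = '--note="two words" --verbose=35'
print(tricky.split())        # naive split breaks the quoted value in two
print(shlex.split(tricky))   # quoting-aware split keeps it as one argument
```

For the flags in this log both approaches give the same four items, which is consistent with the warning being cosmetic here, though that is an inference rather than something the log confirms.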
I can reproduce the issue: the neuronx-tgi docker image is missing the libxml2 package, which seems to be required (but not pulled automatically) by the neuronx compiler.
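One way to confirm this from inside the container is to ask the dynamic loader which libraries the compiler binary cannot resolve. The sketch below is an assumption-laden check, not an official diagnostic: the `walrus_driver` path is copied from the error message above, the helper names are hypothetical, and it relies on `ldd` being available in the image.

```python
import ctypes.util
import shutil
import subprocess

# Path taken from the error message in the log above.
WALRUS_DRIVER = "/usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver"


def parse_not_found(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def unresolved_libraries(binary: str) -> list[str]:
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=False)
    return parse_not_found(result.stdout)


if __name__ == "__main__":
    # find_library returns None when the loader cannot locate the library.
    print("libxml2 resolvable:", ctypes.util.find_library("xml2") is not None)
    if shutil.which("ldd"):
        print("unresolved:", unresolved_libraries(WALRUS_DRIVER))
```

If `libxml2.so.2` shows up as unresolved, installing the libxml2 package with the image's package manager and rerunning the export would be the natural next test. On a Debian/Ubuntu-based image the package is typically named `libxml2`, but that is an assumption about the base image.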
System Info
Who can help?
@dacorvo
Information
Tasks
Reproduction (minimal, reproducible, runnable)
When I use the newly merged TGI image to compile with optimum-cli, I get an error message.
I haven't been able to test it without TGI because I am having trouble upgrading my image to 2.20.
!git clone https://github.com/huggingface/optimum-neuron.git && cd optimum-neuron && make neuronx-tgi
Error:
Expected behavior
I expect the command to compile successfully.