Closed · taeyeonlee · closed 1 month ago
Dear @fwtan, following https://github.com/saic-fi/MobileQuant/blob/main/device/README.md, it fails when running CUDA_VISIBLE_DEVICES=0 python device/calibrate.py --hf_path ${HF_PATH} --per_channel --use_conv --weight_bitwidth 4 --act_dict_path ${HF_PATH}/act_dict.json
What do I have to do?
(qct_python310_VENV_root) (base) taeyeon@taeyeon-H610MH:~/saic_llama/MobileQuant$ CUDA_VISIBLE_DEVICES=0 python device/calibrate.py --hf_path ${HF_PATH} --per_channel --use_conv --weight_bitwidth 4 --act_dict_path ${HF_PATH}/act_dict.json
2024-09-07 17:32:07.485789: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-07 17:32:07.494071: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-07 17:32:07.503819: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-07 17:32:07.506685: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-07 17:32:07.513959: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-07 17:32:07.950331: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-09-07 17:32:08,871 - root - INFO - AIMET
Traceback (most recent call last):
  File "/home/taeyeon/saic_llama/MobileQuant/device/calibrate.py", line 306, in <module>
    main()
  File "/home/taeyeon/saic_llama/MobileQuant/device/calibrate.py", line 110, in main
    dataset = load_dataset("json", data_files=args.calib_path, split="train")
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/load.py", line 2592, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/load.py", line 2264, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/load.py", line 1804, in dataset_module_factory
    ).get_module()
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/load.py", line 1141, in get_module
    data_files = DataFilesDict.from_patterns(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/data_files.py", line 715, in from_patterns
    DataFilesList.from_patterns(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/data_files.py", line 620, in from_patterns
    resolve_pattern(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/data_files.py", line 407, in resolve_pattern
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to find '/home/taeyeon/saic_llama/MobileQuant/data/pile/val.jsonl.zst'
Best regards,
The original pile dataset has been taken down due to copyright issues. You can try to download the filtered version: https://huggingface.co/datasets/monology/pile-uncopyrighted/blob/main/val.jsonl.zst, and put it under the proper directory: data/pile/val.jsonl.zst
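For example, one way to fetch it (a minimal sketch using huggingface_hub; the repo id and filename are taken from the link above, and local_dir assumes you run this from the MobileQuant repo root):

from huggingface_hub import hf_hub_download

# Download the filtered Pile validation split to data/pile/val.jsonl.zst.
hf_hub_download(
    repo_id="monology/pile-uncopyrighted",
    repo_type="dataset",
    filename="val.jsonl.zst",
    local_dir="data/pile",
)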
Dear @fwtan, using https://huggingface.co/datasets/monology/pile-uncopyrighted/blob/main/val.jsonl.zst as data/pile/val.jsonl.zst, it made progress but was eventually killed. What do I have to do?
CUDA_VISIBLE_DEVICES=0 python device/calibrate.py --hf_path ${HF_PATH} --per_channel --use_conv --weight_bitwidth 4 --act_dict_path ${HF_PATH}/act_dict.json
(qct_python310_VENV_root) (base) taeyeon@taeyeon-H610MH:~/saic_llama/MobileQuant$ CUDA_VISIBLE_DEVICES=0 python device/calibrate.py --hf_path ${HF_PATH} --per_channel --use_conv --weight_bitwidth 4 --act_dict_path ${HF_PATH}/act_dict.json
2024-09-11 02:02:45.000757: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-11 02:02:45.159003: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-11 02:02:45.221713: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-11 02:02:45.239612: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-11 02:02:45.357085: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-11 02:02:46.052904: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-09-11 02:02:47,511 - root - INFO - AIMET
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:06<00:00, 81.74it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [20:35<00:00, 2.41s/it]
Exporting the ctx onnx/encodings
Killed
(qct_python310_VENV_root) (base) taeyeon@taeyeon-H610MH:~/saic_llama/MobileQuant$
Best regards,
There is a chance that more RAM is required, e.g. at least 32GB for llama-1.1b, ideally >= 64GB.
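If you want to confirm that RAM is the bottleneck, here is a minimal sketch (assuming psutil is installed; the 1-second interval is arbitrary) that you can run in a second terminal while the export is in progress:

import time
import psutil

# Print system memory usage once per second; if "available" drops to ~0
# right before the process dies, the kernel OOM killer is the likely cause.
while True:
    mem = psutil.virtual_memory()
    print(f"used {mem.used / 2**30:5.1f} GiB | available {mem.available / 2**30:5.1f} GiB "
          f"| total {mem.total / 2**30:5.1f} GiB")
    time.sleep(1)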
Dear @fwtan
It was run on Google Colab Pro+ (51 GB RAM, 15 GB GPU RAM, 235 GB storage) and on my PC (32 GB RAM, 12 GB GPU RAM, 1 TB storage). At peak it uses 25 GB RAM, 6.6 GB GPU RAM, and 64 GB storage. Both runs stopped (Killed) without any error message. It stops at torch.onnx.export(model, dummy_input, temp_file, **kwargs). What should I do?
def _export_model_to_onnx(model: Union[torch.nn.Module, torch.jit.ScriptModule, torch.jit.ScriptFunction],
                          dummy_input: Union[Tuple[Any, ...], torch.Tensor], temp_file: str, is_conditional: bool,
                          onnx_export_args: Union[OnnxExportApiArgs, dict]):
    # ... (docstring and kwargs setup elided from this excerpt) ...
    if version.parse(torch.__version__) < version.parse('1.11.0'):
        kwargs.update({'enable_onnx_checker': False})
        torch.onnx.export(model, dummy_input, temp_file, **kwargs)
    else:
        try:
            remove_kwargs = ['enable_onnx_checker', 'example_outputs', 'use_external_data_format']
            for key in remove_kwargs:
                kwargs.pop(key, None)
            torch.onnx.export(model, dummy_input, temp_file, **kwargs)
        # ... (exception handling elided from this excerpt) ...
Best regards,
Typically the program gets killed because it ran out of RAM. There are ways to debug: run with --num_blocks=1 to see if the program could export the onnx file for one block, and check whether /dev/shm is set to be large enough; see the sketches below. Feel free to reopen the issue if there are any further questions.
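To isolate the failure, you could first check that a plain torch.onnx.export succeeds in the same environment (a sketch with a hypothetical toy model, not the MobileQuant graph):

import torch

# Tiny model: if even this export fails, the problem is the environment,
# not the model size.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 16),
)
dummy_input = torch.randn(1, 16)
torch.onnx.export(model, dummy_input, "/tmp/toy.onnx", opset_version=13)
print("toy export OK")

And to check how much space /dev/shm has, a sketch using only the standard library:

import shutil

# Large temporary files during export can exhaust a small tmpfs, which can
# also end with the process being killed (an assumption worth verifying).
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")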