saic-fi / MobileQuant

[EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models

Unable to find '/home/taeyeon/saic_llama/MobileQuant/data/pile/val.jsonl.zst' #4

Closed taeyeonlee closed 1 month ago

taeyeonlee commented 2 months ago

Dear @fwtan, according to https://github.com/saic-fi/MobileQuant/blob/main/device/README.md, it fails when running:

CUDA_VISIBLE_DEVICES=0 python device/calibrate.py --hf_path ${HF_PATH} --per_channel --use_conv --weight_bitwidth 4 --act_dict_path ${HF_PATH}/act_dict.json

What do I have to do?

(qct_python310_VENV_root) (base) taeyeon@taeyeon-H610MH:~/saic_llama/MobileQuant$ CUDA_VISIBLE_DEVICES=0 python device/calibrate.py --hf_path ${HF_PATH} --per_channel --use_conv --weight_bitwidth 4 --act_dict_path ${HF_PATH}/act_dict.json
2024-09-07 17:32:07.485789: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-07 17:32:07.494071: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-07 17:32:07.503819: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-07 17:32:07.506685: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-07 17:32:07.513959: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-07 17:32:07.950331: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-09-07 17:32:08,871 - root - INFO - AIMET

Traceback (most recent call last):
  File "/home/taeyeon/saic_llama/MobileQuant/device/calibrate.py", line 306, in <module>
    main()
  File "/home/taeyeon/saic_llama/MobileQuant/device/calibrate.py", line 110, in main
    dataset = load_dataset("json", data_files=args.calib_path, split="train")
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/load.py", line 2592, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/load.py", line 2264, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/load.py", line 1804, in dataset_module_factory
    ).get_module()
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/load.py", line 1141, in get_module
    data_files = DataFilesDict.from_patterns(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/data_files.py", line 715, in from_patterns
    DataFilesList.from_patterns(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/data_files.py", line 620, in from_patterns
    resolve_pattern(
  File "/home/taeyeon/qct_python310_VENV_root/lib/python3.10/site-packages/datasets/data_files.py", line 407, in resolve_pattern
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to find '/home/taeyeon/saic_llama/MobileQuant/data/pile/val.jsonl.zst'

Best regards,
fwtan commented 2 months ago

The original pile dataset has been taken down due to copyright issues. You can try to download the filtered version: https://huggingface.co/datasets/monology/pile-uncopyrighted/blob/main/val.jsonl.zst, and put it under the proper directory: data/pile/val.jsonl.zst
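In case it helps, a minimal sketch of fetching and placing the file, run from the MobileQuant repo root (note the /resolve/ path, which serves the raw file rather than the blob page):

    # Create the directory calibrate.py expects and download the filtered validation split
    mkdir -p data/pile
    wget -O data/pile/val.jsonl.zst https://huggingface.co/datasets/monology/pile-uncopyrighted/resolve/main/val.jsonl.zst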

taeyeonlee commented 2 months ago

Dear @fwtan, using https://huggingface.co/datasets/monology/pile-uncopyrighted/blob/main/val.jsonl.zst as data/pile/val.jsonl.zst, it made progress but was eventually killed. What do I have to do?

CUDA_VISIBLE_DEVICES=0 python device/calibrate.py --hf_path ${HF_PATH} --per_channel --use_conv --weight_bitwidth 4 --act_dict_path ${HF_PATH}/act_dict.json

(qct_python310_VENV_root) (base) taeyeon@taeyeon-H610MH:~/saic_llama/MobileQuant$ CUDA_VISIBLE_DEVICES=0 python device/calibrate.py --hf_path ${HF_PATH} --per_channel --use_conv --weight_bitwidth 4 --act_dict_path ${HF_PATH}/act_dict.json
2024-09-11 02:02:45.000757: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-11 02:02:45.159003: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-11 02:02:45.221713: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-11 02:02:45.239612: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-11 02:02:45.357085: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-11 02:02:46.052904: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-09-11 02:02:47,511 - root - INFO - AIMET

100%|████████████████████| 512/512 [00:06<00:00, 81.74it/s]
100%|████████████████████| 512/512 [20:35<00:00,  2.41s/it]
Exporting the ctx onnx/encodings
Killed
(qct_python310_VENV_root) (base) taeyeon@taeyeon-H610MH:~/saic_llama/MobileQuant$

Best regards,
fwtan commented 2 months ago

There is a chance that more RAM is required, e.g. at least 32 GB for llama-1.1b, ideally >= 64 GB.
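When a machine with more RAM is not available, one generic workaround (not discussed in this thread) is to back the export with swap space, trading speed for headroom. A sketch for Linux, assuming 64 GB of free disk and root access:

    # Create and enable a 64 GB swap file
    sudo fallocate -l 64G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    free -h   # confirm the new swap is active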

taeyeonlee commented 2 months ago

Dear @fwtan

It was run on Google Colab Pro+ (51 GB RAM, 15 GB GPU RAM, 235 GB storage) and on my PC (32 GB RAM, 12 GB GPU RAM, 1 TB storage). At peak it used 25 GB RAM, 6.6 GB GPU RAM, and 64 GB storage. Both runs stopped (Killed) without any error message, at torch.onnx.export(model, dummy_input, temp_file, **kwargs). What should I do?


# Excerpt (as posted) from the AIMET ONNX export utility; the docstring and the
# kwargs setup between the signature and the version check are elided.
def _export_model_to_onnx(model: Union[torch.nn.Module, torch.jit.ScriptModule, torch.jit.ScriptFunction],
                          dummy_input: Union[Tuple[Any, ...], torch.Tensor], temp_file: str, is_conditional: bool,
                          onnx_export_args: Union[OnnxExportApiArgs, dict]):
    # ...
    if version.parse(torch.__version__) < version.parse('1.11.0'):
        kwargs.update({'enable_onnx_checker': False})
        torch.onnx.export(model, dummy_input, temp_file, **kwargs)
    else:
        try:
            # Newer torch versions no longer accept these keyword arguments
            remove_kwargs = ['enable_onnx_checker', 'example_outputs', 'use_external_data_format']
            for key in remove_kwargs:
                kwargs.pop(key, None)
            torch.onnx.export(model, dummy_input, temp_file, **kwargs)
            # ... (rest of the function elided)

Best regards,

fwtan commented 2 months ago

Typically the program gets killed because it ran out of RAM. There are a few ways to debug this:
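For instance, a generic first check (not spelled out in the original reply) is to confirm that the kernel's OOM killer terminated the process, and to watch memory usage during the run:

    # Look for the OOM killer's trace in the kernel log after a "Killed" exit
    sudo dmesg | grep -iE "out of memory|killed process"

    # In a second terminal, watch RAM and swap usage while calibrate.py runs
    watch -n 1 free -h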

fwtan commented 1 month ago

Feel free to reopen the issue if there are any further questions.