swapnilsayansaha opened 1 year ago
Hi, this looks like an issue with the floormod op on GPU rather than with the Pruning API. It is odd, since a similar bug was fixed a year ago: https://github.com/tensorflow/tensorflow/issues/46887
Could you double-check your TensorFlow version? If the problem still exists in a recent TensorFlow version, we may need to reopen the issue above.
TF version is 2.9.2 (GPU).
Same bug on Windows 10 with TF 2.10.0 and floormod.
Having the same problem on an RTX 3090 with TensorFlow 2.10. I can't even run PQAT because of this issue with pruning on the GPU.
I had the same problem. I ran the following and got an error that libdevice.10.bc was not found:

```python
import tensorflow as tf

@tf.function(jit_compile=True)
def floormod(a, b):
    return tf.math.floormod(a, b)

floormod(tf.constant(1.), tf.constant(1.))
```

```
tensorflow.python.framework.errors_impl.InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_floormod_49]
```
I added the following to the top of the program and it worked:

```python
import os
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/path/to/cuda"
```
I hope this helps.
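If you are unsure where CUDA lives on your machine, here is a small sketch that sets the flag only when libdevice is actually present. The `/usr/local/cuda` prefix below is an assumption (a common Linux install location); adjust it for your system:

```python
import os

# Assumed CUDA install prefix; change to match your system.
cuda_dir = "/usr/local/cuda"

# XLA looks for nvvm/libdevice/libdevice.10.bc under the directory
# passed via --xla_gpu_cuda_data_dir.
libdevice = os.path.join(cuda_dir, "nvvm", "libdevice", "libdevice.10.bc")

# Set the flag before TensorFlow is imported so XLA picks it up.
if os.path.exists(libdevice):
    os.environ["XLA_FLAGS"] = f"--xla_gpu_cuda_data_dir={cuda_dir}"
```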
Describe the bug
Most likely, sparse operators such as PruneLowMagnitude cannot be loaded and run on a GPU.

System information
TensorFlow version (installed from source or binary): Binary, 2.9.2, CUDA: 11.6
GPU: Nvidia RTX 3090 24 GB
OS: Ubuntu 20.04
TensorFlow Model Optimization version (installed from source or binary): Binary, 0.7.3
Python version: 3.8
Describe the expected behavior and the current behavior
Issue described here: https://github.com/tensorflow/tensorflow/issues/58499

The example code for pruning (https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras) should work out of the box. However, fine-tuning the pruned model doesn't work on GPU. I worked around it by forcing the fine-tuning of the pruned model onto the CPU.
The unpruned model trains fine on the GPU, so this is not a problem with the CUDA drivers; please do not suggest reconfiguring a new conda/venv environment.
The following error occurs without the `with tf.device('/cpu:0'):` wrapper.

Code to reproduce the issue
https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras
Additional context
The problem isn't too serious: I can train the unpruned model on the GPU for, say, 200 epochs, save its weights, load them, add the code needed to prune the model, and then fine-tune it for about 10 epochs on the CPU. However, it's worth looking into why the fine-tuning cannot happen on the GPU.