tensorflow / model-optimization

A toolkit to optimize ML models for deployment with Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

Determinism is not yet supported in GPU implementation of FakeQuantWithMinMaxVarsGradient #1087

Closed puelon closed 1 year ago

puelon commented 1 year ago

I'm trying to run Quantization Aware Training (QAT) on TensorFlow with GPU support on my local machine, but I keep running into the following error:

UNIMPLEMENTED: Determinism is not yet supported in GPU implementation of FakeQuantWithMinMaxVarsGradient.

I am currently trying to run it on an RTX 3090, but it is not working. As a test, I ran it on Google Colab, and it works there without problems. However, I would rather run it on my local machine because I have more RAM available. I am unsure what is causing this error. The package versions I am currently using are as follows:

TensorFlow version: 2.10.0 (with GPU support)
CUDA version: 64_112
cuDNN version: 64_8
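As an aside, the CUDA/cuDNN values above look like Windows DLL name suffixes (`cudart64_112.dll`, `cudnn64_8.dll`) rather than release numbers. A minimal sketch for confirming the versions a TensorFlow wheel was actually built against, using the documented `tf.sysconfig.get_build_info()` API (the available keys can vary by platform):

```python
import tensorflow as tf

# Print the build-time CUDA/cuDNN versions and the GPUs TensorFlow sees.
build = tf.sysconfig.get_build_info()  # dict of build-time configuration
print("TensorFlow:", tf.__version__)
print("CUDA (build):", build.get("cuda_version"))
print("cuDNN (build):", build.get("cudnn_version"))
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```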

Model layers:


_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_3 (InputLayer)        [(None, 288, 384, 3)]     0         

 block1_conv1 (Conv2D)       (None, 288, 384, 64)      1792      

 block1_conv2 (Conv2D)       (None, 288, 384, 64)      36928     

 block1_pool (MaxPooling2D)  (None, 144, 192, 64)      0         

 block2_conv1 (Conv2D)       (None, 144, 192, 128)     73856     

 block2_conv2 (Conv2D)       (None, 144, 192, 128)     147584    

 block2_pool (MaxPooling2D)  (None, 72, 96, 128)       0         

 block3_conv1 (Conv2D)       (None, 72, 96, 256)       295168    

 block3_conv2 (Conv2D)       (None, 72, 96, 256)       590080    

 block3_conv3 (Conv2D)       (None, 72, 96, 256)       590080    

 block3_pool (MaxPooling2D)  (None, 36, 48, 256)       0         

 block4_conv1 (Conv2D)       (None, 36, 48, 512)       1180160   

 block4_conv2 (Conv2D)       (None, 36, 48, 512)       2359808   

 block4_conv3 (Conv2D)       (None, 36, 48, 512)       2359808   

 block4_pool (MaxPooling2D)  (None, 18, 24, 512)       0         

 block5_conv1 (Conv2D)       (None, 18, 24, 512)       2359808   

 block5_conv2 (Conv2D)       (None, 18, 24, 512)       2359808   

 block5_conv3 (Conv2D)       (None, 18, 24, 512)       2359808   

 flatten_2 (Flatten)         (None, 221184)            0         

 dense (Dense)               (None, 16)                3538960   

 output (Dense)              (None, 2)                 34       

QAT code used:

import tensorflow_model_optimization as tfmot
from tensorflow import keras
from tensorflow.keras import optimizers

quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)
q_aware_model.compile(loss=keras.losses.CategoricalCrossentropy(),
                      optimizer=optimizers.Adam(learning_rate=1e-4, amsgrad=True),
                      metrics=['accuracy'])

q_aware_model.fit(x_train, y_train, validation_split=0.25, epochs=3, batch_size=104)
q_aware_model.evaluate(x_test, y_test, verbose=2)

See the error logs it triggers below:

Cell In[4], line 127, in model_reassembling()
    122 q_aware_model = quantize_model(model)
    123 q_aware_model.compile(loss=keras.losses.CategoricalCrossentropy(),
    124                       optimizer=optimizers.Adam(learning_rate=1e-4, amsgrad=True),
    125                       metrics=['accuracy'])
--> 127 q_aware_model.fit(x_train, y_train, validation_split=0.25, epochs=3, batch_size=104)
    128 q_aware_model.evaluate(x_test, y_test, verbose=2)

File ~\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67 filtered_tb = _process_traceback_frames(e.__traceback__)
     68 # To get the full stack trace, call:
     69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~\anaconda3\envs\tensorflowgpu\lib\site-packages\tensorflow\python\eager\execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53     ctx.ensure_initialized()
---> 54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                     inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57     if name is not None:

UnimplementedError: Graph execution error:

Detected at node 'gradient_tape/model_1/quant_output/MovingAvgQuantize/FakeQuantWithMinMaxVarsGradient' defined at (most recent call last):
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\traitlets\config\application.py", line 992, in launch_instance
    app.start()
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\ipykernel\kernelapp.py", line 736, in start
    self.io_loop.start()
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\tornado\platform\asyncio.py", line 195, in start
    self.asyncio_loop.run_forever()
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\asyncio\base_events.py", line 595, in run_forever
    self._run_once()
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\asyncio\base_events.py", line 1881, in _run_once
    handle._run()
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\ipykernel\kernelbase.py", line 516, in dispatch_queue
    await self.process_one()
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\ipykernel\kernelbase.py", line 505, in process_one
    await dispatch(*args)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\ipykernel\kernelbase.py", line 412, in dispatch_shell
    await result
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\ipykernel\kernelbase.py", line 740, in execute_request
    reply_content = await reply_content
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\ipykernel\ipkernel.py", line 422, in do_execute
    res = shell.run_cell(
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\ipykernel\zmqshell.py", line 546, in run_cell
    return super().run_cell(*args, **kwargs)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\IPython\core\interactiveshell.py", line 3006, in run_cell
    result = self._run_cell(
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\IPython\core\interactiveshell.py", line 3061, in _run_cell
    result = runner(coro)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\IPython\core\async_helpers.py", line 129, in _pseudo_sync_runner
    coro.send(None)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\IPython\core\interactiveshell.py", line 3266, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\IPython\core\interactiveshell.py", line 3445, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\Publi\AppData\Local\Temp\ipykernel_23944\3632630137.py", line 130, in <module>
    model_reassembling()
  File "C:\Users\Publi\AppData\Local\Temp\ipykernel_23944\3632630137.py", line 127, in model_reassembling
    q_aware_model.fit(x_train, y_train, validation_split=0.25, epochs=3, batch_size=104)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
    return fn(*args, **kwargs)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\engine\training.py", line 1564, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\engine\training.py", line 1160, in train_function
    return step_function(self, iterator)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\engine\training.py", line 1146, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\engine\training.py", line 1135, in run_step
    outputs = model.train_step(data)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\engine\training.py", line 997, in train_step
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 576, in minimize
    grads_and_vars = self._compute_gradients(
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 634, in _compute_gradients
    grads_and_vars = self._get_gradients(
  File "C:\Users\Publi\anaconda3\envs\tensorflowgpu\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 510, in _get_gradients
    grads = tape.gradient(loss, var_list, grad_loss)
Node: 'gradient_tape/model_1/quant_output/MovingAvgQuantize/FakeQuantWithMinMaxVarsGradient'
2 root error(s) found.
(0) UNIMPLEMENTED: Determinism is not yet supported in GPU implementation of FakeQuantWithMinMaxVarsGradient.
  [[{{node gradient_tape/model_1/quant_output/MovingAvgQuantize/FakeQuantWithMinMaxVarsGradient}}]]
(1) CANCELLED: Function was cancelled before it was started
0 successful operations. 0 derived errors ignored. [Op:__inference_train_function_36548]

Xhark commented 1 year ago

Did you enable the determinism feature somewhere in your code?

https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_op_determinism

Since we don't support determinism for QAT at the moment, you should avoid using this feature. Thanks!
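For anyone hitting the same error: op determinism is switched on either in code or through an environment variable, and there is no public API to turn it back off once enabled in a process. A minimal sketch of the workaround (`TF_DETERMINISTIC_OPS` is TensorFlow's documented environment-variable switch; clearing it must happen before TensorFlow executes any op):

```python
import os

# Two switches enable op determinism in TensorFlow; either one makes the
# GPU kernel for FakeQuantWithMinMaxVarsGradient raise UNIMPLEMENTED:
#   1) calling tf.config.experimental.enable_op_determinism() in code
#   2) setting the TF_DETERMINISTIC_OPS environment variable
# Since there is no public "disable" call, the workaround is not to enable it:
os.environ.pop("TF_DETERMINISTIC_OPS", None)  # clear the env-var switch
# ...and remove any tf.config.experimental.enable_op_determinism() call
# from the training script before building or fitting the QAT model.
```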

puelon commented 1 year ago

Hi Xhark, thank you very much for replying. I was able to run it by turning off determinism.

steveepreston commented 1 month ago

@Xhark So is there no way to achieve reproducible results?
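Not bit-wise reproducibility while this gradient kernel lacks a deterministic GPU implementation, but seeding every RNG still removes most run-to-run variance (weight initialization, data shuffling). A hedged sketch using only standard seeding calls:

```python
import random
import numpy as np

# Seeding Python's and NumPy's RNGs pins weight initialization and data
# shuffling; residual nondeterminism from GPU kernels still remains.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# With TensorFlow imported, tf.keras.utils.set_random_seed(SEED) seeds all
# three RNGs (Python, NumPy, TF) in one call; full determinism would still
# require tf.config.experimental.enable_op_determinism(), which the QAT
# gradient op does not yet support on GPU.
```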