microsoft / tensorflow-directml

Fork of TensorFlow accelerated by DirectML
Apache License 2.0

Not able to use my own callbacks #376

Closed funky-soul closed 2 years ago

funky-soul commented 2 years ago

Hi, this is my first time reporting an issue, so I'm sorry if I misclassified it. I need to do some research with TF2.0 with my team. When I run the code in an environment with tensorflow-cpu, the program works as expected. However, when I run it in another environment with tensorflow-directml (to use my GPU), the code breaks as follows:

Screenshot 2022-06-26 151146

Emphasis on error:

```
  File "C:\Users\berna\anaconda3\envs\tf2-directml\lib\site-packages\tensorflow_core\python\ops\gen_resource_variable_ops.py", line 64, in assign_add_variable_op
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'AssignAddVariableOp' OpKernel for 'DML' devices compatible with node {{node AssignAddVariableOp}}
    (OpKernel was found, but attributes didn't match) Requested Attributes: dtype=DT_DOUBLE
    .  Registered:  device='CPU'; dtype in [DT_INT64]
  device='CPU'; dtype in [DT_INT32]
  device='CPU'; dtype in [DT_UINT16]
  device='CPU'; dtype in [DT_INT16]
  device='CPU'; dtype in [DT_UINT8]
  device='CPU'; dtype in [DT_INT8]
  device='CPU'; dtype in [DT_HALF]
  device='CPU'; dtype in [DT_BFLOAT16]
  device='CPU'; dtype in [DT_FLOAT]
  device='CPU'; dtype in [DT_DOUBLE]
  device='CPU'; dtype in [DT_COMPLEX64]
  device='CPU'; dtype in [DT_COMPLEX128]
  device='DML'; dtype in [DT_FLOAT]
  device='DML'; dtype in [DT_HALF]
  device='DML'; dtype in [DT_INT64]
 [Op:AssignAddVariableOp]
```

Keras lets you create custom callbacks, as explained at https://keras.io/guides/writing_your_own_callbacks/ . I know the problem is only with the custom callback because if I comment it out and use only the built-in Keras callbacks, the code works with directml again.

Where the callbacks are called (the custom callback is called as "PlotLearning(X_val,y_val)"): Screenshot 2022-06-26 153400

Eager execution is enabled (I'm not sure if it matters) with tf.compat.v1.enable_eager_execution()

My specifications:

Thanks for your help!

PatriceVignola commented 2 years ago

Unfortunately, tensorflow-directml doesn't support DT_DOUBLE. If double precision is not a hard requirement for your training scenario, you could use DT_FLOAT to work around it.
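As a rough illustration of the DT_FLOAT suggestion, the sketch below casts float64 NumPy data to float32 before it is handed to TensorFlow. The helper name `as_float32` and the sample arrays are hypothetical and not part of the thread's code, and casting inputs only helps when the float64 values come from your own data rather than from inside a library op.

```python
import numpy as np


def as_float32(array):
    """Return float64 input as float32; leave every other dtype untouched."""
    array = np.asarray(array)
    if array.dtype == np.float64:
        return array.astype(np.float32)
    return array


# NumPy creates float64 arrays by default, which map to DT_DOUBLE in TensorFlow.
x_val = np.array([0.1, 0.9, 0.4])
x_val32 = as_float32(x_val)

# Integer labels are not float64, so they pass through unchanged.
labels = as_float32(np.array([0, 1, 1]))
```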

funky-soul commented 2 years ago

Thanks for the tip! Well, double precision is not necessary. After taking a close look at the traceback, I noticed that the function that uses float64 comes from tensorflow.keras.metrics.MeanIoU.update_state().

In my team's callback:

  from tensorflow.keras.metrics import MeanIoU  # import needed by this snippet

  IoU = MeanIoU(num_classes=2, dtype='float32')
  IoU.update_state(self.Y_val, self.Y_predict_val)  # This line uses DT_DOUBLE
  med_Iou = IoU.result().numpy()

I tried to turn it into float32, without success. So I opened the file "C:\Users\USER\anaconda3\envs\tf2-directml\Lib\site-packages\tensorflow_core\python\keras\metrics.py" and saw 'dtype=dtypes.float64'.

Screenshot 2022-06-27 120330

I changed it to float32, but a new error appeared, as shown in this screenshot:

Screenshot 2022-06-27 121347

I would be glad if there is a way to use update_state() with directml.

funky-soul commented 2 years ago

SOLVED: You will need to edit 'site-packages\tensorflow_core\python\keras\metrics.py' . As PatriceVignola said earlier, directml doesn't support DT_DOUBLE, so you will need to change the dtype arguments of self.total_cm and current_cm as follows (from dtypes.float64 to dtypes.float32):

Screenshot 2022-07-01 104604 Screenshot 2022-07-01 104532

After this, my code ran flawlessly.
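For readers who would rather not edit files under site-packages, one possible alternative (not the approach used in this thread) is to compute the mean IoU in NumPy inside the callback, so no DT_DOUBLE op reaches the DML device. The sketch below is a minimal version under stated assumptions: the function name `mean_iou` is hypothetical, and both inputs are assumed to already be integer class labels (probabilities would need thresholding first).

```python
import numpy as np


def mean_iou(y_true, y_pred, num_classes=2):
    """Mean intersection-over-union from class labels, computed with NumPy."""
    y_true = np.asarray(y_true).astype(np.int64).ravel()
    y_pred = np.asarray(y_pred).astype(np.int64).ravel()

    # Confusion matrix: rows are true classes, columns are predicted classes.
    flat_index = y_true * num_classes + y_pred
    cm = np.bincount(flat_index, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes).astype(np.float32)

    true_positives = np.diag(cm)
    # Union per class = true positives + false positives + false negatives.
    union = cm.sum(axis=0) + cm.sum(axis=1) - true_positives

    # Average only over classes that appear in the labels or the predictions.
    present = union > 0
    if not present.any():
        return 0.0
    return float(np.mean(true_positives[present] / union[present]))


# Example with two classes: each class has IoU 1 / 3.
example = mean_iou([0, 0, 1, 1], [0, 1, 0, 1], num_classes=2)
```

In a callback this could replace the three MeanIoU lines, for example `med_Iou = mean_iou(self.Y_val, self.Y_predict_val > 0.5)`, where the 0.5 threshold is an assumption about how the predictions are encoded.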