Closed keisen closed 3 years ago
As the author of #43 and #45, I was interested in testing this using
pip install git+https://github.com/keisen/tf-keras-vis.git@refs/pull/39/merge
First thing I noticed, you are importing packaging now, but it did not auto-install using the above command. Do you maybe need to add it as a dependency?
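A quick way to confirm whether a dependency such as packaging actually got pulled in by pip is to probe for it from the same interpreter. This is only a minimal sketch using the standard library; the helper name missing_modules is made up for illustration and is not part of tf-keras-vis.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "packaging" is the module this PR started importing.
print(missing_modules(["packaging"]))
```

An empty list means every listed module is importable in the current environment.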
Then, this test code does not run. It runs fine without set_policy:
import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization
policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
tf.keras.mixed_precision.experimental.set_policy(policy)
model = tf.keras.applications.MobileNet()
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
print("Done")
Output is
Exception has occurred: AttributeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_in_graph_mode'
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 1366, in _var_key
if var._in_graph_mode:
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 826, in add_slot
var_key = _var_key(var)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/rmsprop.py", line 155, in _create_slots
self.add_slot(var, "rms")
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 783, in _create_all_weights
self._create_slots(var_list)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 604, in apply_gradients
self._create_all_weights(var_list)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 787, in _apply_gradients
return self._optimizer.apply_gradients(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3417, in _call_for_each_replica
return fn(*args, **kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2730, in call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 761, in apply_fn
return distribution.extended.call_for_each_replica(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/framework/smart_cond.py", line 54, in smart_cond
return true_fn()
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 776, in _apply_gradients_cross_replica
maybe_apply_op = smart_cond.smart_cond(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 572, in wrapper
return func(*args, **kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2948, in _merge_call
return merge_fn(self._strategy, *args, **kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2941, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 739, in apply_gradients
return distribution_strategy_context.get_replica_context().merge_call(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 149, in __call__
optimizer.apply_gradients(zip(grads, seed_inputs))
File "/home/bers/cia/cia/cnn/bug.py", line 8, in <module>
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
Then, you seem to rely a lot on the compute policy being set, but that is not actually ensured. Look at this example (run twice to see the error). You might instead rely on model.compute_dtype.
import sys
from pathlib import Path
import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization
model_file = Path("bug.tf")
if not model_file.exists():
    policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
    tf.keras.mixed_precision.experimental.set_policy(policy)
    model = tf.keras.applications.MobileNet()
    model.save(model_file)
    sys.exit()
model = tf.keras.models.load_model(model_file)
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
print("Done")
Error is
Exception has occurred: ValueError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Incompatible type conversion requested to type 'float32' for AutoCastVariable which is casted to type 'float16'
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/autocast_variable.py", line 132, in _dense_var_to_tensor
raise ValueError(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1540, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
return func(*args, **kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 273, in args_to_matching_eager
tensor = ops.convert_to_tensor(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1019, in conv2d_eager_fallback
_attr_T, _inputs_T = _execute.args_to_matching_eager([input, filter], ctx, [_dtypes.half, _dtypes.bfloat16, _dtypes.float32, _dtypes.float64, _dtypes.int32, ])
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 936, in conv2d
return conv2d_eager_fallback(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 2597, in _conv2d_expanded_batch
return gen_nn_ops.conv2d(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 1143, in convolution_internal
return op(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 1013, in convolution_v2
return convolution_internal(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 248, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 560, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 424, in call
return self._run_internal_graph(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 122, in __call__
outputs = self.model(seed_inputs, training=training)
File "/home/bers/cia/cia/cnn/bug.py", line 17, in <module>
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
Finally (for today), here's one more example that runs fine without set_global_policy() but does not with it (both for the tuple and the list case). The problem (as well as the first one above) may be related to this line from the TF2.4.0 release notes:
The property tf.keras.mixed_precision.experimental.LossScaleOptimizer.loss_scale is now a tensor, not a LossScale object. This means to get a loss scale of a LossScaleOptimizer as a tensor, you must now call opt.loss_scale instead of opt.loss_scale().
import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization
policy = tf.keras.mixed_precision.Policy("mixed_float16")
tf.keras.mixed_precision.set_global_policy(policy)
model = tf.keras.applications.MobileNet()
# ActivationMaximization(model)(lambda x: [x[0]], tf.zeros(model.input.shape[1:]))
ActivationMaximization(model)(lambda x: (x[0],), tf.zeros(model.input.shape[1:]))
print("Done")
The error is
Exception has occurred: AttributeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
'tuple' object has no attribute 'dtype'
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 676, in get_scaled_loss
return loss * math_ops.cast(self.loss_scale, loss.dtype)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 126, in <genexpr>
score_values = (optimizer.get_scaled_loss(score_value)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 128, in <genexpr>
score_values = (tf.stack(score_value, axis=0) if isinstance(
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 130, in <listcomp>
score_values = [
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 130, in __call__
score_values = [
File "/home/bers/cia/bug.py", line 9, in <module>
ActivationMaximization(model)(lambda x: (x[0],), tf.zeros(model.input.shape[1:]))
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
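The examples above mix the experimental API (set_policy) with the TF 2.4+ names (set_global_policy). A small version switch can make it explicit which one a test script should call. This is a rough sketch assuming the non-experimental names are available from TF 2.4 on, as the last example suggests; release_tuple and policy_setter_path are hypothetical helpers, and the function only returns a dotted name rather than touching TensorFlow.

```python
import re


def release_tuple(version):
    """Parse the leading 'major.minor' of a version string such as '2.4.1' or '2.5.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def policy_setter_path(tf_version):
    """Pick the dotted name of the function that sets the mixed-precision policy."""
    if release_tuple(tf_version) >= (2, 4):
        return "tf.keras.mixed_precision.set_global_policy"
    return "tf.keras.mixed_precision.experimental.set_policy"


print(policy_setter_path("2.4.1"))
```

In a real script one would pass tf.__version__ and resolve the returned name on the imported module.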
Hope this helps a bit :)
Hi, thank you for this great PR! It seems this PR is about using mixed_precision for training code. Has anyone tried to embed GradCAM into a model that is saved as a SavedModel? I was able to embed GradCAM by disabling eager mode with a float32 model; however, with a float16 model, the GradCAM gradient calculation returns all zeros, which I think is underflow. Does this PR have the same problem?
@luvwinnie just try pip install git+https://github.com/keisen/tf-keras-vis.git@refs/pull/39/merge
and report back with a minimal example in case you hit any issues.
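The underflow suspected in the question above can be illustrated without TensorFlow. The sketch below rounds Python floats to IEEE 754 half precision via the struct module and shows how multiplying by a loss scale before the cast keeps a tiny gradient from collapsing to zero. The gradient magnitude and the scale factor are illustrative assumptions, not values taken from any model in this thread.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


tiny_gradient = 1e-8    # illustrative magnitude, below float16's smallest subnormal
loss_scale = 2.0 ** 15  # illustrative scale factor (a power of two)

# Casting directly loses the value entirely.
unscaled = to_float16(tiny_gradient)
# Scaling first, casting, then unscaling in higher precision preserves it.
rescued = to_float16(tiny_gradient * loss_scale) / loss_scale

print(unscaled, rescued)
```

This is the idea behind get_scaled_loss in the tracebacks above: scale before the low-precision step, unscale afterwards.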
I'm sorry for the late reply and for pushing an incomplete implementation. @bersbersbers , thank you for your great review. I'd be happy if you could check that the bugs are fixed. @luvwinnie , thank you for your report! Could you please check whether the problem has improved?
Thanks!
As the author of #43 and #45, I was interested in testing this using
@bersbersbers , I apologize that the tests related to #43 , #45 and #47 are excluded from this PR, because I don't have enough time to implement them.
Sure, no problem! I am using my own test cases anyway - as long as it's working, my interest in test cases in this repository is limited ;) These can easily be added at a later time.
I'm sorry for late reply and pushing incomplete implements. @bersbersbers , Thank you for your great review. I'd be happy if you make sure that the bugs are fixed.
On it.
Alright, I did:
pip uninstall tf-keras-vis
pip install git+https://github.com/keisen/tf-keras-vis.git@refs/pull/39/merge
This seems to have installed 34c3681c40e2. Now:
packaging issue https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-781993150 seems fixed.
ActivationMaximization in https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-782003025 and https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-782021611 seem fixed - at least the code finishes now. (I have not yet verified that the results are correct.)
float16 model and saves it, while the second run loads it without setting the compute policy. (This works fine in all my applications, so I guess it's a valid approach.) However, I get
ValueError: Incompatible type conversion requested to type 'float32' for AutoCastVariable which is casted to type 'float16'
Some thoughts:
I said earlier that I think you may need to rely on self.model.compute_dtype, but I am not sure that is correct as that is 'float32' (also dtype and variable_dtype). [Edit: I'm not sure what I observed earlier - more recently, I am seeing model.compute_dtype == tf.float16, and also some model.dtype_policy that you might use.]
However, self.model.output.dtype is tf.float16.
Still, as a user, I cannot simply do
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:], dtype=tf.float16))
This will give
Expected tensor with type tf.float32 not tf.float16
which does make sense to me: self.model.input.dtype is tf.float32 after all.
102d101
< seed_inputs = [tf.cast(x, tf.float32) for x in seed_inputs]
107d105
< seed_inputs = (tf.cast(x, self.model.output.dtype) for x in seed_inputs)
This fix is somewhat similar to what I did in https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-763819714 - no idea if self.model.output.dtype or self.model.layers[-2].compute_dtype is the more relevant here.
Similar to https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-782006899, this code fails (again, run twice):
import sys
from pathlib import Path
import tensorflow as tf
from tf_keras_vis.scorecam import ScoreCAM
model_file = Path("bug.tf")
if not model_file.exists():
    policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
    tf.keras.mixed_precision.experimental.set_policy(policy)
    model = tf.keras.applications.MobileNet()
    model.save(model_file)
    sys.exit()
model = tf.keras.models.load_model(model_file)
ScoreCAM(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
print("Done.")
Problem is here:
I suspect that the fix from https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-762375980 helps, but haven't tested.
So much for now. I must admit I haven't fully understood if you plan to address the remaining issues in this PR or later. Basically, they all relate to creating a float16 model, then loading it without setting the compute policy. So they can be worked around by simply setting the compute policy, but that's rather obscure when you load a model you obtained from someone else without knowing what the policy is. Let me know if you want me to re-post these issues some place else.
Regarding https://github.com/keisen/tf-keras-vis/issues/41#issuecomment-788954750, I think https://www.tensorflow.org/guide/mixed_precision is an important read to figure out which variable one should rely on to determine what dtype some input variable should have, and what dtype one should expect some model output to have. I am pretty certain that the global_policy is not the right thing to look at, as that can easily be changed after model construction and will not change the model (its influence is limited to newly created layers). Similarly, models loaded from file do not use the global_policy. I believe you should rely only on model properties and ideally, layer properties of the exact layers with which you interact (mainly model.input, model.output, model.layers[0] and model.layers[-1] of the modified model, I would guess, taking into account the differences between compute_dtype and variable_dtype).
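To make the suggestion above concrete, here is a minimal sketch of reading dtype information from the model object itself instead of from the global policy. The helper describe_dtypes is hypothetical and duck-typed: it only assumes the attribute names mentioned in this thread (input, output, compute_dtype, variable_dtype), and the stand-in object below replaces a real Keras model purely for illustration.

```python
from types import SimpleNamespace


def describe_dtypes(model):
    """Collect dtype-related attributes from the model itself, not from global state.

    Missing attributes are reported as None, so the helper also works on
    objects that only implement part of the interface.
    """
    def dtype_of(obj):
        return getattr(obj, "dtype", None)

    return {
        "input": dtype_of(getattr(model, "input", None)),
        "output": dtype_of(getattr(model, "output", None)),
        "compute": getattr(model, "compute_dtype", None),
        "variable": getattr(model, "variable_dtype", None),
    }


# Stand-in object for illustration only; a real Keras model would be passed instead.
fake_model = SimpleNamespace(
    input=SimpleNamespace(dtype="float32"),
    output=SimpleNamespace(dtype="float16"),
    compute_dtype="float16",
    variable_dtype="float32",
)
print(describe_dtypes(fake_model))
```

Because nothing here consults a global setting, the result stays the same for a model loaded from file in a fresh process.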
And in the spirit of #41, here's another example that fails with 34c3681:
import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization
tf.keras.mixed_precision.set_global_policy("mixed_float16")
base_model = tf.keras.applications.MobileNet(input_shape=[32, 32, 3], include_top=False)
layer = base_model.output
layer = tf.keras.layers.Flatten(name="flatten")(layer)
layer = tf.keras.layers.Dense(2, dtype=tf.float32)(layer)
model = tf.keras.models.Model(inputs=base_model.input, outputs=layer)
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]), steps=1)
print("Done.")
I'll need to stop for today, but to summarize, I think your test cases should involve
for all visualizations.
I have tested 66132db, with little success. Many of the previous examples are still failing, see #41, #43, #45. These all use some specific network structure (base network constructed with "mixed_float16", with an output layer assigned to be float32). Maybe these are all related.
This one is also still failing (run twice):
# pip install tensorflow==2.4.1 git+https://github.com/keisen/tf-keras-vis@66132db3
import sys
from pathlib import Path
import tensorflow as tf
from tf_keras_vis import scorecam
model_file = Path("bug.tf")
if not model_file.exists():
    tf.keras.mixed_precision.set_global_policy("mixed_float16")
    model = tf.keras.applications.MobileNet(
        weights=None, input_shape=(32, 32, 3), classes=2
    )
    model.save(model_file)
    sys.exit()
model = tf.keras.models.load_model(model_file)
data = tf.zeros(model.input.shape[1:])
loss = lambda output: sum(output)
scorecam.ScoreCAM(model)(loss, data)
print("Done.")
ValueError: Cannot do batch_dot on inputs with different batch sizes. Received inputs with shapes (1, 1, 1, 2) and (2, 2).
This one works with lambda output: output, which I don't really understand. Shouldn't the score function return a single score?
@bersbersbers , Thank you for pointing them out. I'm so grateful for that!
I have a request for you. When there are very similar comments in several threads (issues or PRs), I may forget them or the relationship between them. So even if a point relates to several issues or PRs, please comment only in the main thread (in this case, this PR), not in all of them.
Thanks!
This also still fails with 280868e1652bed0e0bbaee81c4e4c3ca32675478:
And in the spirit of #41, here's another example that fails with 34c3681:
import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization
tf.keras.mixed_precision.set_global_policy("mixed_float16")
base_model = tf.keras.applications.MobileNet(input_shape=[32, 32, 3], include_top=False)
layer = base_model.output
layer = tf.keras.layers.Flatten(name="flatten")(layer)
layer = tf.keras.layers.Dense(2, dtype=tf.float32)(layer)
model = tf.keras.models.Model(inputs=base_model.input, outputs=layer)
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]), steps=1)
print("Done.")
cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a half tensor [Op:AddV2]
This also still fails with 280868e:
@bersbersbers , Thank you for reporting! I could NOT find a way to avoid or fix the problem. So, for now, the related test case is skipped.
Thanks!
@bersbersbers , If you can, please submit a PR that fixes this issue. Thank you for your cooperation!
Sure, very welcome. Regarding the problem in https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-831399112, I am pretty sure that I had it working either in v0.5.5 or in some earlier version of v0.6.0, probably with fixes I had proposed earlier. Have you tried https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-763819714? (Sorry these things are a bit all over the place, but I was not aware of this PR back then.)
Have you tried #45 (comment)?
Although #45 seems the other problem, can it solve this issue?
If so, I'm looking forward to see [smarter ways to infer the proper dtype and maybe better places to cast] you said!
(Considering maintenance, The way in https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-763819714 is not very intuitive. I'm glad that there is better way to do so.)
Thanks!
Have you tried #45 (comment)?
Although #45 seems the other problem, can it solve this issue?
Yes, I just tried on top of 280868e. The reason that this is in #45 is because I got tired of posting the same network example in various issues (#41, #43, #45, etc.) - each with one different visualization (GradCam, GradCam++, ScoreCam). So I just posted a solution for the same network with Activation Maximization there.
If so, I'm looking forward to see [smarter ways to infer the proper dtype and maybe better places to cast] you said!
Well, I said I am sure there is a way, but I was not sure I know one. But here we go now.
(Considering maintenance, The way in #45 (comment) is not very intuitive. I'm glad that there is better way to do so.)
First, the first two changes from https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-763819714 are now obsolete, so I focused on the last.
What do you think about this one - is this one more maintainable in your opinion? It addresses the key point that score_values and regularizations cannot be added due to different dtypes, using the is_compatible_with check designed for this purpose.
Of course, you can always cast the other way around (cast regularizations to score_dtype, or cast to float32 whatever is not float32 already) - I have no strong opinion on this and just chose the easiest way (score_values can be cast in a single operation). One might also make use of tf.experimental.numpy.result_type in the future, but that is still experimental.
diff --git a/tf_keras_vis/activation_maximization/__init__.py b/tf_keras_vis/activation_maximization/__init__.py
index b325d2c..cf5f456 100644
--- a/tf_keras_vis/activation_maximization/__init__.py
+++ b/tf_keras_vis/activation_maximization/__init__.py
@@ -139,6 +139,12 @@ class ActivationMaximization(ModelVisualization):
# Calculate regularization values
regularizations = [(regularizer.name, regularizer(seed_inputs))
for regularizer in regularizers]
+
+ score_dtype = score_values[0].dtype
+ regularization_dtype = regularizations[0][1].dtype
+ if not score_dtype.is_compatible_with(regularization_dtype):
+ score_values = tf.cast(score_values, regularization_dtype)
+
regularized_score_values = [
(-1. * score_value) + sum([v for _, v in regularizations])
for score_value in score_values
By the way, in 9b3d509a5c45d3c354ed650378939589841be41b you pinned scipy==1.4.* - is that necessary? Because I am now getting
plotnine 0.8.0 requires scipy>=1.5.0, but you have scipy 1.4.1 which is incompatible.
Edit: Also, pillow==7.1.* is over a year old.
Both of these old packages are not available (prebuilt) for Python 3.9, so installing 280868e on Python 3.9 fails for me due to missing "lapack/blas resources" (when compiling scipy). See https://github.com/scipy/scipy/issues/9005#issuecomment-623528512, but I do not have admin rights on my system to install the missing libraries. Note also that TF 2.5.0rc1 will support and is prebuilt for Python 3.9.
Here's an example that fails in 0.6.0 (280868e) while it works in 0.5.5:
import tensorflow as tf
from tf_keras_vis.gradcam import Gradcam
Gradcam(model := tf.keras.applications.MobileNet())(
    lambda output: [o[0] for o in output],
    tf.zeros(model.input.shape[1:]),
)
# AttributeError: 'list' object has no attribute 'shape'
Edit: I have noticed that output is one Tensor now, so one can use output[:, 0]. For the sake of compatibility, however, I would say lists should be supported.
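Supporting both return shapes could be as simple as normalizing whatever the score function returns before any further processing. The sketch below shows that idea in plain Python; as_list is a hypothetical helper written for illustration, and it is not claimed to match how tf-keras-vis handles score values internally.

```python
def as_list(value):
    """Wrap a single value in a list; convert lists and tuples to a plain list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# A score function may return one value, a list, or a tuple.
print(as_list(0.5), as_list([0.1, 0.2]), as_list((0.3,)))
```

Downstream code can then iterate over the result without checking whether it received a single tensor or a sequence.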
@bersbersbers , Thank you for your code snippet. On second thought, I will put this problem on hold for now. The design of ActivationMaximization with respect to regularizers has a known issue (multiple-I/O models are NOT fully considered), and for now I can't think of a way to fix the problem concisely.
Thanks!
On the second thought, I will put this problem on hold for now. The design of ActivationMaximization related to regularizes has a known issue (that multiple I/O model is NOT considered fully), for now, I can't think of the way to fix problem concisely.
You're the boss, but do know that I disagree:
Unless I am mistaken, the mixed-precision issue has nothing to do with the multiple-I/O model. I find ignoring the mixed-precision issue while still allowing full-precision models to use a method that you know has difficulties somewhat questionable. Similarly, why not fix the mixed-precision issue and thereby allow mixed-precision models with single I/O to compute activation maximizations at least?
I also don't see the problem you have with the code snippet I posted. The mixed-precision issue is that two numbers (derived from different stages of the network, I guess) cannot be added without casting, so we cast them. Is there anything wrong with this approach?
You're the boss, but do know that I disagree:
@bersbersbers , This is an open source project, so we just have fun and contribute as much as we can.
As I said before, unfortunately, I don't have enough time to do everything. So I decided that, at least, v0.6.0 won't support that. I would do it in v0.7.0 or later if I can find the time.
Or
@bersbersbers , If you can, please submit a PR that fix this issue.
As I said before, because this is open source, you can open and submit a pull request. I may include it in v0.6.0 if the PR is merged soon.
Either way, I want to release v0.6.0 soon. Additionally, I want to keep tf-keras-vis's code as clear and concise as possible, even if it is hard work to support mixed-precision.
Thanks!
@bersbersbers , Here is a open source project, so we only have fun and contribute it as possible as we can.
Sure - I did not want to criticize anyone personally, just add my perspective on the issue.
@bersbersbers , If you can, please submit a PR that fix this issue.
I can easily submit https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-831682563 as a PR, if that is your intention. I have tested it locally and it solves the issue.
If you are looking for something else, please let me know what you are looking for. To my first solution, you replied it wasn't "intuitive"; to my second solution, you reacted with a "confused" emoji; and that was all I got as a reply. Really, I am happy to contribute, but after declining two of my solutions you need to give me some additional criteria regarding what you think defines an acceptable solution.
@keisen would you mind explaining what you think the bug in TensorFlow is?
I'd like to help isolate and report it upstream, but I don't see where TensorFlow is misbehaving.
Edit: as I see it, the problem is when you add score_values and regularizations, which are of different dtype. score_values is float32, while regularizations is float16. So what is the bug in your opinion?
score_values is float32? That is due to the model having float32 outputs, by definition of the model, right?
regularizations is float16? This is due to seed_inputs being float32 (which tf-keras-vis does itself).
float16 to float32? I believe this is expected.
In summary, I don't see where you think the TF bug is.
Here's another basic idea to fix this issue.
diff --git a/tf_keras_vis/activation_maximization/__init__.py b/tf_keras_vis/activation_maximization/__init__.py
index 134c52d..d8c31c5 100644
--- a/tf_keras_vis/activation_maximization/__init__.py
+++ b/tf_keras_vis/activation_maximization/__init__.py
@@ -118,6 +118,7 @@ class ActivationMaximization(ModelVisualization):
for modifier in input_modifiers[name]:
seed_inputs[j] = modifier(seed_inputs[j])
+ regularizer_seed_inputs = seed_inputs
if mixed_precision_enabled:
seed_inputs = (tf.cast(X, dtype=lower_precision_dtype(self.model))
for X in seed_inputs)
@@ -130,7 +131,7 @@ class ActivationMaximization(ModelVisualization):
outputs = listify(outputs)
score_values = self._calculate_scores(outputs, scores)
# Calculate regularization values
- regularizations = [(regularizer.name, regularizer(seed_inputs))
+ regularizations = [(regularizer.name, regularizer(regularizer_seed_inputs))
for regularizer in regularizers]
regularized_score_values = [
(-1. * score_value) + sum([v for _, v in regularizations])
Why does this work? You save the original (float32) input for use with the regularizer, so it's not changed when you cast it to float16 later in case of mixed_precision. That gives you all the expected dtypes everywhere.
If you think it's more maintainable, you can also introduce lower_precision_seed_inputs and leave the original at float32.
@bersbersbers I'm sorry for the late reply!
If you are looking for something else, please let me know what you are looking for.
I believe that even if we fixed the error, some problems would still remain. Regularization values may be NaN. Many users may be confused because the results differ between ActivationMaximization with float32 precision and with mixed precision.
The reason for my reactions (emoji) is just that those approaches can't fix the problems above. I can't decide whether supporting mixed-precision in ActivationMaximization is good or not. So I have no strong motivation to fully support mixed-precision in ActivationMaximization for now.
Thanks for your contributions!
I believe that even if we fixed the error, some problems are still remain.
That is true always and everywhere.
Regularization values may be NaN.
Is that specific to mixed_precision? That's as good an argument as the one with multi-I/O.
Many users may be confused because the results are different for ActivationMaximization with float32-precision and one with mixed-precision.
Well, for one, that is expected - these are different networks. Who would expect different networks to produce the same result?
Second, are users less confused when tf-keras-vis works with float32 and not with mixed-precision? Or when the maintainer references some TensorFlow bug that, frankly, I don't think exists?
Anyway, I will apply the changes mentioned above locally and be happy with them. Thanks and good luck!
Here's another basic idea to fix this issue.
It looks good. Is there any impact on calculation result by the patch?
I cannot say really: I don't have any comparison as the code without this patch does not run for my saved mixed-precision models, so this is the only result that I have. And my models take so long to train that I cannot re-train them with full precision now.
I can say that the results I get are somewhat expected, but I was hoping that your testing pipeline could shed more light on the immediate comparison between full/mixed precision
mixed-precision of Tensorflow 2.4+, (i.e., NOT support experimental API).
Loss to Score
Closes #24, #43 , #45, #47 and #51