Release v0.6.0 - Githubissues

keisen commented 3 years ago

[x] Support mixed-precision of Tensorflow 2.4+, (i.e., NOT support experimental API).
[x] Change a terminology that Loss to Score
[x] Refactoring Testcases, ActivationMaximization and some utilities
[x] Some bugfixes
[x] Write docstrings
[x] Update example notebook
[x] README.md
[x] Add tests
[x] Update setup.py
[x] Write Release note

Closes #24, #43 , #45, #47 and #51

bersbersbers commented 3 years ago

As the author of #43 and #45, I was interested in testing this using pip install git+https://github.com/keisen/tf-keras-vis.git@refs/pull/39/merge First thing I noticed, you are importing packaging now, but it did not auto-install using the above command. Do you maybe need to add it as a dependency?

bersbersbers commented 3 years ago

Then, this test code does not run. It runs fine without set_policy:

import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization

policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
tf.keras.mixed_precision.experimental.set_policy(policy)
model = tf.keras.applications.MobileNet()

ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
print("Done")

Output is

Exception has occurred: AttributeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_in_graph_mode'
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 1366, in _var_key
    if var._in_graph_mode:
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 826, in add_slot
    var_key = _var_key(var)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/rmsprop.py", line 155, in _create_slots
    self.add_slot(var, "rms")
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 783, in _create_all_weights
    self._create_slots(var_list)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 604, in apply_gradients
    self._create_all_weights(var_list)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 787, in _apply_gradients
    return self._optimizer.apply_gradients(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3417, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2730, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 761, in apply_fn
    return distribution.extended.call_for_each_replica(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/framework/smart_cond.py", line 54, in smart_cond
    return true_fn()
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 776, in _apply_gradients_cross_replica
    maybe_apply_op = smart_cond.smart_cond(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 572, in wrapper
    return func(*args, **kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2948, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2941, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 739, in apply_gradients
    return distribution_strategy_context.get_replica_context().merge_call(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 149, in __call__
    optimizer.apply_gradients(zip(grads, seed_inputs))
  File "/home/bers/cia/cia/cnn/bug.py", line 8, in <module>
    ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,

bersbersbers commented 3 years ago

Then, you seem to rely a lot on the compute policy being set, but that is not actually ensured. Look at this example (run twice to see the error). You might instead rely on model.compute_dtype.

import sys
from pathlib import Path

import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization

model_file = Path("bug.tf")

if not model_file.exists():
    policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
    tf.keras.mixed_precision.experimental.set_policy(policy)
    model = tf.keras.applications.MobileNet()
    model.save(model_file)
    sys.exit()

model = tf.keras.models.load_model(model_file)
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
print("Done")

Error is

Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Incompatible type conversion requested to type 'float32' for AutoCastVariable which is casted to type 'float16'
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/autocast_variable.py", line 132, in _dense_var_to_tensor
    raise ValueError(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1540, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 273, in args_to_matching_eager
    tensor = ops.convert_to_tensor(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1019, in conv2d_eager_fallback
    _attr_T, _inputs_T = _execute.args_to_matching_eager([input, filter], ctx, [_dtypes.half, _dtypes.bfloat16, _dtypes.float32, _dtypes.float64, _dtypes.int32, ])
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 936, in conv2d
    return conv2d_eager_fallback(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 2597, in _conv2d_expanded_batch
    return gen_nn_ops.conv2d(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 1143, in convolution_internal
    return op(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 1013, in convolution_v2
    return convolution_internal(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 248, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 560, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 424, in call
    return self._run_internal_graph(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 122, in __call__
    outputs = self.model(seed_inputs, training=training)
  File "/home/bers/cia/cia/cnn/bug.py", line 17, in <module>
    ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,

bersbersbers commented 3 years ago

Finally (for today), here's one more example that runs fine without set_global_policy() but does not with it (both for the tuple and the list case). The problem (as well as the first one above) may be related to this line from the TF2.4.0 release notes:

The property tf.keras.mixed_precision.experimental.LossScaleOptimizer.loss_scale is now a tensor, not a LossScale object. This means to get a loss scale of a LossScaleOptimizer as a tensor, you must now call opt.loss_scale instead of opt.loss_scale().

import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization

policy = tf.keras.mixed_precision.Policy("mixed_float16")
tf.keras.mixed_precision.set_global_policy(policy)
model = tf.keras.applications.MobileNet()

# ActivationMaximization(model)(lambda x: [x[0]], tf.zeros(model.input.shape[1:]))
ActivationMaximization(model)(lambda x: (x[0],), tf.zeros(model.input.shape[1:]))
print("Done")

The error is

Exception has occurred: AttributeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
'tuple' object has no attribute 'dtype'
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/loss_scale_optimizer.py", line 676, in get_scaled_loss
    return loss * math_ops.cast(self.loss_scale, loss.dtype)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 126, in <genexpr>
    score_values = (optimizer.get_scaled_loss(score_value)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 128, in <genexpr>
    score_values = (tf.stack(score_value, axis=0) if isinstance(
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 130, in <listcomp>
    score_values = [
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 130, in __call__
    score_values = [
  File "/home/bers/cia/bug.py", line 9, in <module>
    ActivationMaximization(model)(lambda x: (x[0],), tf.zeros(model.input.shape[1:]))
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data2/bers/opt/pyenv/versions/3.8.7/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,

bersbersbers commented 3 years ago

Hope this helps a bit :)

luvwinnie commented 3 years ago

Hi, Thank you for this great PR! Within this PR it seems like trying to use the mixed_precision for training code. Does anyone tried to embed the GradCAM to the model which saved to savedModel? I'm able to embed the GradCAM by disable eager mode with float32 model, however when it come to float16 model, the GradCAM gradients calculation get all zeros which I think is underflow, do this PR have such problem?

bersbersbers commented 3 years ago

@luvwinnie just try pip install git+https://github.com/keisen/tf-keras-vis.git@refs/pull/39/merge and report back with a minimal example in case you hit any issues.

keisen commented 3 years ago

I'm sorry for late reply and pushing incomplete implements. @bersbersbers , Thank you for your great review. I'd be happy if you make sure that the bugs are fixed. @luvwinnie , Thank you for your report! Could you please make sure that the problem was improved.

Thanks!

keisen commented 3 years ago

As the author of #43 and #45, I was interested in testing this using

@bersbersbers , I apologize that the tests that is relative to #43 , #45 and #47 exclude in this PR, because I don't have enough time to prepare implement them.

bersbersbers commented 3 years ago

@bersbersbers , I apologize that the tests that is relative to #43 , #45 and #47 exclude in this PR, because I don't have enough time to prepare implement them.

Sure, no problem! I am using my own test cases anyway - as long as it's working, my interest in test cases in this repository is limited ;) These can easily be added at a later time.

I'm sorry for late reply and pushing incomplete implements. @bersbersbers , Thank you for your great review. I'd be happy if you make sure that the bugs are fixed.

On it.

bersbersbers commented 3 years ago

Alright, I did:

pip uninstall tf-keras-vis
pip install git+https://github.com/keisen/tf-keras-vis.git@refs/pull/39/merge

This seems to have installed 34c3681c40e2. Now:

The packaging issue https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-781993150 seems fixed.
Also, the issues around ActivationMaximization in https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-782003025 and https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-782021611 seem fixed - at least the code finishes now. (I have not yet verified that the results are correct.)
Finally, the code in https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-782006899 still throws an error. Note that you may need to run this code twice - the first run only creates a float16 model and saves it, while the second run loads it without setting the compute policy. (This works fine in all my applications, so I guess it's a valid approach.) However, I get

ValueError: Incompatible type conversion requested to type 'float32' for AutoCastVariable which is casted to type 'float16'

``` Traceback (most recent call last): File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/data2/bers/opt/vscode-server/extensions/ms-python.python-2021.2.582707922/pythonFiles/lib/python/debugpy/__main__.py", line 45, in cli.main() File "/data2/bers/opt/vscode-server/extensions/ms-python.python-2021.2.582707922/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main run() File "/data2/bers/opt/vscode-server/extensions/ms-python.python-2021.2.582707922/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file runpy.run_path(target_as_str, run_name=compat.force_str("__main__")) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 265, in run_path return _run_module_code(code, init_globals, run_name, File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 97, in _run_module_code _run_code(code, mod_globals, init_globals, File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/home/bers/cia/cia/bug.py", line 17, in ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:])) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 114, in __call__ outputs = self.model(seed_inputs, training=training) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__ outputs = call_fn(inputs, *args, **kwargs) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 424, in call return self._run_internal_graph( File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 560, in _run_internal_graph outputs = node.layer(*args, **kwargs) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__ outputs = call_fn(inputs, *args, **kwargs) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 248, in call outputs = self._convolution_op(inputs, self.kernel) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper return target(*args, **kwargs) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 1013, in convolution_v2 return convolution_internal( File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 1143, in convolution_internal return op( File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 2597, in _conv2d_expanded_batch return gen_nn_ops.conv2d( File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 936, in conv2d return conv2d_eager_fallback( File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1019, in conv2d_eager_fallback _attr_T, _inputs_T = _execute.args_to_matching_eager([input, filter], ctx, [_dtypes.half, _dtypes.bfloat16, _dtypes.float32, _dtypes.float64, _dtypes.int32, ]) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 273, in args_to_matching_eager tensor = ops.convert_to_tensor( File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped return func(*args, **kwargs) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1540, in convert_to_tensor ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/autocast_variable.py", line 132, in _dense_var_to_tensor raise ValueError( ValueError: Incompatible type conversion requested to type 'float32' for AutoCastVariable which is casted to type 'float16' ```

Some thoughts:

The problem is here:

https://github.com/keisen/tf-keras-vis/blob/34c3681c40e23630761eb47a7c7ca712e905778a/tf_keras_vis/activation_maximization/__init__.py#L114

I said earlier that I think you may need to rely on self.model.compute_dtype, but I am not sure that is correct as that is 'float32' (also dtype and variable_dtype). [Edit: I'm not sure what I observed earlier - more recently, I am seeing model.compute_dtype == tf.float16, and also some model.dtype_policy that you might use.]
However, self.model.output.dtype is tf.float16.
Still, as a user, I cannot simply do
```
ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:], dtype=tf.float16)
```
This will give

Expected tensor with type tf.float32 not tf.float16

which does make sense to me: self.model.input.dtype is tf.float32 after all.

I could fix this again like this:

102d101
<             seed_inputs = [tf.cast(x, tf.float32) for x in seed_inputs]
107d105
<             seed_inputs = (tf.cast(x, self.model.output.dtype) for x in seed_inputs)

This fix is somewhat similar to what I did in https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-763819714 - no idea if self.model.output.dtype or self.model.layers[-2].compute_dtype is the more relevant here.

bersbersbers commented 3 years ago

Similar to https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-782006899, this code fails (again, run twice):

import sys
from pathlib import Path

import tensorflow as tf
from tf_keras_vis.scorecam import ScoreCAM

model_file = Path("bug.tf")

if not model_file.exists():
    policy = tf.keras.mixed_precision.experimental.Policy("mixed_float16")
    tf.keras.mixed_precision.experimental.set_policy(policy)
    model = tf.keras.applications.MobileNet()
    model.save(model_file)
    sys.exit()

model = tf.keras.models.load_model(model_file)
ScoreCAM(model)(lambda x: x, tf.zeros(model.input.shape[1:]))
print("Done.")

``` array type dtype('float16') not supported File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/scipy/ndimage/interpolation.py", line 104, in spline_filter1d _nd_image.spline_filter1d(input, order, axis, output, mode) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/scipy/ndimage/interpolation.py", line 135, in spline_filter spline_filter1d(input, order, axis, output=output, mode=mode) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/scipy/ndimage/interpolation.py", line 598, in zoom filtered = spline_filter(input, order, output=numpy.float64) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tf_keras_vis/scorecam.py", line 94, in upsampled_activation_maps = [zoom(penultimate_output, factor + (1, )) for factor in factors] File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tf_keras_vis/scorecam.py", line 94, in __call__ upsampled_activation_maps = [zoom(penultimate_output, factor + (1, )) for factor in factors] File "/home/bers/cia/cia/bug.py", line 17, in ScoreCAM(model)(lambda x: x, tf.zeros(model.input.shape[1:])) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 97, in _run_module_code _run_code(code, mod_globals, init_globals, File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 265, in run_path return _run_module_code(code, init_globals, run_name, File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame) return _run_code(code, main_globals, None, ```

Problem is here:

https://github.com/keisen/tf-keras-vis/blob/34c3681c40e23630761eb47a7c7ca712e905778a/tf_keras_vis/scorecam.py#L94

I suspect that the fix from https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-762375980 helps, but haven't tested.

bersbersbers commented 3 years ago

So much for now. I must admit I haven't fully understood if you plan to address the remaining issues in this PR or later. Basically, they all relate to creating a float16 model, then loading it without setting the compute policy. So they can be worked around by simply setting the compute policy, but that's rather obscure when you load a model you obtained from someone else without knowing that the policy is. Let me know if you want me to re-post these issues some place else.

bersbersbers commented 3 years ago

Regarding https://github.com/keisen/tf-keras-vis/issues/41#issuecomment-788954750, I think https://www.tensorflow.org/guide/mixed_precision is an important read to figure out which variable one should rely on to determine what dtype some input variable should have, and what dtype one should expect some model output to have. I am pretty certain that the global_policy is not the right thing to look at, as that can easily be changed after model construction and will not change the model (its influence is limited to newly created layers). Similarly, models loaded from file do not use the global_policy. I believe you should rely only on model properties and ideally, layer properties of the exact layers with which you interact (mainly model.input, model.output, model.layers[0] and model.layers[-1] of the modified model, I would guess, taking into account the differences between compute_dtype and variable_dtype).

bersbersbers commented 3 years ago

And in the spirit of #41, here's another example that fails with 34c3681:

import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization

tf.keras.mixed_precision.set_global_policy("mixed_float16")

base_model = tf.keras.applications.MobileNet(input_shape=[32, 32, 3], include_top=False)
layer = base_model.output
layer = tf.keras.layers.Flatten(name="flatten")(layer)
layer = tf.keras.layers.Dense(2, dtype=tf.float32)(layer)
model = tf.keras.models.Model(inputs=base_model.input, outputs=layer)

ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]), steps=1)
print("Done.")

``` Exception has occurred: InvalidArgumentError (note: full exception trace is shown but execution is paused at: _run_module_as_main) cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a half tensor [Op:AddV2] File "", line 3, in raise_from File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6862, in raise_from_not_ok_status six.raise_from(core._status_to_exception(e.code, message), None) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 472, in add_v2 _ops.raise_from_not_ok_status(e, name) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1486, in _add_dispatch return gen_math_ops.add_v2(x, y, name=name) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper return target(*args, **kwargs) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1164, in binary_op_wrapper return func(x, y, name=name) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 127, in (-1. * score_value) + sum([v for _, v in regularizations]) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/site-packages/tf_keras_vis/activation_maximization/__init__.py", line 126, in __call__ regularized_score_values = [ File "/home/bers/cia/cia/cnn/bug3.py", line 12, in ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]), steps=1) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 97, in _run_module_code _run_code(code, mod_globals, init_globals, File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 265, in run_path return _run_module_code(code, init_globals, run_name, File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/data2/bers/opt/pyenv/versions/3.8.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame) return _run_code(code, main_globals, None, ```

bersbersbers commented 3 years ago

I'll need to stop for today, but to summarize, I think your test cases should involve

different policies when creating a model (which mostly works)
loading a stored model without specifying the policy, see https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-788794176 and https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-788781182
manually-set output-layer dtypes, https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-788982257

for all visualizations.

bersbersbers commented 3 years ago

I have tested 66132db, with little success. Many of the previous examples are still failing, see #41, #43, #45. These all use some specific network structure (base network constructed with "mixed_float16", with an output layer assigned to be float32. Maybe these are all related.

This one is also still failing (run twice):

# pip install tensorflow==2.4.1 git+https://github.com/keisen/tf-keras-vis@66132db3
import sys
from pathlib import Path

import tensorflow as tf
from tf_keras_vis import scorecam

model_file = Path("bug.tf")

if not model_file.exists():
    tf.keras.mixed_precision.set_global_policy("mixed_float16")
    model = tf.keras.applications.MobileNet(
        weights=None, input_shape=(32, 32, 3), classes=2
    )
    model.save(model_file)
    sys.exit()

model = tf.keras.models.load_model(model_file)
data = tf.zeros(model.input.shape[1:])
loss = lambda output: sum(output)
scorecam.ScoreCAM(model)(loss, data)
print("Done.")

ValueError: Cannot do batch_dot on inputs with different batch sizes. Received inputs with shapes (1, 1, 1, 2) and (2, 2).

This one works with lambda output: output, which I don't really understand. Shouldn't the score function return a single score?

keisen commented 3 years ago

@bersbersbers , Thank you for pointing them out. I'm so grateful for that!

I have a request of you. When there are very similar comments in several threads (Issue or RP), I may forget them or the relationship between them. So even if a point relate several issues or PR, please comment to only a main thread (in this case, it's this PR) , not all those ones.

Thanks!

bersbersbers commented 3 years ago

This also still fails with 280868e1652bed0e0bbaee81c4e4c3ca32675478:

And in the spirit of #41, here's another example that fails with 34c3681:

import tensorflow as tf
from tf_keras_vis.activation_maximization import ActivationMaximization

tf.keras.mixed_precision.set_global_policy("mixed_float16")

base_model = tf.keras.applications.MobileNet(input_shape=[32, 32, 3], include_top=False)
layer = base_model.output
layer = tf.keras.layers.Flatten(name="flatten")(layer)
layer = tf.keras.layers.Dense(2, dtype=tf.float32)(layer)
model = tf.keras.models.Model(inputs=base_model.input, outputs=layer)

ActivationMaximization(model)(lambda x: x, tf.zeros(model.input.shape[1:]), steps=1)
print("Done.")

cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a half tensor [Op:AddV2]

keisen commented 3 years ago

This also still fails with 280868e:

@bersbersbers , Thank you for reporting! I could NOT find a way to avoid or fix the problem. So, for now, related testcase is skipped.

https://github.com/keisen/tf-keras-vis/blob/280868e1652bed0e0bbaee81c4e4c3ca32675478/tests/tf-keras-vis/activation_maximization/activation_maximization_test.py#L178-L187

Thanks!

keisen commented 3 years ago

@bersbersbers , If you can, please submit a PR that fix this issue. Thank you for your cooperation!

bersbersbers commented 3 years ago

Sure, very welcome. Regarding the problem in https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-831399112, I am pretty sure that I had it working either in v0.5.5 or in some earlier version of v0.6.0, probably with fixes I had proposed earlier. Have you tried https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-763819714? (Sorry these things are a bit all over the place, but I was not aware of this PR back then.)

keisen commented 3 years ago

Have you tried #45 (comment)?

Although #45 seems the other problem, can it solve this issue? If so, I'm looking forward to see [smarter ways to infer the proper dtype and maybe better places to cast] you said! (Considering maintenance, The way in https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-763819714 is not very intuitive. I'm glad that there is better way to do so.)

Thanks!

bersbersbers commented 3 years ago

Have you tried #45 (comment)?

Although #45 seems the other problem, can it solve this issue?

Yes, I just tried on top of 280868e. The reason that this is in #45 is because I got tired posting the same network example in various issues (#41, #43, #45, etc.) - each with one different visualization (GradCam, GradCam++, ScoreCam). So I justed posted a solution for the same network with Activation Maximization there.

If so, I'm looking forward to see [smarter ways to infer the proper dtype and maybe better places to cast] you said!

Well, I said I am sure there way, but I was not sure I know one. But here we go now.

(Considering maintenance, The way in #45 (comment) is not very intuitive. I'm glad that there is better way to do so.)

First, the first two changes from https://github.com/keisen/tf-keras-vis/issues/45#issuecomment-763819714 are now obsolete, so I focused on the last.

What do you think about this one - is this one more maintainable in your opinion? It addresses the key point that score_values and regularizations cannot be added due to different dtypes, using the is_compatible_with check designed for this purpose.

Of course, you can always cast the other way around (cast regularizations to score_dtype, or cast to float32 whatever is not float32 already) - I have no strong opinion on this and just chose the easiest way (score_values can be cast in a single operation). One might also make use of tf.experimental.numpy.result_type in the future, but that is still experimental.

diff --git a/tf_keras_vis/activation_maximization/__init__.py b/tf_keras_vis/activation_maximization/__init__.py
index b325d2c..cf5f456 100644
--- a/tf_keras_vis/activation_maximization/__init__.py
+++ b/tf_keras_vis/activation_maximization/__init__.py
@@ -139,6 +139,12 @@ class ActivationMaximization(ModelVisualization):
                 # Calculate regularization values
                 regularizations = [(regularizer.name, regularizer(seed_inputs))
                                    for regularizer in regularizers]
+
+                score_dtype = score_values[0].dtype
+                regularization_dtype = regularizations[0][1].dtype
+                if not score_dtype.is_compatible_with(regularization_dtype):
+                    score_values = tf.cast(score_values, regularization_dtype)
+
                 regularized_score_values = [
                     (-1. * score_value) + sum([v for _, v in regularizations])
                     for score_value in score_values
lines 1-17/17 (END)

bersbersbers commented 3 years ago

By the way, in 9b3d509a5c45d3c354ed650378939589841be41b you pinned scipy==1.4.* - is that necessary? Because I am now getting

plotnine 0.8.0 requires scipy>=1.5.0, but you have scipy 1.4.1 which is incompatible.

Edit: Also, pillow==7.1.* is over a year old.

Both of these old packages are not available (prebuilt) for Python 3.9, so installing 280868e on Python 3.9 fails for me due to missing "lapack/blas resources" (when compiling scipy). See https://github.com/scipy/scipy/issues/9005#issuecomment-623528512, but I do not have admin rights on my system to install the missing libraries. Note also that TF 2.5.0rc1 will support and is prebuilt for Python 3.9.

bersbersbers commented 3 years ago

Here's an example that fails in 0.6.0 (280868e) while it works in 0.5.5:

import tensorflow as tf
from tf_keras_vis.gradcam import Gradcam
Gradcam(model := tf.keras.applications.MobileNet())(
    lambda output: [o[0] for o in output],
    tf.zeros(model.input.shape[1:]),
)
# AttributeError: 'list' object has no attribute 'shape'

Edit: I have noticed that output is one Tensor now, so one can use output[:, 0]. For the sake of compatibility, however, I would say lists should be supported.

keisen commented 3 years ago

@bersbersbers , Thank you for your code-snippet. On the second thought, I will put this problem on hold for now. The design of ActivationMaximization related to regularizes has a known issue (that multiple I/O model is NOT considered fully), for now, I can't think of the way to fix problem concisely.

Thanks!

bersbersbers commented 3 years ago

On the second thought, I will put this problem on hold for now. The design of ActivationMaximization related to regularizes has a known issue (that multiple I/O model is NOT considered fully), for now, I can't think of the way to fix problem concisely.

You're the boss, but do know that I disagree:

Unless I am mistaken, the mixed-precision issue has nothing to do the multiple-I/O model. I find ignoring the fixed-precision issue while still allowing full-precision models to use a method that you know has difficulties somewhat questionable. Similarly, why not fix the mixed-precision issue and thereby allow mixed-precision models with single I/O to compute activation maximizations at least?
I also don't see the problem you have with the code snipped I posted. The mixed-precision issue is that two numbers (derived from different stages of the network, I guess) cannot be added without casting, so we cast them. Is there anything wrong with this approach?

keisen commented 3 years ago

You're the boss, but do know that I disagree:

@bersbersbers , Here is a open source project, so we only have fun and contribute it as possible as we can.

As I said before, unfortunately, I don't have a time enough to do all. So I decided , at least, v0.6.0 won't support that. I would do it in v0.7.0 or higher If I could find the time.

Or

@bersbersbers , If you can, please submit a PR that fix this issue.

As I said begore, because here is a open source, you can open and submit a PullRequest. I may include it to v0.6.0 if the PR merged soon.

Either way, I want to release v0.6.0 soon. Additionally, I want to keep tf-keras-vis's code clearly and concisely as possible as we can even if it were hard work to support mixed-precision.

Thanks!

bersbersbers commented 3 years ago

@bersbersbers , Here is a open source project, so we only have fun and contribute it as possible as we can.

Sure - I did not want to criticize anyone personally, just add my perspective on the issue.

@bersbersbers , If you can, please submit a PR that fix this issue.

I can easily submit https://github.com/keisen/tf-keras-vis/pull/39#issuecomment-831682563 as a PR, if that is your intention. I have tested it locally and it solves the issue.

If you are looking for something else, please let me know what you are looking for. To my first solution, you replied it wasn't "intuitive"; to my second solution, you reacted with a "confused" emoji; and that was all I got as a reply. Really, I am happy to contribute, but after declining two of my solutions you need to give me some additional criteria regarding what you think defines an acceptable solution.

bersbersbers commented 3 years ago

@keisen would you mind explaining what you think the bug in TensorFlow is?

https://github.com/keisen/tf-keras-vis/blob/47561f827b0ea89e0faab356ca3a9621b8662381/tests/tf-keras-vis/activation_maximization/activation_maximization_test.py#L188

I'd like to help isolate and report it upstream, but I don't see where TensorFlow is misbehaving.

Edit: as I see it, the problem is when you add score_values and regularizations, which are of different dtype. score_values is float32, while regularizations is float16. So what is the bug in your opinion?

The fact that score_values is float32? That is due to the model having float32 outputs, by definition of the model, right?
The fact that regularizations is float16? This is due to seed_inputs being float32 (which tf-keras-vis does itself).
The fact that you cannot add float16 to float32? I believe this is expected.

In summary, I don't see where you think the TF bug is.

bersbersbers commented 3 years ago

Here's another basic idea to fix this issue.

diff --git a/tf_keras_vis/activation_maximization/__init__.py b/tf_keras_vis/activation_maximization/__init__.py
index 134c52d..d8c31c5 100644
--- a/tf_keras_vis/activation_maximization/__init__.py
+++ b/tf_keras_vis/activation_maximization/__init__.py
@@ -118,6 +118,7 @@ class ActivationMaximization(ModelVisualization):
                 for modifier in input_modifiers[name]:
                     seed_inputs[j] = modifier(seed_inputs[j])

+            regularizer_seed_inputs = seed_inputs
             if mixed_precision_enabled:
                 seed_inputs = (tf.cast(X, dtype=lower_precision_dtype(self.model))
                                for X in seed_inputs)
@@ -130,7 +131,7 @@ class ActivationMaximization(ModelVisualization):
                 outputs = listify(outputs)
                 score_values = self._calculate_scores(outputs, scores)
                 # Calculate regularization values
-                regularizations = [(regularizer.name, regularizer(seed_inputs))
+                regularizations = [(regularizer.name, regularizer(regularizer_seed_inputs))
                                    for regularizer in regularizers]
                 regularized_score_values = [
                     (-1. * score_value) + sum([v for _, v in regularizations])

Why does this work? You save the original (float32) input for use with the regularizer, so it's not changed when you cast it to float16 later in case of mixed_precision. That gives you all the expected dtypes everywhere.

If you think it's more maintainable, you can also introduce lower_precision_seed_inputs and leave the original at float32.

keisen commented 3 years ago

@bersbersbers I'm sorry for the late reply!

If you are looking for something else, please let me know what you are looking for.

I believe that even if we fixed the error, some problems are still remain. Regularization values may be NaN. Many users may be confused because the results are different for ActivationMaximization with float32-precision and one with mixed-precision.

The reason of my reactions (emoji) are just because those ways can't fix the problem above. I can't decide whether to support mixed-precsion in ActivationMaximization is good or not. So I have no strong motivation to fully support mixed-precision in ActivationMaximization for now.

Thanks your contributions!

bersbersbers commented 3 years ago

I believe that even if we fixed the error, some problems are still remain.

That is true always and everywhere.

Regularization values may be NaN.

Is that specific to mixed_precision? That's as good an argument as the one with multi-I/O.

Many users may be confused because the results are different for ActivationMaximization with float32-precision and one with mixed-precision.

Well, for one, that is expected - these are different networks. Who would expect different networks to produce the same result?

Second, are users less confused when tf-keras-vis works with float32 and not with mixed-precision? Or when the maintainer references some TensorFlow bug that, frankly, I don't think exists?

Anyway, I will apply the changes mentioned above locally and be happy with them. Thanks and good luck!

keisen commented 3 years ago

Here's another basic idea to fix this issue.

It looks good. Is there any impact on calculation result by the patch?

bersbersbers commented 3 years ago

Here's another basic idea to fix this issue.

It looks good. Is there any impact on calculation result by the patch?

I cannot say really: I don't have any comparison as the code without this patch does not run for my saved mixed-precision models, so this is the only result that I have. And my models take so long to train that I cannot re-train them with full precision now.

I can say that the results I get are somewhat expected, but I was hoping that your testing pipeline could shed more light on the immediate comparison between full/mixed precision

keisen / tf-keras-vis

Release v0.6.0 #39