tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

[clustering] Possible wrong implementation of get_weight_from_layer #799

Open tisma opened 3 years ago

tisma commented 3 years ago

Describe the bug
Clustering the weights of a custom layer fails: when a layer implements ClusterableLayer and overrides get_clusterable_weights, the subsequent call to get_weight_from_layer raises an AttributeError.

System information

OS: Linux Mint 19.3 Tricia x86_64
Host: Z390 AORUS MASTER
Kernel: 5.4.0-81-generic
Uptime: 12 hours, 21 mins
Packages: 4694
Shell: bash 4.4.20
Resolution: 3840x2160, 3840x2160
DE: Cinnamon 4.4.8
WM: Mutter (Muffin)
WM Theme: Linux Mint (Mint-Y-Dark)
Theme: Mint-Y-Dark [GTK2/3]
Icons: Mint-Y [GTK2/3]
Terminal: gnome-terminal
CPU: Intel i9-9900K (16) @ 5.000GHz
GPU: NVIDIA GeForce GTX 1080 Ti
Memory: 23072MiB / 64320MiB

TensorFlow version (installed from source or binary): Installed with pip, tensorflow-gpu 2.6.0

TensorFlow Model Optimization version (installed from source or binary): Installed with pip, tensorflow-model-optimization 0.6.0

Python version: Python 3.6.9

Describe the expected behavior

Describe the current behavior

Code to reproduce the issue

import numpy as np

import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot

cluster_weights = tfmot.clustering.keras.cluster_weights
CentroidInitialization = tfmot.clustering.keras.CentroidInitialization

clustering_params = {
  'number_of_clusters': 3,
  'cluster_centroids_init': CentroidInitialization.DENSITY_BASED
}

class MyCustomLayer(keras.layers.Layer, tfmot.clustering.keras.ClusterableLayer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyCustomLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name = 'kernel',
            shape = (input_shape[1], self.output_dim),
            initializer = 'normal',
            trainable = True
        )
        super(MyCustomLayer, self).build(input_shape)

    def call(self, input_data):
        return keras.backend.dot(input_data, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

    def get_clusterable_weights(self):
        clusterable_weights = []
        for weight in self.trainable_weights:
            clusterable_weights.append((weight.name, weight.read_value()))
        return clusterable_weights

def get_model():
    # Create a simple model.
    model = keras.Sequential(
        [
            keras.Input(shape=(32,)),
            MyCustomLayer(32, input_shape=(32,)),
            keras.layers.Dense(2, activation="relu", name="layer1"),
            keras.layers.Dense(3, activation="relu", name="layer2"),
            keras.layers.Dense(4, name="layer3"),
        ]
    )

    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

model = get_model()

# Train the model.
test_input = np.random.random((128, 32))
test_target = np.random.random((128, 1))
model.fit(test_input, test_target)
print(model.summary())

# Print all weights in model.
for weight in model.weights:
    print(weight.name)#, weight.read_value())

clustered_model = cluster_weights(model, **clustering_params)

clustered_model.summary(line_length=180, positions=[0.25, 0.60, 0.70, 1.0])

The output is:

(bug) dellboy@thunderstruck:~/git/tisma/tf-learn$ python simple_model.py 
2021-08-23 18:40:26.443329: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:26.465328: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:26.465628: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:26.466063: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-23 18:40:26.466461: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:26.466787: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:26.467059: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:27.063134: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:27.063747: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:27.064148: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-23 18:40:27.064580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6866 MB memory:  -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
2021-08-23 18:40:27.214358: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
4/4 [==============================] - 0s 664us/step - loss: 0.2719
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
my_custom_layer (MyCustomLay (None, 32)                1024      
_________________________________________________________________
layer1 (Dense)               (None, 2)                 66        
_________________________________________________________________
layer2 (Dense)               (None, 3)                 9         
_________________________________________________________________
layer3 (Dense)               (None, 4)                 16        
=================================================================
Total params: 1,115
Trainable params: 1,115
Non-trainable params: 0
_________________________________________________________________
None
my_custom_layer/kernel:0
layer1/kernel:0
layer1/bias:0
layer2/kernel:0
layer2/bias:0
layer3/kernel:0
layer3/bias:0
Traceback (most recent call last):
  File "simple_model.py", line 69, in <module>
    clustered_model = cluster_weights(model, **clustering_params)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/clustering/keras/cluster.py", line 133, in cluster_weights
    **kwargs)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/clustering/keras/cluster.py", line 261, in _cluster_weights
    to_cluster, input_tensors=None, clone_function=_add_clustering_wrapper)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/models.py", line 449, in clone_model
    model, input_tensors=input_tensors, layer_fn=clone_function)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/models.py", line 332, in _clone_sequential_model
    cloned_model = Sequential(layers=layers, name=model.name)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 530, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/engine/sequential.py", line 134, in __init__
    self.add(layer)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 530, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/engine/sequential.py", line 217, in add
    output_tensor = layer(self.outputs[0])
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/engine/base_layer.py", line 977, in __call__
    input_list)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/engine/base_layer.py", line 1115, in _functional_construction_call
    inputs, input_masks, args, kwargs)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/engine/base_layer.py", line 848, in _keras_tensor_symbolic_call
    return self._infer_output_signature(inputs, args, kwargs, input_masks)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/engine/base_layer.py", line 886, in _infer_output_signature
    self._maybe_build(inputs)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/keras/engine/base_layer.py", line 2659, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/clustering/keras/cluster_wrapper.py", line 160, in build
    original_weight = self.get_weight_from_layer(weight_name)
  File "/home/dellboy/git/tisma/tf-learn/bug/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/clustering/keras/cluster_wrapper.py", line 146, in get_weight_from_layer
    return getattr(self.layer, weight_name)
AttributeError: 'MyCustomLayer' object has no attribute 'my_custom_layer/kernel:0'

But if I run this snippet:

# Print all weights in model.
for weight in model.weights:
    print(weight.name)#, weight.read_value())

print([layer.name for layer in model.layers])

# Example weight that should be returned from the model.
weight_name = "my_custom_layer/kernel:0"

# This is the correct way of getting it (using layers[0] as an example).
for weight in model.layers[0].weights:
    if weight.name == weight_name:
        print("FOUND WEIGHT: ", weight.name, weight.read_value())

It finds the weight:

my_custom_layer/kernel:0
layer1/kernel:0
layer1/bias:0
layer2/kernel:0
layer2/bias:0
layer3/kernel:0
layer3/bias:0
['my_custom_layer', 'layer1', 'layer2', 'layer3']
FOUND WEIGHT:  my_custom_layer/kernel:0 tf.Tensor(
[[-0.01710013  0.08641329  0.00445064 ... -0.00947034 -0.02543414
  -0.02332742]
 [-0.0146245  -0.002941   -0.01422382 ...  0.02857029 -0.04331051
  -0.00299862]
 [-0.07127763  0.07367716 -0.06753001 ... -0.06001836  0.04888764
   0.1081293 ]
 ...
 [ 0.04297659  0.0334582  -0.09708535 ...  0.00098922  0.05463797
  -0.0092663 ]
 [ 0.03690836 -0.061338    0.01662921 ... -0.03843782 -0.08734126
   0.00209901]
 [ 0.10478324  0.07971404  0.05170573 ...  0.05777165 -0.08564453
   0.04021074]], shape=(32, 32), dtype=float32)

My assumption is that the implementation of get_weight_from_layer(self, weight_name) at https://github.com/tensorflow/model-optimization/blob/18e87d262e536c9a742aef700880e71b47a7f768/tensorflow_model_optimization/python/core/clustering/keras/cluster_wrapper.py#L144-L145 is incorrect.
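
For reference, those linked lines boil down to a plain attribute lookup on the wrapped layer, so whatever string get_clusterable_weights returns is used verbatim as a Python attribute name. A minimal illustration of that lookup (using a plain Dense layer purely for demonstration):

import tensorflow as tf
from tensorflow import keras

# The traceback shows get_weight_from_layer is essentially
#     return getattr(self.layer, weight_name)
layer = keras.layers.Dense(4)
layer.build((None, 8))

print(getattr(layer, 'kernel').shape)  # works: 'kernel' is an attribute set in build()
try:
    getattr(layer, 'dense/kernel:0')   # the tf.Variable name is not a Python attribute
except AttributeError as err:
    print('AttributeError:', err)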

fredrec commented 3 years ago

Hi @tisma,

Thanks for your report and especially for the code to reproduce the issue. We are looking into this.

daverim commented 3 years ago

Hi @wwwind, do you think you can take a look at this issue?

wwwind commented 3 years ago

Hi @tisma, the function get_clusterable_weights in your implementation does not return what the clustering algorithm expects. It should be:

    def get_clusterable_weights(self):
        return [('kernel', self.kernel)]

We have a tutorial for ClusterableLayer here.
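
Applied to the layer from the reproduction above, the corrected definition would look roughly like this (a minimal sketch, with imports as in the reproduction; only get_clusterable_weights changes):

class MyCustomLayer(keras.layers.Layer, tfmot.clustering.keras.ClusterableLayer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyCustomLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name='kernel',
            shape=(input_shape[1], self.output_dim),
            initializer='normal',
            trainable=True)
        super(MyCustomLayer, self).build(input_shape)

    def call(self, input_data):
        return keras.backend.dot(input_data, self.kernel)

    def get_clusterable_weights(self):
        # Return (attribute name, variable) pairs, not (variable name, tensor).
        return [('kernel', self.kernel)]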

tisma commented 3 years ago

Oh, I see the problem in my implementation of get_clusterable_weights:

    def get_clusterable_weights(self):
        clusterable_weights = []
        for weight in self.trainable_weights:
            clusterable_weights.append((weight.name, weight.read_value()))
        return clusterable_weights

The first problem: weight.read_value() returns a <class 'tensorflow.python.framework.ops.EagerTensor'> instead of a <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable'>, so the second element of the tuple should simply be weight itself.

The second, trickier problem is that weight.name is not just the name of the variable: it carries the layer name as a prefix and ends with :0 (e.g. my_custom_layer/kernel:0). So if I don't want to specify the names manually, and instead want a more generic way of adding clusterable weights, I have to do something like

weight.name[weight.name.find("/") + 1 : weight.name.find(":")]

to get just the variable name.
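
Putting both fixes together, a more generic version of the method could look like this (a rough sketch, assuming every trainable weight of the layer should be clustered and that the stripped name matches a Python attribute on the layer, as 'kernel' does here):

    def get_clusterable_weights(self):
        clusterable_weights = []
        for weight in self.trainable_weights:
            # Strip the layer prefix and the ':0' suffix,
            # e.g. 'my_custom_layer/kernel:0' -> 'kernel',
            # and pass the variable itself, not weight.read_value().
            attr_name = weight.name[weight.name.find("/") + 1 : weight.name.find(":")]
            clusterable_weights.append((attr_name, weight))
        return clusterable_weights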

tisma commented 3 years ago

@wwwind This was just a simple example model. What if I have a more complex model that is composed of several nested layers? How can I add the weights that live inside those nested layers to the list returned by get_clusterable_weights? Below is the output of model.summary() and all the weights that belong to the stereo_net layer:

# layers[9] -> StereoNet
for weight in model.layers[9].weights:
    print(weight.name)
Model: "model"
________________________________________________________________________________________________________________________
Layer (type)                  Output Shape                              Param #     Connected to                        
========================================================================================================================
Left (InputLayer)             [(None, None, None, 3)]                   0                                               
________________________________________________________________________________________________________________________
Right (InputLayer)            [(None, None, None, 3)]                   0                                               
________________________________________________________________________________________________________________________
cam_fx (InputLayer)           [(None, 1)]                               0                                               
________________________________________________________________________________________________________________________
cam_baseline (InputLayer)     [(None, 1)]                               0                                               
________________________________________________________________________________________________________________________
cam_proj_l (InputLayer)       [(None, 3, 4)]                            0                                               
________________________________________________________________________________________________________________________
cam_proj_r (InputLayer)       [(None, 3, 4)]                            0                                               
________________________________________________________________________________________________________________________
targets (InputLayer)          [(None, None, None)]                      0                                               
________________________________________________________________________________________________________________________
ious (InputLayer)             [(None, None, None)]                      0                                               
________________________________________________________________________________________________________________________
labels_map (InputLayer)       [(None, None, None)]                      0                                               
________________________________________________________________________________________________________________________
stereo_net (StereoNet)        [(None, None, None), (2, 233472, 6), (2,  4085752     Left[0][0]                          
                                                                                    Right[0][0]                         
                                                                                    cam_fx[0][0]                        
                                                                                    cam_baseline[0][0]                  
                                                                                    cam_proj_l[0][0]                    
                                                                                    cam_proj_r[0][0]                    
                                                                                    targets[0][0]                       
                                                                                    ious[0][0]                          
                                                                                    labels_map[0][0]                    
________________________________________________________________________________________________________________________
depth (PassThrough)           (None, None, None)                        0           stereo_net[0][0]                    
________________________________________________________________________________________________________________________
bbox_cls (PassThrough)        (2, 233472, 6)                            0           stereo_net[0][1]                    
________________________________________________________________________________________________________________________
bbox_reg (PassThrough)        (2, 10, 3, 6)                             0           stereo_net[0][2]                    
________________________________________________________________________________________________________________________
bbox_centerness (PassThrough) (2, 233472, 6)                            0           stereo_net[0][3]                    
========================================================================================================================
Total params: 4,085,752
Trainable params: 4,085,480
Non-trainable params: 272
________________________________________________________________________________________________________________________
conv2d/kernel:0
batch_normalization/gamma:0
batch_normalization/beta:0
conv2d_1/kernel:0
batch_normalization_1/gamma:0
batch_normalization_1/beta:0
conv2d_2/kernel:0
batch_normalization_2/gamma:0
batch_normalization_2/beta:0
conv2d_4/kernel:0
batch_normalization_4/gamma:0
batch_normalization_4/beta:0
conv2d_5/kernel:0
batch_normalization_5/gamma:0
batch_normalization_5/beta:0
conv2d_3/kernel:0
batch_normalization_3/gamma:0
batch_normalization_3/beta:0
conv2d_6/kernel:0
batch_normalization_6/gamma:0
batch_normalization_6/beta:0
conv2d_7/kernel:0
batch_normalization_7/gamma:0
batch_normalization_7/beta:0
conv2d_8/kernel:0
batch_normalization_8/gamma:0
batch_normalization_8/beta:0
conv2d_9/kernel:0
batch_normalization_9/gamma:0
batch_normalization_9/beta:0
conv2d_11/kernel:0
group_normalization_1/gamma:0
group_normalization_1/beta:0
conv2d_12/kernel:0
group_normalization_2/gamma:0
group_normalization_2/beta:0
conv2d_10/kernel:0
group_normalization/gamma:0
group_normalization/beta:0
conv2d_13/kernel:0
group_normalization_3/gamma:0
group_normalization_3/beta:0
conv2d_14/kernel:0
group_normalization_4/gamma:0
group_normalization_4/beta:0
conv2d_15/kernel:0
group_normalization_5/gamma:0
group_normalization_5/beta:0
conv2d_16/kernel:0
group_normalization_6/gamma:0
group_normalization_6/beta:0
conv2d_17/kernel:0
group_normalization_7/gamma:0
group_normalization_7/beta:0
conv2d_18/kernel:0
group_normalization_8/gamma:0
group_normalization_8/beta:0
conv2d_20/kernel:0
group_normalization_10/gamma:0
group_normalization_10/beta:0
conv2d_21/kernel:0
group_normalization_11/gamma:0
group_normalization_11/beta:0
conv2d_19/kernel:0
group_normalization_9/gamma:0
group_normalization_9/beta:0
conv2d_22/kernel:0
group_normalization_12/gamma:0
group_normalization_12/beta:0
conv2d_23/kernel:0
group_normalization_13/gamma:0
group_normalization_13/beta:0
conv2d_24/kernel:0
group_normalization_14/gamma:0
group_normalization_14/beta:0
conv2d_25/kernel:0
group_normalization_15/gamma:0
group_normalization_15/beta:0
conv2d_26/kernel:0
group_normalization_16/gamma:0
group_normalization_16/beta:0
conv2d_27/kernel:0
group_normalization_17/gamma:0
group_normalization_17/beta:0
conv2d_28/kernel:0
group_normalization_18/gamma:0
group_normalization_18/beta:0
conv2d_29/kernel:0
group_normalization_19/gamma:0
group_normalization_19/beta:0
conv2d_30/kernel:0
group_normalization_20/gamma:0
group_normalization_20/beta:0
conv2d_31/kernel:0
group_normalization_21/gamma:0
group_normalization_21/beta:0
conv2d_33/kernel:0
group_normalization_23/gamma:0
group_normalization_23/beta:0
conv2d_34/kernel:0
group_normalization_24/gamma:0
group_normalization_24/beta:0
conv2d_32/kernel:0
group_normalization_22/gamma:0
group_normalization_22/beta:0
conv2d_35/kernel:0
group_normalization_25/gamma:0
group_normalization_25/beta:0
conv2d_36/kernel:0
group_normalization_26/gamma:0
group_normalization_26/beta:0
conv2d_37/kernel:0
group_normalization_27/gamma:0
group_normalization_27/beta:0
conv2d_38/kernel:0
group_normalization_28/gamma:0
group_normalization_28/beta:0
conv2d_39/kernel:0
group_normalization_29/gamma:0
group_normalization_29/beta:0
conv2d_40/kernel:0
group_normalization_30/gamma:0
group_normalization_30/beta:0
conv2d_41/kernel:0
group_normalization_31/gamma:0
group_normalization_31/beta:0
conv2d_42/kernel:0
group_normalization_32/gamma:0
group_normalization_32/beta:0
conv2d_43/kernel:0
group_normalization_33/gamma:0
group_normalization_33/beta:0
conv2d_44/kernel:0
conv3d/kernel:0
group_normalization_34/gamma:0
group_normalization_34/beta:0
conv3d_1/kernel:0
group_normalization_35/gamma:0
group_normalization_35/beta:0
conv3d_2/kernel:0
group_normalization_36/gamma:0
group_normalization_36/beta:0
conv3d_3/kernel:0
group_normalization_37/gamma:0
group_normalization_37/beta:0
conv3d_transpose/kernel:0
group_normalization_38/gamma:0
group_normalization_38/beta:0
conv3d_transpose_1/kernel:0
group_normalization_39/gamma:0
group_normalization_39/beta:0
conv3d_4/kernel:0
group_normalization_40/gamma:0
group_normalization_40/beta:0
conv3d_5/kernel:0
batch_normalization/moving_mean:0
batch_normalization/moving_variance:0
batch_normalization_1/moving_mean:0
batch_normalization_1/moving_variance:0
batch_normalization_2/moving_mean:0
batch_normalization_2/moving_variance:0
batch_normalization_4/moving_mean:0
batch_normalization_4/moving_variance:0
batch_normalization_5/moving_mean:0
batch_normalization_5/moving_variance:0
batch_normalization_3/moving_mean:0
batch_normalization_3/moving_variance:0
batch_normalization_6/moving_mean:0
batch_normalization_6/moving_variance:0
batch_normalization_7/moving_mean:0
batch_normalization_7/moving_variance:0
batch_normalization_8/moving_mean:0
batch_normalization_8/moving_variance:0
batch_normalization_9/moving_mean:0
batch_normalization_9/moving_variance:0
print(dir(model.layers[9]))
['CV_X_MAX', 'CV_X_MIN', 'CV_Y_MAX', 'CV_Y_MIN', 'CV_Z_MAX', 'CV_Z_MIN', 'GRID_SIZE', 'RPN3D_INPUT_DIM', 'VOXEL_X_SIZE', 'VOXEL_Y_SIZE', 'VOXEL_Z_SIZE', 'X_MAX', 'X_MIN', 'Y_MAX', 'Y_MIN', 'Z_MAX', 'Z_MIN', '_TF_MODULE_IGNORED_PROPERTIES', '__abstractmethods__', '__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '_abc_registry', '_activity_regularizer', '_add_trackable', '_add_variable_with_custom_getter', '_auto_track_sub_layers', '_autocast', '_autographed_call', '_build_input_shape', '_call_accepts_kwargs', '_call_arg_was_passed', '_call_fn_arg_defaults', '_call_fn_arg_positions', '_call_fn_args', '_call_full_argspec', '_callable_losses', '_cast_single_input', '_checkpoint_dependencies', '_clear_losses', '_compute_dtype', '_compute_dtype_object', '_dedup_weights', '_default_training_arg', '_deferred_dependencies', '_dtype', '_dtype_policy', '_dynamic', '_eager_losses', '_expects_mask_arg', '_expects_training_arg', '_flatten', '_flatten_layers', '_functional_construction_call', '_gather_children_attribute', '_gather_saveables_for_checkpoint', '_get_call_arg_value', '_get_existing_metric', '_get_input_masks', '_get_node_attribute_at_index', '_get_save_spec', '_get_trainable_state', '_handle_activity_regularization', '_handle_deferred_dependencies', '_handle_weight_regularization', '_inbound_nodes', '_inbound_nodes_value', '_infer_output_signature', '_init_call_fn_args', '_init_set_name', '_initial_weights', '_input_spec', '_instrument_layer_creation', '_instrumented_keras_api', '_instrumented_keras_layer_class', '_instrumented_keras_model_class', '_is_layer', '_keras_api_names', '_keras_api_names_v1', '_keras_tensor_symbolic_call', '_layers', '_list_extra_dependencies_for_serialization', '_list_functions_for_serialization', '_lookup_dependency', '_losses', '_map_resources', '_maybe_build', '_maybe_cast_inputs', '_maybe_create_attribute', '_maybe_initialize_trackable', '_metrics', '_metrics_lock', '_must_restore_from_config', '_name', '_name_based_attribute_restore', '_name_based_restores', '_name_scope', '_no_dependency', '_non_trainable_weights', '_obj_reference_counts', '_obj_reference_counts_dict', '_object_identifier', '_outbound_nodes', '_outbound_nodes_value', '_preload_simple_restoration', '_preserve_input_structure_in_config', '_restore_from_checkpoint_position', '_saved_model_inputs_spec', '_self_name_based_restores', '_self_saveable_object_factories', '_self_setattr_tracking', '_self_unconditional_checkpoint_dependencies', '_self_unconditional_deferred_dependencies', '_self_unconditional_dependency_names', '_self_update_uid', '_set_call_arg_value', '_set_connectivity_metadata', '_set_dtype_policy', '_set_mask_keras_history_checked', '_set_mask_metadata', '_set_save_spec', '_set_trainable_state', '_set_training_mode', '_setattr_tracking', '_should_cast_single_input', '_single_restoration_from_checkpoint_position', '_split_out_first_arg', '_stateful', '_supports_masking', '_symbolic_call', '_tf_api_names', '_tf_api_names_v1', '_thread_local', '_track_trackable', '_trackable_saved_model_saver', '_tracking_metadata', '_trainable', '_trainable_weights', 
'_unconditional_checkpoint_dependencies', '_unconditional_dependency_names', '_update_uid', '_updates', 'activity_regularizer', 'add_loss', 'add_metric', 'add_update', 'add_variable', 'add_weight', 'anchor_angles', 'apply', 'box_corner_parameters', 'build', 'built', 'call', 'cat_disp', 'cat_img_feature', 'cat_right_img_feature', 'centerness4class', 'cfg', 'class4angles', 'classif1', 'compute_dtype', 'compute_mask', 'compute_output_shape', 'compute_output_signature', 'coord_rect', 'count_params', 'dispregression', 'downsample_disp', 'dres0', 'dtype', 'dtype_policy', 'dynamic', 'feature_extraction', 'fix_centerness_bug', 'from_config', 'get_clusterable_algorithm', 'get_clusterable_weights', 'get_config', 'get_input_at', 'get_input_mask_at', 'get_input_shape_at', 'get_losses_for', 'get_output_at', 'get_output_mask_at', 'get_output_shape_at', 'get_updates_for', 'get_weights', 'hg_cv', 'hg_firstconv', 'hg_rpn_conv', 'hg_rpn_conv3d', 'img_feature_attentionbydisp', 'inbound_nodes', 'input', 'input_mask', 'input_shape', 'input_spec', 'losses', 'maxdisp', 'metrics', 'name', 'name_scope', 'non_trainable_variables', 'non_trainable_weights', 'num_3dconvs', 'num_angles', 'num_classes', 'num_convs', 'outbound_nodes', 'output', 'output_mask', 'output_shape', 'rpn3d_conv_kernel', 'set_weights', 'stateful', 'submodules', 'supports_masking', 'trainable', 'trainable_variables', 'trainable_weights', 'updates', 'upsample0', 'valid_classes', 'variable_dtype', 'variables', 'voxel_attentionbydisp', 'weights', 'with_name_scope']

wwwind commented 3 years ago

Hi @tisma, to tell the clustering algorithm what should be clustered in your layer, you need to look at the attributes that hold the weights. This is advanced usage, so it may not be very convenient, but I would set a breakpoint to see where the weights are stored. For example, the MHA layer has 4 types of weights. To pass them for clustering, I would re-define my function like this:

def get_clusterable_weights_mha(layer):
    # 'layer' is the MultiHeadAttention instance whose weights will be clustered.
    return [('_query_dense.kernel', layer._query_dense.kernel),
            ('_key_dense.kernel', layer._key_dense.kernel),
            ('_value_dense.kernel', layer._value_dense.kernel),
            ('_output_dense.kernel', layer._output_dense.kernel)]
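
A small inspection helper along these lines (illustration only; model.layers[9] stands for the stereo_net layer from the summary above) can show which attributes of a nested layer actually hold variables or sub-layers, which is what get_clusterable_weights needs to reference:

import tensorflow as tf

stereo_net = model.layers[9]
for attr_name in dir(stereo_net):
    if attr_name.startswith('_'):
        continue
    try:
        value = getattr(stereo_net, attr_name)
    except Exception:
        continue
    if isinstance(value, tf.Variable):
        print('variable attribute:', attr_name, value.shape)
    elif isinstance(value, tf.keras.layers.Layer):
        print('sub-layer attribute:', attr_name,
              [w.name for w in value.trainable_weights])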