quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

How to get input and output nodes in evaluation function during SVD compression. #586

Closed: bwery closed this issue 1 year ago

bwery commented 3 years ago

Environment: Ubuntu 18.04 LTS running in an LXD container - Tensorflow-gpu 1.15.0 - Python 3.6.9 - AIMET release 1.13.0 - running as root.

Hello!

I am trying to use SVD compression on a classification network, and I would like to improve the process by providing an evaluation function.

I have built an evaluation function that performs inference using a "session.run" call, using information about the input and output op names from the original Keras session. This function works well outside the compression process.

When it is called back by the compression process, this routine fails with the message:

ValueError: Fetch argument <tf.Tensor 'dense_2/Softmax:0' shape=(?, 5) dtype=float32> cannot be interpreted as a Tensor. (Tensor Tensor("dense_2/Softmax:0", shape=(?, 5), dtype=float32) is not an element of this graph.)

Here, dense_2/Softmax is the name of the output layer.

I feel that the compression process creates new session graphs during its processing, and therefore the names of the input and output ops I identified in the original graph are no longer valid when the evaluation function is called.

Is there a way to determine the ops that serve as inputs and outputs in these new sessions?

It would be nice if the identification of these ops were included in the parameters passed to the evaluation function.

Thank you for your help!

quic-ssiddego commented 3 years ago

Hi @bwery, thank you for your query. One thing to take care of here: you will need to look up the tensor in the new session obtained after compression, using the new_session.graph.get_tensor_by_name('tensor_name') API, before calling new_session.run(). Could you please try this and let us know if the problem persists?
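
For example, something along these lines (assuming the input placeholder and output tensor keep the names 'input_1:0' and 'dense_2/Softmax:0' in the compressed graph; run_on_new_session is just an illustrative helper):

import tensorflow as tf  # TF 1.15, matching the environment above

def run_on_new_session(new_session: tf.compat.v1.Session, input_data):
    # Resolve the tensors in the graph owned by the new session, instead of
    # reusing tensor objects captured from the original Keras graph.
    output_tensor = new_session.graph.get_tensor_by_name('dense_2/Softmax:0')
    input_tensor = new_session.graph.get_tensor_by_name('input_1:0')
    return new_session.run(output_tensor, feed_dict={input_tensor: input_data})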

bwery commented 3 years ago

Hello!

I have tried to use "get_tensor_by_name", but this does not change the behaviour.

On the other hand, since I started with Tensorflow 2, I am not very familiar with Tensorflow 1 structures and sessions, so maybe I am doing something wrong. I copy extracts of my code below.

My Keras network (a 5-class classifier, counting occurrences of a pattern) is defined by:

inputs = keras.layers.Input(shape=(ImageSize, ImageSize, 1))

Conv0 = keras.layers.Conv2D(16, kernel_size=11, strides=1, activation='relu') (inputs)
Pool0 = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2) (Conv0)
Conv1 = keras.layers.Conv2D(16, kernel_size=5, strides=1, activation='relu', padding='same') (Pool0)
Pool1 = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2) (Conv1)
Conv2 = keras.layers.Conv2D(32, kernel_size=3, strides=1, padding='same', activation='relu') (Pool1)
Conv3 = keras.layers.Conv2D(32, kernel_size=3, strides=1, padding='same', activation='relu') (Conv2)
Conv4 = keras.layers.Conv2D(8, kernel_size=3, strides=1, padding='same', activation='relu') (Conv3)
Pool4 = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2) (Conv4)
Flat4 = keras.layers.Flatten() (Pool4)
fc1 = keras.layers.Dense(64, activation='relu') (Flat4)
drop1 = keras.layers.Dropout(0.5)(fc1)
fc2 = keras.layers.Dense(128, activation='relu') (drop1)
drop2 = keras.layers.Dropout(0.5)(fc2)
Output = keras.layers.Dense(5, activation='softmax')(drop2)

CounterModel = Model(inputs=inputs, outputs=Output)

Once trained, I get the session with

tf.keras.backend.set_learning_phase(0)
keras_session = tf.compat.v1.keras.backend.get_session()

Note that in the examples in the docs, I see you set the learning phase to "1", but in that case parameters would be modified by gradient updates when I run the session, which is why I set it to "0" here. In any case, this has no impact on the problem.

I don't know of a way to determine the input and output operation names other than examining the graph with "get_operations". I have identified my input operation as "input_1", which is a placeholder, and my output operation as "dense_2/Softmax".
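
For example, an enumeration along these lines (the filter on op types is just for readability):

# Print candidate input (Placeholder) and output (Softmax) operations.
for op in keras_session.graph.get_operations():
    if op.type in ('Placeholder', 'Softmax'):
        print(op.type, op.name)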

If I execute an inference with

output_op = keras_session.graph.get_operation_by_name('dense_2/Softmax')
output_tensor = output_op.outputs
keras_session.run(output_tensor[0], feed_dict={'input_1:0': input_data})

where input_data is a tensor containing the input image, inference succeeds and produces the expected result.

I have built the following evaluation function.

def evaluate_model(sess: tf.compat.v1.Session, eval_iterations: int, use_cuda: bool):

    if eval_iterations is None:
        iterations = 1
    else:
        iterations = eval_iterations

    Accumulator = 0
    Count = 0
    GeneratorBatches = ValidationBatchGenerator.__len__()

    for Index in range(iterations):
        BatchIndex = Index % GeneratorBatches
        EvaluationBatch = ValidationBatchGenerator.__getitem__(BatchIndex)

        for ImageIndex in range(ValidationBatchGenerator.BatchSize):
            InputData = EvaluationBatch[0][ImageIndex] 
            Label = EvaluationBatch[1][ImageIndex]
            InputImage = InputData.reshape(1, ImageSize, ImageSize, 1).astype(np.float32) 
            OutputTensor = sess.graph.get_tensor_by_name('dense_2/Softmax:0')       
            Prediction = sess.run(output_tensor[0], feed_dict={'input_1:0': InputImage})
            PredictArg = np.argmax(Prediction[0])
            LabelArg = np.argmax(Label)
            if (PredictArg == LabelArg):
                Accumulator += 1
            Count += 1

    return (Accumulator / Count)

If I run it with evaluate_model(keras_session, 2, True), it runs properly and provides the expected results.

Now I try to use AIMET with the following code to compress the network.

conv2d_1 = keras_session.graph.get_operation_by_name('conv2d_1/Conv2D')
conv2d_2 = keras_session.graph.get_operation_by_name('conv2d_2/Conv2D')
conv2d_3 = keras_session.graph.get_operation_by_name('conv2d_3/Conv2D')
conv2d_4 = keras_session.graph.get_operation_by_name('conv2d_4/Conv2D')

greedy_params = GreedySelectionParameters(target_comp_ratio=Decimal(0.2),
                                          num_comp_ratio_candidates=10,
                                          use_monotonic_fit=True,
                                          saved_eval_scores_dict=None)

Manual_params = SpatialSvdParameters.ManualModeParams([ModuleCompRatioPair(module=conv2d_1, comp_ratio=0.5),
                                                       ModuleCompRatioPair(module=conv2d_2, comp_ratio=0.5),
                                                       ModuleCompRatioPair(module=conv2d_3, comp_ratio=0.5),
                                                       ModuleCompRatioPair(module=conv2d_4, comp_ratio=0.5)])

params = SpatialSvdParameters(input_op_names=['input_1'], output_op_names=['dense_2/Softmax'],
                              mode=SpatialSvdParameters.Mode.manual, params=Manual_params, multiplicity=8)

input_shape = (1, ImageSize, ImageSize, 1)

# Single call to compress the model
compr_model_sess, stats = ModelCompressor.compress_model(sess=keras_session,
                                                         working_dir=str('./'),
                                                         eval_callback=evaluate_model,
                                                         eval_iterations=10,
                                                         input_shape=input_shape,
                                                         compress_scheme=CompressionScheme.spatial_svd,
                                                         cost_metric=CostMetric.mac,
                                                         parameters=params,
                                                         trainer=None)

I get:

INFO:tensorflow:Restoring parameters from ./original_model
2021-05-07 08:01:17,791 - tensorflow - INFO - Restoring parameters from ./original_model
2021-05-07 08:01:17,830 - Svd - INFO - Spatial SVD splitting layer: conv2d_1/Conv2D using rank: 24
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/aimet_tensorflow/utils/op/conv.py:153: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Variable.assign which has equivalent behavior in 2.X.
2021-05-07 08:01:17,863 - tensorflow - WARNING - From /usr/local/lib/python3.6/dist-packages/aimet_tensorflow/utils/op/conv.py:153: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Variable.assign which has equivalent behavior in 2.X.
2021-05-07 08:01:17,928 - Svd - INFO - Spatial SVD splitting layer: conv2d_2/Conv2D using rank: 16
2021-05-07 08:01:18,018 - Svd - INFO - Spatial SVD splitting layer: conv2d_3/Conv2D using rank: 24
2021-05-07 08:01:18,112 - Svd - INFO - Spatial SVD splitting layer: conv2d_4/Conv2D using rank: 16
INFO:tensorflow:Restoring parameters from ./saver_2021-05-07_08:01:18.187266/temp
2021-05-07 08:01:18,318 - tensorflow - INFO - Restoring parameters from ./saver_2021-05-07_08:01:18.187266/temp

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in __init__(self, fetches, contraction_fn)
    304         self._unique_fetches.append(ops.get_default_graph().as_graph_element(
--> 305             fetch, allow_tensor=True, allow_operation=True))
    306       except TypeError as e:

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py in as_graph_element(self, obj, allow_tensor, allow_operation)
   3606     with self._lock:
-> 3607       return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
   3608 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py in _as_graph_element_locked(self, obj, allow_tensor, allow_operation)
   3685       if obj.graph is not self:
-> 3686         raise ValueError("Tensor %s is not an element of this graph." % obj)
   3687       return obj

ValueError: Tensor Tensor("dense_2/Softmax:0", shape=(?, 5), dtype=float32) is not an element of this graph.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-50-67a285c392a6> in <module>
      8                                                          cost_metric=CostMetric.mac,
      9                                                          parameters=params,
---> 10                                                          trainer=None)
     11 
     12 print(stats)    # Stats object can be pretty-printed easily

/usr/local/lib/python3.6/dist-packages/aimet_tensorflow/compress.py in compress_model(sess, working_dir, eval_callback, eval_iterations, input_shape, compress_scheme, cost_metric, parameters, trainer, visualization_url)
    107             raise ValueError("Compression scheme not supported: {}".format(compress_scheme))
    108 
--> 109         compressed_layer_db, stats = algo.compress_model(cost_metric, trainer)
    110 
    111         # TODO: this is a temporary fix, needs to be resolved

/usr/local/lib/python3.6/dist-packages/aimet_common/compression_algo.py in compress_model(self, cost_metric, trainer)
    102                                                        cost_metric, trainer)
    103         compressed_model_cost = self._cost_calculator.compute_model_cost(compressed_layer_db)
--> 104         stats = self._compile_stats(compressed_layer_db, compressed_model_cost, layer_comp_ratio_list, stats)
    105 
    106         return compressed_layer_db, stats

/usr/local/lib/python3.6/dist-packages/aimet_common/compression_algo.py in _compile_stats(self, compressed_layer_db, compressed_model_cost, layer_comp_ratio_list, compression_ratio_select_stats)
    118 
    119         # Baseline accuracy
--> 120         baseline_accuracy = self._eval_func(self._layer_db.model, None, self._use_cuda)
    121 
    122         # Compressed accuracy

/usr/local/lib/python3.6/dist-packages/aimet_tensorflow/utils/graph_saver.py in save_and_reload_tf_sess(*args, **kwargs)
    162 
    163         # returning the actual function now inside the wrapper function.
--> 164         return eval_func(*args, **kwargs)
    165 
    166     return save_and_reload_tf_sess

<ipython-input-46-f3404c482ac1> in evaluate_model(sess, eval_iterations, use_cuda)
     20             InputImage = InputData.reshape(1, ImageSize, ImageSize, 1).astype(np.float32)
     21             OutputTensor = sess.graph.get_tensor_by_name('dense_2/Softmax:0')
---> 22             Prediction = sess.run(output_tensor[0], feed_dict={'input_1:0': InputImage})
     23             PredictArg = np.argmax(Prediction[0])
     24             LabelArg = np.argmax(Label)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    954     try:
    955       result = self._run(None, fetches, feed_dict, options_ptr,
--> 956                          run_metadata_ptr)
    957       if run_metadata:
    958         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1163     # Create a fetch handler to take care of the structure of fetches.
   1164     fetch_handler = _FetchHandler(
-> 1165         self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
   1166 
   1167     # Run request and get response.

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in __init__(self, graph, fetches, feeds, feed_handles)
    472     """
    473     with graph.as_default():
--> 474       self._fetch_mapper = _FetchMapper.for_fetch(fetches)
    475     self._fetches = []
    476     self._targets = []

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in for_fetch(fetch)
    274         if isinstance(fetch, tensor_type):
    275           fetches, contraction_fn = fetch_fn(fetch)
--> 276           return _ElementFetchMapper(fetches, contraction_fn)
    277     # Did not find anything.
    278     raise TypeError('Fetch argument %r has invalid type %r' %

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in __init__(self, fetches, contraction_fn)
    310       except ValueError as e:
    311         raise ValueError('Fetch argument %r cannot be interpreted as a '
--> 312                          'Tensor. (%s)' % (fetch, str(e)))
    313       except KeyError as e:
    314         raise ValueError('Fetch argument %r cannot be interpreted as a '

ValueError: Fetch argument <tf.Tensor 'dense_2/Softmax:0' shape=(?, 5) dtype=float32> cannot be interpreted as a Tensor. (Tensor Tensor("dense_2/Softmax:0", shape=(?, 5), dtype=float32) is not an element of this graph.)

Thank you for your help!
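
For reference, a minimal sketch of how the suggestion from the earlier reply would look inside the callback, with the tensor freshly resolved from sess.graph used as the fetch passed to sess.run (tensor names assumed from the code above):

# Inside the evaluation loop, resolve both tensors from the session that
# AIMET passes to the callback, and fetch the freshly resolved tensor.
output_tensor = sess.graph.get_tensor_by_name('dense_2/Softmax:0')
input_tensor = sess.graph.get_tensor_by_name('input_1:0')
Prediction = sess.run(output_tensor, feed_dict={input_tensor: InputImage})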

quic-mangal commented 1 year ago

Closing this issue due to inactivity. Please re-open it or create a new issue if you need further help.