tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

Mystery error building a mixnet #709

Open SapphireBrand opened 4 years ago

SapphireBrand commented 4 years ago

I copied the code below from the MixNet model builder code to use as a "hello world" before adapting the builder to my own problem. This code should build a mixnet_s, IIUC. I am running it in a Google Colab notebook with TensorFlow 2.x.

from mixnet import mixnet_model
from mixnet import mixnet_builder

blocks_args = [
      'r1_k3_a1_p1_s11_e1_i16_o16',
      'r1_k3_a1.1_p1.1_s22_e6_i16_o24',
      'r1_k3_a1.1_p1.1_s11_e3_i24_o24',

      'r1_k3.5.7_a1_p1_s22_e6_i24_o40_se0.5_sw',
      'r3_k3.5_a1.1_p1.1_s11_e6_i40_o40_se0.5_sw',

      'r1_k3.5.7_a1_p1.1_s22_e6_i40_o80_se0.25_sw',
      'r2_k3.5_a1_p1.1_s11_e6_i80_o80_se0.25_sw',

      'r1_k3.5.7_a1.1_p1.1_s11_e6_i80_o120_se0.5_sw',
      'r2_k3.5.7.9_a1.1_p1.1_s11_e3_i120_o120_se0.5_sw',

      'r1_k3.5.7.9.11_a1_p1_s22_e6_i120_o200_se0.5_sw',
      'r2_k3.5.7.9_a1_p1.1_s11_e6_i200_o200_se0.5_sw',
]
global_params = mixnet_model.GlobalParams(
    batch_norm_momentum=0.99,
    batch_norm_epsilon=1e-3,
    dropout_rate=0.2,
    data_format='channels_last',
    num_classes=1000,
    depth_multiplier=None,
    depth_divisor=8,
    min_depth=None,
    stem_size=16,
    use_keras=True,
    feature_size=1536)
decoder = mixnet_builder.MixnetDecoder()
blocks_args = decoder.decode(blocks_args)
model = mixnet_model.MixnetModel(blocks_args, global_params)
model.build(input_shape=(None, 224, 224, 3))

When I execute this code I get the output below. The failure happens when the network tries to go from 112x112x16 to 56x56x24. The error occurs on any block whose "expand" parameter is not equal to 1, so the second entry in blocks_args above is the first one that fails.

The failing line reads the channel dimension from the input shape (which is the plain integer 16) and then tries to access its "value" attribute.
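The symptom can be reproduced without the model. This is a minimal sketch, not the mixnet code: it assumes the shape entry is a plain Python int (as the traceback suggests), and the names input_shape and channel_axis are made up for illustration.

```python
# Minimal sketch of the symptom, independent of TensorFlow and mixnet.
# Assumption: the shape entry is a plain int, as the traceback suggests.
input_shape = (None, 112, 112, 16)  # same shape as in the log output
channel_axis = -1                   # channels_last layout

channels = input_shape[channel_axis]  # a plain int, not a wrapper object

try:
    filters = channels.value  # same attribute access as the failing line
except AttributeError as err:
    message = str(err)
    print(message)
```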

Not sure what I am doing wrong.

INFO:tensorflow:Built stem layers with output shape: (None, 112, 112, 16)
INFO:tensorflow:Block input: stem/Relu:0 shape: (None, 112, 112, 16)
INFO:tensorflow:Expand: stem/Relu:0 shape: (None, 112, 112, 16)
INFO:tensorflow:DWConv: blocks_0/Relu:0 shape: (None, 112, 112, 16)
INFO:tensorflow:Project: blocks_0/Add:0 shape: (None, 112, 112, 16)
INFO:tensorflow:Block input: blocks_0/Identity:0 shape: (None, 112, 112, 16)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-25f88c82da9c> in <module>()
     35 blocks_args = decoder.decode(blocks_args)
     36 model = mixnet_model.MixnetModel(blocks_args, global_params)
---> 37 model.build(input_shape=(None, 224, 224, 3))
     38 #model.summary()

3 frames
/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/network.py in build(self, input_shape)
    680                            'method accepts an `inputs` argument.')
    681         try:
--> 682           self.call(x, **kwargs)
    683         except (errors.InvalidArgumentError, TypeError):
    684           raise ValueError('You cannot build your model by calling `build` '

/content/tpu/models/official/mnasnet/mixnet/mixnet_model.py in call(self, inputs, training, features_only)
    407 
    408       with tf.variable_scope('blocks_%s' % idx):
--> 409         outputs = block.call(outputs, training=training)
    410         self.endpoints['block_%s' % idx] = outputs
    411         if is_reduction:

/content/tpu/models/official/mnasnet/mixnet/mixnet_model.py in call(self, inputs, training)
    251     tf.logging.info('Block input: %s shape: %s' % (inputs.name, inputs.shape))
    252     if self._block_args.expand_ratio != 1:
--> 253       x = self._relu_fn(self._bn0(self._expand_conv(inputs), training=training))
    254     else:
    255       x = inputs

/content/tpu/models/official/mnasnet/mixnet/custom_layers.py in __call__(self, inputs)
     70       return self._convs[0](inputs)
     71 
---> 72     filters = inputs.shape[self._channel_axis].value
     73     splits = _split_channels(filters, len(self._convs))
     74     x_splits = tf.split(inputs, splits, self._channel_axis)

AttributeError: 'int' object has no attribute 'value'

SapphireBrand commented 4 years ago

OK, so this code works if I change the line filters = inputs.shape[self._channel_axis].value to filters = inputs.shape[self._channel_axis].

Any idea why I need to make that change? I imagine this code works as-is for some people, so does something about the Colab environment differ?

Is there a change that I can push as a PR?
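
One likely explanation, offered as a sketch rather than a confirmed diagnosis: under TensorFlow 1.x behavior, indexing a TensorShape returns a Dimension object that has a .value attribute, while under TensorFlow 2.x behavior it returns a plain int (or None for an unknown dimension). Code written against 1.x would therefore break on this line when run with 2.x behavior, regardless of Colab. A change that tolerates both could look like the helper below. The names dimension_value and FakeDimension are hypothetical and used only for illustration; TensorFlow also ships tf.compat.dimension_value for the same purpose.

```python
def dimension_value(dim):
    """Return a shape entry as an int or None, whether or not it is wrapped."""
    # TF 1.x-style Dimension objects expose .value; plain ints and None do not,
    # so fall back to the entry itself when the attribute is missing.
    return getattr(dim, "value", dim)


class FakeDimension:
    """Hypothetical stand-in for a TF 1.x-style Dimension (illustration only)."""

    def __init__(self, value):
        self.value = value


print(dimension_value(16))                 # plain int, as under TF 2.x behavior
print(dimension_value(FakeDimension(16)))  # wrapped, as under TF 1.x behavior
print(dimension_value(None))               # unknown dimension
```

With a helper like this, the failing line would become filters = dimension_value(inputs.shape[self._channel_axis]), which should behave the same under either shape convention.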