tensorflow / adanet

Fast and flexible AutoML with learning guarantees.
https://adanet.readthedocs.io
Apache License 2.0

Combination of subnetworks #24

Open JodyZXD opened 5 years ago

JodyZXD commented 5 years ago

In Google's paper, "each unit in layer k of the subnetwork may have connections to existing units in layer k-1 of AdaNet", but the GIF shows something different. How exactly do the subnetworks combine?

cweill commented 5 years ago

@JodyZXD: Good observation. The GIF just demos a simple example. The AdaNet framework supports a superset of the subnetworks and connections that the paper defines. You can recreate the neural network from the paper by passing the hidden_layer outputs through the Subnetwork.persisted_tensors dict. This will make these tensors available across iterations.

For instance, you can do something like the following change to simple_dnn.py:

def build_subnetwork(self,
                      features,
                      logits_dimension,
                      training,
                      iteration_step,
                      summary,
                      previous_ensemble):
    """See `adanet.subnetwork.Builder`."""

    input_layer = tf.feature_column.input_layer(
        features=features, feature_columns=self._feature_columns)
    last_layer = input_layer
    persisted_tensors = {_NUM_LAYERS_KEY: tf.constant(self._num_layers)}
    for i in range(self._num_layers):
      last_layer = tf.layers.dense(
          last_layer,
          units=self._layer_size,
          activation=tf.nn.relu,
          kernel_initializer=tf.glorot_uniform_initializer(seed=self._seed))
      last_layer = tf.layers.dropout(
          last_layer, rate=self._dropout, seed=self._seed, training=training)
      hidden_layer_key = "hidden_layer_{}".format(i)
      if previous_ensemble:
        # Iteration t>0: concatenate the corresponding hidden layer from the
        # most recently added subnetwork onto this layer's outputs.
        last_subnetwork = previous_ensemble.weighted_subnetworks[-1].subnetwork
        previous_hidden_layer = last_subnetwork.persisted_tensors[hidden_layer_key]
        last_layer = tf.concat([previous_hidden_layer, last_layer], axis=1)
      # Store hidden layer outputs for subsequent iterations.
      persisted_tensors[hidden_layer_key] = last_layer
    logits = tf.layers.dense(
        last_layer,
        units=logits_dimension,
        kernel_initializer=tf.glorot_uniform_initializer(seed=self._seed))

    # Approximate the Rademacher complexity of this subnetwork as the square-
    # root of its depth.
    complexity = tf.sqrt(tf.to_float(self._num_layers))

    with tf.name_scope(""):
      summary.scalar("complexity", complexity)
      summary.scalar("num_layers", self._num_layers)

    return adanet.Subnetwork(
        last_layer=last_layer,
        logits=logits,
        complexity=complexity,
        persisted_tensors=persisted_tensors)
JodyZXD commented 5 years ago

@cweill Thanks a lot, that's very instructive! I have another question: will you be providing a GPU guide anytime soon?

cweill commented 5 years ago

AdaNet works on GPU just like any other TensorFlow Estimator, so any guide on the web for GPU training with Estimators should get you started.
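
(Not from the original reply, just a minimal sketch assuming TF 1.x: GPU options pass through the standard `tf.estimator.RunConfig`, and `adanet.Estimator` accepts a `config` like any Estimator. The regression head and `my_generator` below are placeholders, not specific AdaNet recommendations.)

import adanet
import tensorflow as tf

# Standard Estimator-style GPU session options; nothing AdaNet-specific here.
session_config = tf.ConfigProto(allow_soft_placement=True)
session_config.gpu_options.allow_growth = True  # allocate GPU memory on demand

run_config = tf.estimator.RunConfig(session_config=session_config)

estimator = adanet.Estimator(
    head=tf.contrib.estimator.regression_head(),  # placeholder head; match your task
    subnetwork_generator=my_generator,  # placeholder: your adanet.subnetwork.Generator
    max_iteration_steps=1000,
    config=run_config)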

You can also try a GPU on Colab by changing the runtime's hardware accelerator to GPU:

https://colab.research.google.com/github/tensorflow/adanet/blob/master/adanet/examples/tutorials/customizing_adanet.ipynb

tobymu commented 5 years ago

@cweill Thanks for your great work! If I want to combine a simple CNN and a simple DNN, how could I do that? For example, I want to get a network like "2_layer_dnn -> cnn -> cnn".

martinobertoni commented 5 years ago

@cweill Thanks for the insight on building networks! However, I've tested the build_subnetwork code you posted above (plugging it into the SimpleDNNBuilder) and it does not work. I also don't quite get how to sample the network structure space exactly as described in the paper. Any indication or additional piece of code would be most welcome!

martinobertoni commented 5 years ago

An alternative approach could be to use the following builder:

    def build_subnetwork(self,
                         features,
                         logits_dimension,
                         training,
                         iteration_step,
                         summary,
                         previous_ensemble=None):
        """See `adanet.subnetwork.Builder`."""
        input_layer = tf.to_float(features['x'])
        kernel_initializer = tf.glorot_uniform_initializer(seed=self._seed)
        last_layer = input_layer
        for layer_size in self._layer_sizes:
            last_layer = tf.layers.dense(
                last_layer,
                units=layer_size,
                activation=self._activation,
                kernel_initializer=kernel_initializer)
        logits = tf.layers.dense(
            last_layer,
            units=logits_dimension,
            kernel_initializer=kernel_initializer)

        persisted_tensors = {
            "num_layers": tf.constant(self._num_layers),
            "layer_sizes": tf.constant(self._layer_sizes),
        }
        return adanet.Subnetwork(
            last_layer=last_layer,
            logits=logits,
            complexity=self._measure_complexity(),
            persisted_tensors=persisted_tensors)
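
(`_measure_complexity` isn't shown here; one plausible implementation, mirroring the depth-based approximation from the simple_dnn snippet earlier in this thread, would be:)

    def _measure_complexity(self):
        """Approximates the Rademacher complexity as the square root of depth."""
        return tf.sqrt(tf.to_float(self._num_layers))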

Then move the exploration logic into the generator:

    def generate_candidates(self, previous_ensemble, iteration_number,
                            previous_ensemble_reports, all_reports):
        """See `adanet.subnetwork.Generator`."""
        seed = self._seed
        if seed is not None:
            seed += iteration_number
        # start with single layer
        num_layers = 1
        layer_sizes = [self.layer_block_size]
        # take the maximum depth reached in previous iterations + 1
        if previous_ensemble:
            last_subnetwork = previous_ensemble.weighted_subnetworks[
                -1].subnetwork
            persisted_tensors = last_subnetwork.persisted_tensors
            num_layers = tf.contrib.util.constant_value(
                persisted_tensors["num_layers"])
            layer_sizes = list(tf.contrib.util.constant_value(
                persisted_tensors["layer_sizes"]))
        # at each iteration we want to check whether extending any of the
        # existing layers is worthwhile
        candidates = list()
        for extend_layer in range(num_layers):
            new_sizes = layer_sizes[:]
            new_sizes[extend_layer] += self.layer_block_size
            candidates.append(
                self._dnn_builder_fn(
                    num_layers=num_layers,
                    layer_sizes=new_sizes,
                    seed=seed,
                    previous_ensemble=previous_ensemble))
        # also check if it's worth adding a new layer
        candidates.append(
            self._dnn_builder_fn(
                num_layers=num_layers + 1,
                layer_sizes=layer_sizes + [self.layer_block_size],
                seed=seed,
                previous_ensemble=previous_ensemble))
        # also keep the un-extended candidate
        candidates.append(
            self._dnn_builder_fn(
                num_layers=num_layers,
                layer_sizes=layer_sizes,
                seed=seed,
                previous_ensemble=previous_ensemble))
        return candidates
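
(Not part of the original post: a rough sketch of what the `_dnn_builder_fn` factory used above might look like; `_SimpleDNNBuilder` is an assumed `adanet.subnetwork.Builder` subclass implementing the `build_subnetwork` shown earlier.)

    def _dnn_builder_fn(self, num_layers, layer_sizes, seed, previous_ensemble):
        # Construct one candidate builder for the requested architecture.
        # `_SimpleDNNBuilder` is an assumed Builder subclass; its constructor
        # arguments here mirror the call sites in generate_candidates above.
        return _SimpleDNNBuilder(
            num_layers=num_layers,
            layer_sizes=layer_sizes,
            seed=seed,
            previous_ensemble=previous_ensemble)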

@cweill what do you think?

InkdyeHuang commented 5 years ago

At iteration t, when the new subnetwork is concatenated with the previous network's hidden layers, are the previous network's weights frozen or not?