tensorflow / adanet

Fast and flexible AutoML with learning guarantees.
https://adanet.readthedocs.io
Apache License 2.0

Simple DNN generator error: 'Tensor' object is not callable #137

Open jeffltc opened 4 years ago

jeffltc commented 4 years ago

TensorFlow version: 2.0.0
Python version: 3.6
OS: macOS

Hi there. I tried to run the customized generator demo code in my local environment, but I ran into a "'Tensor' object is not callable" error.

I suspect this is a TensorFlow 2.0 problem. I checked this issue and this issue and found that the problem can be solved by using functools.partial() to pass a callable to the optimizer. However, I cannot find where the loss is created, so I don't know how to turn that tensor into a callable object.
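To make the functools.partial() idea concrete, here is a minimal sketch in plain Python. `ToyOptimizer` and `squared_error` are hypothetical stand-ins, not TensorFlow or AdaNet APIs; the only assumption is that `minimize` calls its `loss` argument with no arguments, which is the behavior described in this thread.

```python
import functools


class ToyOptimizer:
    """Hypothetical stand-in that mimics only the calling convention."""

    def minimize(self, loss):
        # Assumption: the optimizer calls `loss` with no arguments.
        return loss()


def squared_error(prediction, target):
    """Toy loss used only for illustration."""
    return (prediction - target) ** 2


optimizer = ToyOptimizer()

# Passing an already computed value fails in the same way as the
# reported error, because a plain value cannot be called.
computed_loss = squared_error(3.0, 1.0)
try:
    optimizer.minimize(computed_loss)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# functools.partial binds the arguments up front and returns a
# zero-argument callable, which the optimizer can then invoke.
loss_fn = functools.partial(squared_error, 3.0, 1.0)
result = optimizer.minimize(loss_fn)
```

The same pattern only helps when the code that computes the loss is available to wrap; a tensor that has already been computed cannot be wrapped this way.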

In addition, is there a recommended TensorFlow version for running adanet code? A lot of things seem to need adjusting when I run the adanet demo code in a TensorFlow 2.0 environment.

The code is as follows:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import functools

import adanet
from six.moves import range
import tensorflow as tf
import numpy as np

from time import time
from datetime import datetime

_NUM_LAYERS_KEY = "num_layers"

RANDOM_SEED=42

folder_dir =  "/Users/jeff/Desktop/autoensemble/" 
logdir_adanet = folder_dir + "adanet/" +datetime.now().strftime("%Y%m%d-%H%M%S")

# Copied from the demo code.
class _SimpleDNNBuilder(adanet.subnetwork.Builder):
  """Builds a DNN subnetwork for AdaNet."""

  def __init__(self, feature_columns, optimizer, layer_size, num_layers,
               learn_mixture_weights, dropout, seed):
    """Initializes a `_DNNBuilder`.
    Args:
      feature_columns: An iterable containing all the feature columns used by
        the model. All items in the set should be instances of classes derived
        from `FeatureColumn`.
      optimizer: An `Optimizer` instance for training both the subnetwork and
        the mixture weights.
      layer_size: The number of nodes to output at each hidden layer.
      num_layers: The number of hidden layers.
      learn_mixture_weights: Whether to solve a learning problem to find the
        best mixture weights, or use their default value according to the
        mixture weight type. When `False`, the subnetworks will return a no_op
        for the mixture weight train op.
      dropout: The dropout rate, between 0 and 1. E.g. "rate=0.1" would drop out
        10% of input units.
      seed: A random seed.
    Returns:
      An instance of `_DNNBuilder`.
    """

    self._feature_columns = feature_columns
    self._optimizer = optimizer
    self._layer_size = layer_size
    self._num_layers = num_layers
    self._learn_mixture_weights = learn_mixture_weights
    self._dropout = dropout
    self._seed = seed

  def build_subnetwork(self,
                       features,
                       logits_dimension,
                       training,
                       iteration_step,
                       summary,
                       previous_ensemble=None):
    """See `adanet.subnetwork.Builder`."""

    input_layer = tf.compat.v1.feature_column.input_layer(
        features=features, feature_columns=self._feature_columns)
    last_layer = input_layer
    for _ in range(self._num_layers):
      last_layer = tf.compat.v1.layers.dense(
          last_layer,
          units=self._layer_size,
          activation=tf.nn.relu,
          kernel_initializer=tf.compat.v1.glorot_uniform_initializer(
              seed=self._seed))
      last_layer = tf.compat.v1.layers.dropout(
          last_layer, rate=self._dropout, seed=self._seed, training=training)
    logits = tf.compat.v1.layers.dense(
        last_layer,
        units=logits_dimension,
        kernel_initializer=tf.compat.v1.glorot_uniform_initializer(
            seed=self._seed))

    # Approximate the Rademacher complexity of this subnetwork as the square-
    # root of its depth.
    complexity = tf.sqrt(tf.cast(self._num_layers, dtype=tf.float32))

    with tf.name_scope(""):
      summary.scalar("complexity", complexity)
      summary.scalar("num_layers", self._num_layers)

    shared = {_NUM_LAYERS_KEY: self._num_layers}
    return adanet.Subnetwork(
        last_layer=last_layer,
        logits=logits,
        complexity=complexity,
        shared=shared)

  def build_subnetwork_train_op(self, subnetwork, loss, var_list, labels,
                                iteration_step, summary, previous_ensemble):
    """See `adanet.subnetwork.Builder`."""

    # NOTE: The `adanet.Estimator` increments the global step.
    update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
      return self._optimizer.minimize(loss=loss, var_list=var_list)

  # TODO: Delete deprecated build_mixture_weights_train_op method.
  # Use adanet.ensemble.Ensembler instead.
  def build_mixture_weights_train_op(self, loss, var_list, logits, labels,
                                     iteration_step, summary):
    """See `adanet.subnetwork.Builder`."""

    if not self._learn_mixture_weights:
      return tf.no_op("mixture_weights_train_op")

    # NOTE: The `adanet.Estimator` increments the global step.
    return self._optimizer.minimize(loss=loss, var_list=var_list)

  @property
  def name(self):
    """See `adanet.subnetwork.Builder`."""

    if self._num_layers == 0:
      # A DNN with no hidden layers is a linear model.
      return "linear"
    return "{}_layer_dnn".format(self._num_layers)

class Generator(adanet.subnetwork.Generator):
  """Generates two DNN subnetworks at each iteration.
  The first DNN has an identical shape to the most recently added subnetwork
  in `previous_ensemble`. The second has the same shape plus one more dense
  layer on top. This is similar to the adaptive network presented in Figure 2 of
  [Cortes et al. ICML 2017](https://arxiv.org/abs/1607.01097), without the
  connections to hidden layers of networks from previous iterations.
  """

  def __init__(self,
               feature_columns,
               optimizer,
               layer_size=32,
               initial_num_layers=0,
               learn_mixture_weights=False,
               dropout=0.,
               seed=None):
    """Initializes a DNN `Generator`.
    Args:
      feature_columns: An iterable containing all the feature columns used by
        DNN models. All items in the set should be instances of classes derived
        from `FeatureColumn`.
      optimizer: An `Optimizer` instance for training both the subnetwork and
        the mixture weights.
      layer_size: Number of nodes in each hidden layer of the subnetwork
        candidates. Note that this parameter is ignored in a DNN with no hidden
        layers.
      initial_num_layers: Minimum number of layers for each DNN subnetwork. At
        iteration 0, the subnetworks will be `initial_num_layers` deep.
        Subnetworks at subsequent iterations will be at least as deep.
      learn_mixture_weights: Whether to solve a learning problem to find the
        best mixture weights, or use their default value according to the
        mixture weight type. When `False`, the subnetworks will return a no_op
        for the mixture weight train op.
      dropout: The dropout rate, between 0 and 1. E.g. "rate=0.1" would drop out
        10% of input units.
      seed: A random seed.
    Returns:
      An instance of `Generator`.
    Raises:
      ValueError: If feature_columns is empty.
      ValueError: If layer_size < 1.
      ValueError: If initial_num_layers < 0.
    """

    if not feature_columns:
      raise ValueError("feature_columns must not be empty")

    if layer_size < 1:
      raise ValueError("layer_size must be >= 1")

    if initial_num_layers < 0:
      raise ValueError("initial_num_layers must be >= 0")

    self._initial_num_layers = initial_num_layers
    self._dnn_builder_fn = functools.partial(
        _SimpleDNNBuilder,
        feature_columns=feature_columns,
        optimizer=optimizer,
        layer_size=layer_size,
        learn_mixture_weights=learn_mixture_weights,
        dropout=dropout,
        seed=seed)

  def generate_candidates(self, previous_ensemble, iteration_number,
                          previous_ensemble_reports, all_reports):
    """See `adanet.subnetwork.Generator`."""

    num_layers = self._initial_num_layers
    if previous_ensemble:
      num_layers = previous_ensemble.weighted_subnetworks[-1].subnetwork.shared[
          _NUM_LAYERS_KEY]
    return [
        self._dnn_builder_fn(num_layers=num_layers),
        self._dnn_builder_fn(num_layers=num_layers + 1),
    ]

#import data
(x_train, y_train), (x_test, y_test) = (
      tf.keras.datasets.boston_housing.load_data())

#define input function
def input_fn(partition):
    def _input_fn():
        feat_tensor_dict = {}
        if partition == 'train':
            x = x_train.copy()
            y = y_train.copy()
        else:
            x = x_test.copy()
            y = y_test.copy()
        for i in range(0, np.size(x, 1)):
            feat_nam = ('feat' + str(i))
            feat_tensor_dict[feat_nam] = tf.convert_to_tensor(
                    x[:, i], dtype=tf.float32)
        # Build the label tensor once, outside the feature loop.
        label_tensor = tf.convert_to_tensor(y, dtype=tf.float32)
        return (feat_tensor_dict, label_tensor)
    return _input_fn

#@title AdaNet parameters
LEARNING_RATE = 0.001  #@param {type:"number"}
TRAIN_STEPS = 60000  #@param {type:"integer"}
BATCH_SIZE = 32  #@param {type:"integer"}
LEARN_MIXTURE_WEIGHTS = False  #@param {type:"boolean"}
ADANET_LAMBDA = 0  #@param {type:"number"}
ADANET_ITERATIONS = 3  #@param {type:"integer"}

#define column list
feat_nam_lst = ['feat' + str(i) for i in range(0, np.size(x_train, 1))]
feature_columns = []
for item in feat_nam_lst:
    feature_columns.append(tf.feature_column.numeric_column(item))

estimator = adanet.Estimator(
      # Since we are predicting housing prices, we'll use a regression
      # head that optimizes for MSE.
      head = tf.estimator.RegressionHead(1),

      # Define the generator, which defines our search space of subnetworks
      # to train as candidates to add to the final AdaNet model.
      subnetwork_generator=Generator(
          optimizer=tf.keras.optimizers.RMSprop(learning_rate=LEARNING_RATE),
          learn_mixture_weights=LEARN_MIXTURE_WEIGHTS,
          seed=RANDOM_SEED,
          feature_columns=feature_columns
          ),

      # Lambda is the strength of complexity regularization. A larger
      # value will penalize more complex subnetworks.
      adanet_lambda=ADANET_LAMBDA,

      # The number of train steps per iteration.
      max_iteration_steps=TRAIN_STEPS // ADANET_ITERATIONS,

      # The evaluator will evaluate the model on the full training set to
      # compute the overall AdaNet loss (train loss + complexity
      # regularization) to select the best candidate to include in the
      # final AdaNet model.
      evaluator=adanet.Evaluator(
          input_fn=input_fn(partition ="train")),

      # Configuration for Estimators.
      config=tf.estimator.RunConfig(
          save_summary_steps=5000,
          save_checkpoints_steps=5000,
          tf_random_seed=RANDOM_SEED,
          model_dir=logdir_adanet))

#train and evaluation function
train_spec = tf.estimator.TrainSpec(
      input_fn=input_fn("train"),
      max_steps=TRAIN_STEPS)
eval_spec = tf.estimator.EvalSpec(
      input_fn=input_fn("test"),
      steps=None,
      start_delay_secs=1,
      throttle_secs=30,
      )

#estimator train and evaluate
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

I apologize for the poor code style. I'm not a professional in this field and I'm still learning. Thanks for your attention.

cshjin commented 4 years ago

You are using an optimizer from the Keras API. Its minimize function expects the loss to be a callable that takes no arguments. If your loss is an already computed tensor, use tf.GradientTape() instead.

Replace the self._optimizer.minimize(...) call with:

with tf.GradientTape() as tape:
    # TODO: put your loss function in the tape scope
    _loss = loss()
grads = tape.gradient(_loss, var_list)
grads_and_vars = zip(grads, var_list)
self._optimizer.apply_gradients(grads_and_vars)

For reference, see my comment on a related issue: https://github.com/tensorflow/tensorflow/issues/29944#issuecomment-560224083
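As a self-contained illustration of the tape pattern above, here is a minimal sketch that assumes eager execution (the TensorFlow 2.x default). The variable `w`, the toy loss, and the choice of SGD are assumptions made for the example. Whether the same pattern carries over unchanged to the graph built by adanet.Estimator is not confirmed in this thread.

```python
import tensorflow as tf

# Hypothetical toy problem: move w so that w * 2.0 gets closer to 6.0.
w = tf.Variable(1.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    # The loss is computed inside the tape scope so that the
    # operations on w are recorded for differentiation.
    loss = tf.square(w * 2.0 - 6.0)

var_list = [w]
# Differentiate the recorded loss with respect to each variable.
grads = tape.gradient(loss, var_list)
# Apply one gradient step; this replaces the minimize(...) call.
optimizer.apply_gradients(zip(grads, var_list))

updated_w = float(w.numpy())
```

The key point is that the loss has to be computed inside the `with` block. A tensor computed before the tape was opened is not recorded, so `tape.gradient` would return `None` for it.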