tensorflow / models

Models and examples built with TensorFlow

ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [7744,512] #3393

Closed digiamm closed 6 years ago

digiamm commented 6 years ago

Hi guys, I am a beginner with TF and I am trying to run some Atari reinforcement learning training on my laptop with a GeForce GT 650M. I get the following error and I can't figure out what's wrong. I've tried changing my batch size multiple times, but I get the same error again. Sometimes it stops with this error and sometimes it just quits after 300 steps without any error message.

ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [7744,512] and type float
     [[Node: q_estimator/q_estimator/fully_connected/weights/RMSProp_1/Initializer/zeros = Const[_class=["loc:@q_estimator/fully_connected/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [7744,512] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

This is my code. I am embedding all of it so it will be clear.

import gym
from gym.wrappers import Monitor
import itertools
import numpy as np
import os
import random
import sys
import psutil
import tensorflow as tf

if "../" not in sys.path:
  sys.path.append("../")

from lib import plotting
from collections import deque, namedtuple

env = gym.envs.make("Breakout-v0")

# atari actions: 0 [noop], 1 [fire], 2 [left] and 3 [right]
VALID_ACTIONS = [0, 1, 2, 3]

class StateProcessor():
  """
  Processes a raw Atari image: converts it to grayscale, crops it and resizes it to 84x84.
  """
  def __init__(self):
    # build the tensorflow graph
    with tf.variable_scope("state_processor"):
      # obtain input image
      self.input_state = tf.placeholder(shape=[210,160,3], dtype=tf.uint8)
      # convert to greyscale
      self.output = tf.image.rgb_to_grayscale(self.input_state)
      # crop image
      self.output = tf.image.crop_to_bounding_box(self.output, 34, 0 , 160, 160)
      # resize image
      self.output = tf.image.resize_images(self.output, [84, 84], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
      # removes all dimensions of size 1
      self.output = tf.squeeze(self.output)

  def process(self, sess, state):
    """
    :param sess: tensorflow session object
    :param state: [210, 160, 3] atari RGB State
    :return: processed [84, 84] state representing grayscale values
    """
    return sess.run(self.output, {self.input_state: state})

class Estimator():
  """
  Q-Value Estimator neural network

  This network is used for both the Q-Network and the Target Network
  """

  def __init__(self, scope="estimator", summaries_dir=None):
    self.scope = scope
    # writes Tensorboard summaries to disk
    self.summary_writer = None
    with tf.variable_scope(scope):
      # build the graph
      self.build_model()
      if summaries_dir:
        summary_dir = os.path.join(summaries_dir, "summaries_{}".format(scope))
        # create directory if does not exist
        if not os.path.exists(summary_dir):
          os.makedirs(summary_dir)
        self.summary_writer = tf.summary.FileWriter(summary_dir)

  def build_model(self):
    """
    Builds the Tensorflow graph
    :return:
    """

    # placeholders for input
    # input: 4 stacked grayscale frames of shape 84 x 84 each
    self.X_pl = tf.placeholder(shape=[None, 84, 84, 4], dtype=tf.uint8, name="X")
    # the TD target value
    self.y_pl = tf.placeholder(shape=[None], dtype=tf.float32, name="y")
    # integer id of which action was selected
    self.actions_pl = tf.placeholder(shape=[None], dtype=tf.int32, name="actions")

    X = tf.to_float(self.X_pl) / 255.0
    batch_size = tf.shape(self.X_pl)[0]

    # three convolutional layers
    conv1 = tf.contrib.layers.conv2d(X, 32, 8, 4, activation_fn=tf.nn.relu)
    conv2 = tf.contrib.layers.conv2d(conv1, 64, 4, 2, activation_fn=tf.nn.relu)
    conv3 = tf.contrib.layers.conv2d(conv2, 64, 3, 1, activation_fn=tf.nn.relu)

    # flattened layer
    flattened = tf.contrib.layers.flatten(conv3)

    # fully connected layers
    fc1 = tf.contrib.layers.fully_connected(flattened, 512)
    self.predictions = tf.contrib.layers.fully_connected(fc1, len(VALID_ACTIONS))

    # get the predictions for the chosen actions only
    gather_indices = tf.range(batch_size) * tf.shape(self.predictions)[1] + self.actions_pl

    # slices from predictions according to indices
    self.action_predictions = tf.gather(tf.reshape(self.predictions, [-1]), gather_indices)

    # calculate the losses using the squared difference (x - y)^2
    self.losses = tf.squared_difference(self.y_pl, self.action_predictions)
    # reduce input losses, return tensor with single element
    self.loss = tf.reduce_mean(self.losses)

    ### CAN BE MODIFIED ###

    # optimizer parameters from original paper
    self.optimizer = tf.train.RMSPropOptimizer(0.00025, 0.99, 0.0, 1e-6)
    self.train_op = self.optimizer.minimize(self.loss, global_step=tf.contrib.framework.get_global_step())

    # summaries for Tensorboard
    self.summaries = tf.summary.merge([
      tf.summary.scalar("Loss", self.loss),
      tf.summary.scalar("Max_Q_Value", tf.reduce_max(self.predictions)),
      tf.summary.histogram("Loss_Hist", self.losses),
      tf.summary.histogram("Q_Value_Hist", self.predictions)])

  def predict(self, sess, s):
    """
    Predicts action values.

    Args:
      sess: Tensorflow session
      s: State input of shape [batch_size, 84, 84, 4]

    Returns:
      Tensor of shape [batch_size, NUM_VALID_ACTIONS] containing the estimated
      action values.
    """
    return sess.run(self.predictions, {self.X_pl: s})

  def update(self, sess, s, a, y):
    """
    Updates the estimator towards the given targets
    :param self:
    :param sess: tensorflow session object
    :param s: state input of shape [batch_size, 84, 84, 4]
    :param a: chosen actions of shape [batch_size]
    :param y: targets of shape [batch_size]
    :return: the calculated loss on the batch
    """
    feed_dict = {self.X_pl: s, self.y_pl: y, self.actions_pl: a}
    summaries, global_step, _, loss = sess.run([self.summaries, tf.contrib.framework.get_global_step(), self.train_op, self.loss],
                                               feed_dict)
    if self.summary_writer:
      self.summary_writer.add_summary(summaries, global_step)
    return loss

class ModelParametersCopier():
  """
  Copy model parameters of one estimator to another
  """

  def __init__(self, estimator1, estimator2):
    """
    Defines copy-work operation graph
    :param estimator1: estimator to copy params from
    :param estimator2: estimator to copy params to
    """

    e1_params = [t for t in tf.trainable_variables() if t.name.startswith(estimator1.scope)]
    e1_params = sorted(e1_params, key=lambda v: v.name)

    e2_params = [t for t in tf.trainable_variables() if t.name.startswith(estimator2.scope)]
    e2_params = sorted(e2_params, key=lambda v: v.name)

    self.update_ops = []
    for e1_v, e2_v in zip(e1_params, e2_params):
      op = e2_v.assign(e1_v)
      self.update_ops.append(op)

  def make(self, sess):
    """
    Makes copy
    :param self:
    :param sess: Tensorflow session instance
    :return:
    """
    sess.run(self.update_ops)

def make_epsilon_greedy_policy(estimator, nA):
  """
  Creates an epsilon-greedy policy based on a given Q-function approximator and epsilon
  :param estimator: an estimator that returns q values for a given state
  :param nA: number of actions in the environment
  :return: a function that takes (sess, observation, epsilon) as arguments and returns
           the probabilities for each action in the form of a numpy array of length nA
  """

  def policy_fn(sess, observation, epsilon):
    A = np.ones(nA, dtype=float) * epsilon / nA
    q_values = estimator.predict(sess, np.expand_dims(observation, 0))[0]
    best_action = np.argmax(q_values)
    A[best_action] += (1.0 - epsilon)
    return A
  return policy_fn

def deep_q_learning(sess,
                    env,
                    q_estimator,
                    target_estimator,
                    state_processor,
                    num_episodes,
                    experiment_dir,
                    replay_memory_size=500000,
                    replay_memory_init_size=50000,
                    update_target_estimator_every=10000,
                    discount_factor=0.99,
                    epsilon_start=1.0,
                    epsilon_end=0.1,
                    epsilon_decay_steps=500000,
                    batch_size=32,
                    record_video_every=50):

  Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

  # the replay memory
  replay_memory = []

  # make model copier object
  estimator_copy = ModelParametersCopier(q_estimator, target_estimator)

  # keeps track of useful statistics
  stats = plotting.EpisodeStats(
    episode_lengths = np.zeros(num_episodes),
    episode_rewards = np.zeros(num_episodes))

  # for 'system/' summaries, useful to check if the current process looks healthy
  current_process = psutil.Process()

  # create directories for checkpoints and summaries
  checkpoint_dir = os.path.join(experiment_dir, "checkpoints")
  checkpoint_path = os.path.join(checkpoint_dir, "model")
  monitor_path = os.path.join(experiment_dir, "monitor")

  if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)
  if not os.path.exists(monitor_path):
    os.makedirs(monitor_path)

  saver = tf.train.Saver()
  # load previous checkpoint if we found one
  latest_checkpoint = tf.train.latest_checkpoint(checkpoint_dir)
  if latest_checkpoint:
    print('Loading model checkpoint {}...\n'.format(latest_checkpoint))
    saver.restore(sess, latest_checkpoint)

  # get the current time step
  total_t = sess.run(tf.contrib.framework.get_global_step())

  # the epsilon decay schedule
  epsilons = np.linspace(epsilon_start, epsilon_end, epsilon_decay_steps)

  # the policy we're following
  policy = make_epsilon_greedy_policy(q_estimator, len(VALID_ACTIONS))

  # populate the replay memory with initial experience
  print('Populating replay memory..')
  state = env.reset()
  state = state_processor.process(sess, state)
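  # stack the same processed frame 4 times to form the initial [84, 84, 4] input state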
  state = np.stack([state] * 4, axis=2)
  for i in range(replay_memory_init_size):
    action_probs = policy(sess, state, epsilons[min(total_t, epsilon_decay_steps-1)])
    action = np.random.choice(np.arange(len(action_probs)), p=action_probs)
    next_state, reward, done, _ = env.step(VALID_ACTIONS[action])
    next_state = state_processor.process(sess, next_state)
    next_state = np.append(state[:,:,1:], np.expand_dims(next_state, 2), axis=2)
    replay_memory.append(Transition(state, action, reward, next_state, done))
    if done:
      state = env.reset()
      state = state_processor.process(sess, state)
      state = np.stack([state]*4, axis=2)
    else:
      state = next_state

  # record videos, add env monitor wrapper
  env = Monitor(env, directory=monitor_path, video_callable=lambda count: count % record_video_every == 0, resume=True)

  for i_episode in range(num_episodes):
    # save the current checkpoint
    saver.save(tf.get_default_session(), checkpoint_path)
    # reset the environment
    state = env.reset()
    state = state_processor.process(sess, state)
    state = np.stack([state] * 4, axis=2)
    loss = None

    # one step in the environment
    for t in itertools.count():

      # epsilon for this time step
      epsilon = epsilons[min(total_t, epsilon_decay_steps-1)]

      # maybe update the target estimator
      if total_t % update_target_estimator_every == 0:
        estimator_copy.make(sess)
        print('\nCopied model parameters to target network')

      # print out which step we're on, useful for debugging
      print('\rStep {} ({}) @ Episode {}/{}, loss: {}'.format(t, total_t, i_episode+1, num_episodes, loss), end="")
      sys.stdout.flush()

      # take a step
      action_probs = policy(sess, state, epsilon)
      action = np.random.choice(np.arange(len(action_probs)), p=action_probs)
      next_state, reward, done, _ = env.step(VALID_ACTIONS[action])
      next_state = state_processor.process(sess, next_state)
      next_state = np.append(state[:,:,1:], np.expand_dims(next_state, 2), axis=2)

      # if our replay memory is full, pop the first element
      if len(replay_memory) == replay_memory_size:
        replay_memory.pop(0)

      # save transition to replay memory
      replay_memory.append(Transition(state, action, reward, next_state, done))

      # update statistics
      stats.episode_rewards[i_episode] += reward
      stats.episode_lengths[i_episode] = t

      # sample a minibatch from the replay memory
      samples = random.sample(replay_memory, batch_size)
      states_batch, action_batch, reward_batch, next_states_batch, done_batch = map(np.array, zip(*samples))

      # calculate q values and targets
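      # target: y = r + gamma * max_a Q_target(s', a); np.invert(done_batch) zeroes the bootstrap term for terminal states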
      q_values_next = target_estimator.predict(sess, next_states_batch)
      targets_batch = reward_batch + np.invert(done_batch).astype(np.float32) * discount_factor * np.amax(q_values_next, axis=1)

      # perform gradient descent update
      states_batch = np.array(states_batch)
      loss = q_estimator.update(sess, states_batch, action_batch, targets_batch)

      if done:
        break

      state = next_state
      total_t += 1

      # add summaries to tensorboard
      episode_summary = tf.Summary()
      episode_summary.value.add(simple_value=epsilon, tag="episode/epsilon")
      episode_summary.value.add(simple_value=stats.episode_rewards[i_episode], tag="episode/reward")
      episode_summary.value.add(simple_value=stats.episode_lengths[i_episode], tag="episode/length")
      episode_summary.value.add(simple_value=current_process.cpu_percent(), tag="system/cpu_usage_percent")
      episode_summary.value.add(simple_value=current_process.memory_percent(memtype="vms"), tag="system/v_memory_usage_percent")
      q_estimator.summary_writer.add_summary(episode_summary, i_episode)
      q_estimator.summary_writer.flush()

      yield total_t, plotting.EpisodeStats(
        episode_lengths=stats.episode_lengths[:i_episode + 1],
        episode_rewards=stats.episode_rewards[:i_episode + 1])

    return stats

tf.reset_default_graph()

# where to save checkpoint and graphs
experiment_dir = os.path.abspath("./experiments/{}".format(env.spec.id))

# create a global step variable
global_step = tf.Variable(0, name='global_step', trainable=False)

# create estimators
q_estimator = Estimator(scope='q_estimator', summaries_dir=experiment_dir)
target_estimator = Estimator(scope='target_q')

# state processor
state_processor = StateProcessor()

# run

config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'
with tf.Session(config = config) as sess:
  sess.run(tf.global_variables_initializer())
  for t, stats in deep_q_learning(sess,
                                    env,
                                    q_estimator=q_estimator,
                                    target_estimator=target_estimator,
                                    state_processor=state_processor,
                                    experiment_dir=experiment_dir,
                                    num_episodes=10000,
                                    replay_memory_size=500000,
                                    replay_memory_init_size=50000,
                                    update_target_estimator_every=10000,
                                    epsilon_start=1.0,
                                    epsilon_end=0.1,
                                    epsilon_decay_steps=500000,
                                    discount_factor=0.99,
                                    batch_size=32):

    print("\nEpisode Reward: {}".format(stats.episode_rewards[-1]))

Thank you in advance for your help.

gsssrao commented 6 years ago

@lucadigiammarino This is because your GPU cannot allocate enough RAM for the network. You can try reducing the batch_size to a small number, but if that still doesn't work, you will need to change the layer parameters.

You can calculate the memory required for your network by following something like this.
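To make that concrete, here is a rough back-of-the-envelope sketch for just the fully connected layer named in the traceback (the [7744, 512] shape comes from the error above; the assumption that RMSProp keeps two extra slot variables per weight is mine):

# rough memory estimate for the [7744, 512] fully_connected weights from the traceback
# assumes float32 (4 bytes per value) and two RMSProp slot variables per weight
rows, cols = 7744, 512
bytes_per_float = 4
weights_mb = rows * cols * bytes_per_float / 1024 ** 2
print("weights alone:      %.1f MB" % weights_mb)        # ~15.1 MB
print("with RMSProp slots: %.1f MB" % (3 * weights_mb))  # ~45.4 MB

A single layer like this is small on its own; the bulk of the memory usually goes to the activations of the convolutional layers multiplied by the batch size, which is why reducing batch_size is the first thing to try.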

Regarding the abrupt stop, did you see a message like Killed? If yes, you can check the log in syslog: vi /var/log/syslog

shobhitnpm commented 5 years ago

I had the same issue.

Can you please help me out with this?

gsssrao commented 5 years ago

@shobhitnpm Can you post the full log instead of the screenshot, along with the output of nvidia-smi? Either you are already running some code on your GPU and it has eaten up your memory, or your GPU (GTX 1050) doesn't have enough memory to run this network even with a batch size of 1.

I think the second case is more probable because the 1050 only has 2 GB or 3 GB of memory, so you should probably try a smaller network like mobilenet or resnet50 instead of resnet101. Some guidelines on the hardware required for running different versions of faster-rcnn can be found here.
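If the network itself would fit but TensorFlow's default behaviour of reserving almost all GPU memory up front is getting in the way (for instance because another process is also holding memory), a minimal TF 1.x sketch of the relevant session options is below; this is a mitigation, not a fix, and it cannot help when the model genuinely does not fit:

import tensorflow as tf

config = tf.ConfigProto()
# allocate GPU memory on demand instead of reserving nearly all of it at startup
config.gpu_options.allow_growth = True
# or, alternatively, cap TensorFlow at a fixed fraction of the GPU memory
# config.gpu_options.per_process_gpu_memory_fraction = 0.7

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())  # then build and run the graph as usual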

rishabhbhatt009 commented 5 years ago

I am trying to run a CNN, however I am getting this error. I have tried reducing the batch size and the number of nodes, but this still doesn't work:

ResourceExhaustedError: OOM when allocating tensor with shape[2458624,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node training_2/Adam/mul_23}}]]
     [[{{node loss_1/mul}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
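In TF 1.x that hint translates to passing a tf.RunOptions proto to the session call that runs out of memory; a minimal sketch (sess, train_op and feed_dict stand in for your own session, training op and inputs):

import tensorflow as tf

# ask the runtime to dump the currently allocated tensors if this call hits an OOM
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)
loss_value = sess.run(train_op, feed_dict=feed_dict, options=run_opts)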

iseegr8tfuldeadppl commented 5 years ago

For the people stuck with this in models other than MNIST: the reason is the high number of parameters (please check your model.summary()).

A good way to drastically lower these parameters is to add subsample=(2, 2) (careful: it lowers the resolution of the images/data) to all the convolutional layers above the Flatten layer; if subsample isn't recognized (it is the Keras 1 name), the Keras 2 argument is strides=(2, 2), as sketched below.
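For a tf.keras model this could look roughly like the sketch below (the layer sizes are made up for illustration; the point is the strides=(2, 2) argument, which halves the spatial resolution at each convolution and therefore shrinks the Flatten output and the big Dense weight matrix that usually triggers the OOM):

from tensorflow import keras

model = keras.Sequential([
    # strides=(2, 2) halves height and width at each conv, shrinking the Flatten output
    keras.layers.Conv2D(32, (3, 3), strides=(2, 2), activation='relu', input_shape=(84, 84, 4)),
    keras.layers.Conv2D(64, (3, 3), strides=(2, 2), activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(4),
])
model.summary()  # verify the parameter count dropped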