qnl / qnl_nonmarkov_ml

Machine learning for non-Markovian trajectories

model.fit fails with tflow environment #2

Open noahstevenson opened 4 years ago

noahstevenson commented 4 years ago

model.fit fails with the tflow environment (as defined in environment.yml). It fails in the same place as #1 but with a different error message: ValueError: Shapes (None, 250, 6) and (None, 240, 6) are incompatible (full traceback below).

running

python train.py

results in

2020-07-05 18:34:43.056278: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.
2020-07-05 18:34:43.056351: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1363] Profiler found 1 GPUs
2020-07-05 18:34:43.056832: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.1'; dlerror: libcupti.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-05 18:34:43.056873: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1408] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
2020-07-05 18:34:43.056900: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1447] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI could not be loaded or symbol could not be found.
2020-07-05 18:34:43.056936: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1430] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI could not be loaded or symbol could not be found.
Setting up a new session...
Epoch 1/20
Traceback (most recent call last):
  File "train.py", line 107, in <module>
    history = m.fit_model(total_epochs)
  File "/home/qnl/noah/projects/2020-NonMarkovTrajectories/code/qnl_nonmarkov_ml/vanilla_lstm/vanilla_lstm.py", line 151, in fit_model
    history = self.model.fit(self.training_features, self.training_labels, epochs=epochs,
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 627, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 505, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2657, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:532 train_step  **
        loss = self.compiled_loss(
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:205 __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/keras/losses.py:143 __call__
        losses = self.call(y_true, y_pred)
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/keras/losses.py:246 call
        return self.fn(y_true, y_pred, **self._fn_kwargs)
    /home/qnl/noah/projects/2020-NonMarkovTrajectories/code/qnl_nonmarkov_ml/vanilla_lstm/vanilla_lstm.py:191 masked_loss_function
        pred_logits = K.reshape(tf.boolean_mask(y_pred, mask), (batch_size, 2))
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:180 wrapper
        return target(*args, **kwargs)
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:1746 boolean_mask_v2
        return boolean_mask(tensor, mask, name, axis)
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:1678 boolean_mask
        shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
    /home/qnl/miniconda3/envs/vanilla_lstm_tflow/lib/python3.8/site-packages/tensorflow/python/framework/tensor_shape.py:1117 assert_is_compatible_with
        raise ValueError("Shapes %s and %s are incompatible" % (self, other))

    ValueError: Shapes (None, 250, 6) and (None, 240, 6) are incompatible

gkoolstra commented 4 years ago

It seems that y_true and y_pred have different shapes: the error reports a sequence length of 250 on one side and 240 on the other. Can you verify that your training_features and validation features have the same shape (number_of_seqs, sequence_length, 2), and that your training labels and validation labels have the same shape (number_of_seqs, sequence_length, 6), before calling MultiTimeStep()?
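
A quick way to run that check is a small helper like the one below. This is only a sketch: `check_dataset_shapes` and its parameter names are hypothetical and not part of the repository, and the channel counts 2 and 6 are taken from the shapes quoted above. It accepts any objects with a `.shape` attribute (for example NumPy arrays) and returns a list of problems instead of raising, so all mismatches show up at once.

```python
def check_dataset_shapes(features, labels, n_feature_channels=2, n_label_channels=6):
    """Return a list of shape problems; an empty list means the pair is consistent.

    Hypothetical helper, not part of qnl_nonmarkov_ml. Expected shapes:
      features: (number_of_seqs, sequence_length, n_feature_channels)
      labels:   (number_of_seqs, sequence_length, n_label_channels)
    """
    f_shape = tuple(features.shape)
    l_shape = tuple(labels.shape)

    # Both arrays must be rank 3 before the axes can be compared.
    if len(f_shape) != 3 or len(l_shape) != 3:
        return [f"expected rank-3 arrays, got features {f_shape} and labels {l_shape}"]

    problems = []
    if f_shape[0] != l_shape[0]:
        problems.append(f"number_of_seqs differs: features {f_shape[0]}, labels {l_shape[0]}")
    if f_shape[1] != l_shape[1]:
        problems.append(f"sequence_length differs: features {f_shape[1]}, labels {l_shape[1]}")
    if f_shape[2] != n_feature_channels:
        problems.append(f"features last axis is {f_shape[2]}, expected {n_feature_channels}")
    if l_shape[2] != n_label_channels:
        problems.append(f"labels last axis is {l_shape[2]}, expected {n_label_channels}")
    return problems
```

Calling it once on the training pair and once on the validation pair, then printing the returned lists, should show whether a 250-versus-240 sequence-length mismatch is already present in the data before the model is built.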