qnl / qnl_nonmarkov_ml

Machine learning for non-Markovian trajectories

model.fit fails #1

Open noahstevenson opened 4 years ago

noahstevenson commented 4 years ago

Running model.fit fails with an error. Among the many warnings in the output, the relevant message is the last one:

TypeError: Expected int64 passed to parameter 'y' of op 'NotEqual', got -1.0 of type 'float' instead. Error: Expected int64, got -1.0 of type 'float' instead.

This may be due to differing package versions, so (@gkoolstra) it would be good to have a list of your environment's package versions in the vanilla_lstm directory.

Running

python train.py

results in:

/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216, got 192
  return f(*args, **kwds)
2020-07-03 20:10:24.563005: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:10:24.563137: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:10:24.563160: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading data...
Loaded data...
Training batch size: 30099 (90.0%)
Validation batch size: 3345 (10.0%)
Saving processed data to /home/qnl/noah/projects/2020-NonMarkovTrajectories/local-data/2020_06_29/cr_trajectories_test_021/phase_0/prep_C+X_T+X...
(qutip-env) [qnl@kraken vanilla_lstm]$ python train.py
/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216, got 192
  return f(*args, **kwds)
2020-07-03 20:11:36.221940: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:11:36.222201: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:11:36.222228: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-07-03 20:11:37.890903: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-03 20:11:37.917089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-07-03 20:11:37.917299: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:11:37.917419: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:11:37.917532: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:11:37.917642: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:11:37.917750: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:11:37.917856: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:11:37.974772: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-03 20:11:37.974854: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
[]
True
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    with h5py.File(os.path.join(filepath, 'training_validation_split.h5'), "r") as f:
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'data/cts_rabi_amp_6/prep_Y/training_validation_split.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
(qutip-env) [qnl@kraken vanilla_lstm]$ python train.py
/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216, got 192
  return f(*args, **kwds)
2020-07-03 20:42:00.885936: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:00.886068: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:00.886091: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-07-03 20:42:02.536629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-03 20:42:02.565691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-07-03 20:42:02.565886: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:02.566004: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:02.566115: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:02.566237: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:02.566344: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:02.566452: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:02.570485: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-03 20:42:02.570522: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
[]
True
Creating model...
/home/qnl/noah/projects/2020-NonMarkovTrajectories/code/qnl_nonmarkov_ml/vanilla_lstm/vanilla_lstm.py:576: RuntimeWarning: invalid value encountered in true_divide
  colors = plt.cm.viridis(np.arange(len(labels)) / (len(labels) - 1))
/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/matplotlib/colors.py:504: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
2020-07-03 20:42:04.244235: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-03 20:42:04.258379: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100040000 Hz
2020-07-03 20:42:04.260040: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5b72740 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-03 20:42:04.260075: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-03 20:42:04.362460: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5283810 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-03 20:42:04.362497: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-07-03 20:42:04.362611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-03 20:42:04.362625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      
Building model...
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
masking (Masking)            (None, 250, 2)            0         
_________________________________________________________________
lstm (LSTM)                  (None, 250, 32)           4480      
_________________________________________________________________
time_distributed (TimeDistri (None, 250, 6)            198       
=================================================================
Total params: 4,678
Trainable params: 4,678
Non-trainable params: 0
_________________________________________________________________
Compiling model...
Expected accuracy should converge to 0.6361607142857143
Training started...
Train on 30099 samples, validate on 3345 samples
Setting up a new session...
Epoch 1/20
2020-07-03 20:42:08.388479: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-07-03 20:42:08.388550: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1259] Profiler found 1 GPUs
2020-07-03 20:42:08.388779: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.1'; dlerror: libcupti.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64:
2020-07-03 20:42:08.388821: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
2020-07-03 20:42:08.388842: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI could not be loaded or symbol could not be found.
 1024/30099 [>.............................] - ETA: 26sDropout scheduling failed.
2020-07-03 20:42:08.415242: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI could not be loaded or symbol could not be found.
2020-07-03 20:42:08.415288: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:88]  GpuTracer has collected 0 callback api events and 0 activity events.
Traceback (most recent call last):
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 324, in _AssertCompatible
    fn(values)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 263, in inner
    _ = [_check_failed(v) for v in nest.flatten(values)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 264, in <listcomp>
    if not isinstance(v, expected_types)]
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 248, in _check_failed
    raise ValueError(v)
ValueError: -1.0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
    allow_broadcast=True)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 296, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 451, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 331, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int64, got -1.0 of type 'float' instead.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 107, in <module>
    history = m.fit_model(total_epochs)
  File "/home/qnl/noah/projects/2020-NonMarkovTrajectories/code/qnl_nonmarkov_ml/vanilla_lstm/vanilla_lstm.py", line 159, in fit_model
    DropOutScheduler(self.dropout_schedule)])
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 497, in _initialize
    *args, **kwds))
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 85, in distributed_function
    per_replica_function, args=args)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 763, in experimental_run_v2
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1819, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 2164, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 292, in wrapper
    return func(*args, **kwargs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 433, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 312, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 253, in _process_single_batch
    training=training))
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 167, in _model_loss
    per_sample_losses = loss_fn.call(targets[i], outs[i])
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/losses.py", line 221, in call
    return self.fn(y_true, y_pred, **self._fn_kwargs)
  File "/home/qnl/noah/projects/2020-NonMarkovTrajectories/code/qnl_nonmarkov_ml/vanilla_lstm/vanilla_lstm.py", line 188, in masked_loss_function
    mask = K.cast(K.not_equal(y_true, self.mask_value), K.floatx())
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 2331, in not_equal
    return math_ops.not_equal(x, y)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/ops/math_ops.py", line 1340, in not_equal
    return gen_math_ops.not_equal(x, y, name=name)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6455, in not_equal
    name=name)
  File "/home/qnl/miniconda3/envs/qutip-env/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 477, in _apply_op_helper
    repr(values), type(values).__name__, err))
TypeError: Expected int64 passed to parameter 'y' of op 'NotEqual', got -1.0 of type 'float' instead. Error: Expected int64, got -1.0 of type 'float' instead.
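Reading the final TypeError: the labels (y_true) reach masked_loss_function as int64, while self.mask_value is the Python float -1.0, and the NotEqual op refuses to mix the two. Below is a minimal sketch of one possible workaround, not necessarily the fix used in the repository. It casts the mask value to the dtype of the labels before comparing. The function name masked_positions is hypothetical, and the sketch uses tf.cast / tf.not_equal directly instead of the K.cast / K.not_equal calls shown in the traceback.

```python
import tensorflow as tf


def masked_positions(y_true, mask_value=-1.0):
    """Return a float mask: 1.0 where y_true != mask_value, else 0.0.

    Hypothetical helper. Assumption: y_true may be an integer tensor
    (int64 in the traceback above) while mask_value is a Python float.
    """
    # Cast the mask value to the label dtype so both operands of
    # not_equal have the same type.
    mask_value = tf.cast(mask_value, y_true.dtype)
    return tf.cast(tf.not_equal(y_true, mask_value), tf.float32)


# Example: int64 labels with one masked entry (-1).
y_true = tf.constant([[0, 3, -1]], dtype=tf.int64)
mask = masked_positions(y_true)
# Positions equal to the mask value get 0.0; all others get 1.0.
print(mask.numpy())
```

Casting y_true to float instead would also make the dtypes agree. Which one is appropriate depends on how the rest of the loss uses y_true, which is not shown here.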

gkoolstra commented 4 years ago

I could not reproduce the error on Wobbuffet, so I have added an "environment.yml" file to the vanilla_lstm directory. Can you make sure you have the same versions of tensorflow and keras?
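To compare two machines quickly, a small script along these lines can print the installed versions for checking against environment.yml. This is only a sketch: the package list is an assumption based on the libraries visible in the log, and importlib.metadata requires Python 3.8 or newer (the Python 3.6 environment in the log would need the importlib_metadata backport instead).

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Assumed list of packages worth comparing; adjust to match environment.yml.
PACKAGES = ["tensorflow", "keras", "numpy", "h5py"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both environments and diffing the output should show whether the tensorflow and keras versions actually differ.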