lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io
GNU Lesser General Public License v2.1

Type mismatch when trying to use L-BFGS #1558

Open jdellag opened 11 months ago

jdellag commented 11 months ago

I've been working on a Navier-Stokes problem and, like many of the examples I have seen, would like to further optimize my model with L-BFGS after training with Adam. However, when I try to compile with this optimizer after setting the default precision to float64, I get the following error:

Compiling model...

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 18
     15 for beta in beta_values:
     16     print(f"INTIALIZING RUN ******* {frame} ** Beta = {beta} ** Re = {Re} *******")
---> 18     losshistory, train_state, model = train_for_Re(Re, beta)
     19     # Update the progress count
     20     progress_count += 1

Cell In[17], line 38, in train_for_Re(Re, beta)
     32 variable = dde.callbacks.VariableValue(U, period = 500, filename = fnamevar, precision = 32)
     34 #losshistory, train_state = model.train(iterations = num_iter, callbacks = [variable], display_every = 500)
     35 #dde.utils.external.saveplot(losshistory, train_state, 
     36                             #issave = True, isplot = True, train_fname = train_filename, 
     37                             #    test_fname = test_filename, loss_fname = loss_filename, output_dir = "run_data")
---> 38 model.compile('L-BFGS-B')
     39 losshistory, train_state = model.train(iterations = num_iter, callbacks = [variable])
     41 return losshistory, train_state, model

File ~/python_environments/deepxde2/lib/python3.11/site-packages/deepxde/utils/internal.py:22, in timing.<locals>.wrapper(*args, **kwargs)
     19 @wraps(f)
     20 def wrapper(*args, **kwargs):
     21     ts = timeit.default_timer()
---> 22     result = f(*args, **kwargs)
     23     te = timeit.default_timer()
     24     if config.rank == 0:

File ~/python_environments/deepxde2/lib/python3.11/site-packages/deepxde/model.py:137, in Model.compile(self, optimizer, lr, loss, metrics, decay, loss_weights, external_trainable_variables)
    134     self.external_trainable_variables = external_trainable_variables
    136 if backend_name == "tensorflow.compat.v1":
--> 137     self._compile_tensorflow_compat_v1(lr, loss_fn, decay)
    138 elif backend_name == "tensorflow":
    139     self._compile_tensorflow(lr, loss_fn, decay)

File ~/python_environments/deepxde2/lib/python3.11/site-packages/deepxde/model.py:194, in Model._compile_tensorflow_compat_v1(self, lr, loss_fn, decay)
    192 self.outputs_losses_train = [self.net.outputs, losses_train]
    193 self.outputs_losses_test = [self.net.outputs, losses_test]
--> 194 self.train_step = optimizers.get(
    195     total_loss, self.opt_name, learning_rate=lr, decay=decay
    196 )

File ~/python_environments/deepxde2/lib/python3.11/site-packages/deepxde/optimizers/tensorflow_compat_v1/optimizers.py:22, in get(loss, optimizer, learning_rate, decay)
     20     if learning_rate is not None or decay is not None:
     21         print("Warning: learning rate is ignored for {}".format(optimizer))
---> 22     return ScipyOptimizerInterface(
     23         loss,
     24         method="L-BFGS-B",
     25         options={
     26             "maxcor": LBFGS_options["maxcor"],
     27             "ftol": LBFGS_options["ftol"],
     28             "gtol": LBFGS_options["gtol"],
     29             "maxfun": LBFGS_options["maxfun"],
     30             "maxiter": LBFGS_options["maxiter"],
     31             "maxls": LBFGS_options["maxls"],
     32         },
     33     )
     35 if isinstance(optimizer, tf.train.AdamOptimizer):
     36     optim = optimizer

File ~/python_environments/deepxde2/lib/python3.11/site-packages/deepxde/optimizers/tensorflow_compat_v1/scipy_optimizer.py:102, in ExternalOptimizerInterface.__init__(self, loss, var_list, equalities, inequalities, var_to_bounds, **optimizer_kwargs)
     95 inequalities_grads = [
     96     _compute_gradients(inequality, self._vars)
     97     for inequality in self._inequalities
     98 ]
    100 self.optimizer_kwargs = optimizer_kwargs
--> 102 self._packed_var = self._pack(self._vars)
    103 self._packed_loss_grad = self._pack(loss_grads)
    104 self._packed_equality_grads = [
    105     self._pack(equality_grads) for equality_grads in equalities_grads
    106 ]

File ~/python_environments/deepxde2/lib/python3.11/site-packages/deepxde/optimizers/tensorflow_compat_v1/scipy_optimizer.py:251, in ExternalOptimizerInterface._pack(cls, tensors)
    249 else:
    250     flattened = [tf.reshape(tensor, [-1]) for tensor in tensors]
--> 251     return tf.concat(flattened, 0)

File ~/python_environments/deepxde2/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152   filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153   raise e.with_traceback(filtered_tb) from None
    154 finally:
    155   del filtered_tb

File ~/python_environments/deepxde2/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py:500, in _ExtractInputsAndAttrs(op_type_name, op_def, allowed_list_attr_map, keywords, default_type_attr_map, attrs, inputs, input_types)
    497     raise TypeError(f"{prefix} that do not match type {dtype.name} "
    498                     "inferred from earlier arguments.")
    499   else:
--> 500     raise TypeError(f"{prefix} that don't all match.")
    501 else:
    502   raise TypeError(f"{prefix} that are invalid. Tensors: {values}")

TypeError: Tensors in list passed to 'values' of 'ConcatV2' Op have types [float32, float64, float64, float64, float64, float64, float64] that don't all match.

I am puzzled as to why this is happening. FWIW, I have run into this error both on the DeepXDE Docker image and on an M1 Mac (both with the tensorflow.compat.v1 backend). What surprises me even more is that I have not found this specific error in this repo's issues or anywhere else online. Any ideas on what this could be?
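For reference, here is a minimal sketch of the failing concat outside of DeepXDE, mimicking what the packing step in scipy_optimizer.py does (the variable names here are illustrative, not from my code):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

w = tf.Variable([1.0, 2.0], dtype=tf.float64)  # stands in for float64 network weights
u = tf.Variable(1.0, dtype=tf.float32)         # a stray float32 trainable variable

# ScipyOptimizerInterface flattens every trainable variable and concatenates
# them into a single vector; one float32 tensor among float64 ones fails at
# graph-construction time with the same ConcatV2 TypeError as above.
packed = tf.concat([tf.reshape(w, [-1]), tf.reshape(u, [-1])], 0)
```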

vl-dud commented 11 months ago

> However, when trying to use this optimizer after setting the default precision to float64

Did you call deepxde.config.real.set_float64() right after import deepxde? If not, do it.
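A minimal sketch of the intended ordering (the network sizes below are placeholders, not from your code); the key point is that the default float is set before anything is built:

```python
import deepxde as dde

# Set the default real dtype immediately after import; any tensors created
# before this call (network weights, constants, trainable variables) would
# already be float32.
dde.config.set_default_float("float64")  # equivalently: dde.config.real.set_float64()

net = dde.nn.FNN([2] + [50] * 3 + [3], "tanh", "Glorot normal")
```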

jdellag commented 11 months ago

Yes, trying that and every combination of the following three:

deepxde.config.set_default_float("float64")
tf.keras.backend.set_floatx("float64")
deepxde.config.real.set_float64()

has not yielded anything other than the error posted above.

vl-dud commented 11 months ago

Can you show all the code?

jdellag commented 11 months ago

I finally nailed down what it was, and I'm surprised I didn't catch it earlier. I have a trainable variable U that I am interested in, and I noticed that the value of U being written every 500 iterations was a float32, even though I explicitly set all reals to float64. Specifying the dtype of U when I initialized it, U = dde.Variable(1.0, dtype='float64'), did the trick.
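For anyone else who lands here, a minimal sketch of the workaround (the callback line mirrors the one in my traceback; the filename is illustrative):

```python
import deepxde as dde

dde.config.set_default_float("float64")

# Create the external trainable variable in float64 explicitly; otherwise it
# is float32 and L-BFGS-B cannot concatenate it with the float64 network
# weights into a single flat vector.
U = dde.Variable(1.0, dtype="float64")
variable = dde.callbacks.VariableValue(U, period=500, filename="variable.dat")
```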

Can we get this change committed to the code base?

lululxvi commented 11 months ago

Good point. @jdellag Would you like to submit a PR to fix it?

You only need to modify this line: https://github.com/lululxvi/deepxde/blob/5b21146dd2c73e8df7d31acaad8b5604d30fc3e8/deepxde/backend/tensorflow_compat_v1/tensor.py#L83
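A sketch of the kind of change, assuming the helper at the linked line has roughly this signature and that config is importable here without a circular import (adjust to the module's actual layout):

```python
import tensorflow.compat.v1 as tf

from deepxde import config  # assumption: adjust to this module's real import path


def Variable(initial_value, dtype=None):
    # Default to the configured real dtype (config.real(tf) returns e.g.
    # tf.float64 after dde.config.set_default_float("float64")) instead of
    # letting tf.Variable infer float32 from a Python float.
    if dtype is None:
        dtype = config.real(tf)
    return tf.Variable(initial_value=initial_value, trainable=True, dtype=dtype)
```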