GPU-Error from apply_feature_transform: Check failed: work_element_count > 0

PhilippBrendel commented 1 year ago

Hi,

I've run into a potential issue that I have not found reported here so far, and general TF threads on it have not helped either.

I'm trying to use apply_feature_transform to map my two inputs (x, y) to a Fourier feature space (not random as in the MSFNN implementation, but built from specific harmonic sinusoidal functions). Right at the start of training I get this:

Step      Train loss                                  Test loss                                   Test metric
0         [2.30e+02, 0.00e+00, 0.00e+00, 0.00e+00]    [5.02e+02, 0.00e+00, 0.00e+00, 0.00e+00]    []
2022-09-28 17:56:15.139713: F ./tensorflow/core/util/gpu_launch_config.h:129] Check failed: work_element_count > 0 (0 vs. 0)

It occurs only when running on GPUs, and only when the transform returns something that involves a product of my two inputs (like sin(x) * sin(y)). A minimal example:

def feature_transform(x):
    return x[:, 0:1] * x[:, 1:2]

net.apply_feature_transform(feature_transform)
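
For context, the kind of harmonic transform described above might look like the sketch below. It is written with NumPy so the shapes can be checked without a GPU; with the tf.compat.v1 backend the same structure would use tf.sin and tf.concat instead. The name `harmonic_features`, the `n_harmonics` parameter, and the frequencies k*pi are assumptions for illustration, not the exact functions used in my code.

```python
import numpy as np


def harmonic_features(x, n_harmonics=2):
    """Map (N, 2) inputs to products sin(k*pi*x) * sin(m*pi*y), k, m = 1..n_harmonics.

    Hypothetical illustration: the frequencies k*pi are an assumption.
    """
    # Slicing with 0:1 / 1:2 keeps the column dimension, as in the minimal example.
    x0, x1 = x[:, 0:1], x[:, 1:2]
    feats = [
        np.sin(k * np.pi * x0) * np.sin(m * np.pi * x1)
        for k in range(1, n_harmonics + 1)
        for m in range(1, n_harmonics + 1)
    ]
    # One output column per (k, m) pair.
    return np.concatenate(feats, axis=1)


pts = np.array([[0.5, 0.5], [0.25, 0.5]])
out = harmonic_features(pts)
print(out.shape)
```

Every output column here is a product of the two input columns, which is the pattern that triggers the failure on GPU.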

I'm using a TF 2.4 environment with tf.compat.v1 set as the DeepXDE backend, on a Tesla V100 GPU with CUDA 11.4. I tried updating DeepXDE to 1.6.2 as well as different floating-point precisions (via DeepXDE as well as explicit casting to tf.float32 or tf.float64), without any success.

I also found that it happens for ResNet and FNN models, but not for the MSFNN implementation.

Any idea what's going on here?

lululxvi commented 1 year ago

So it works on CPU, but not on GPU?

Could you try newer versions of TF?

PhilippBrendel commented 1 year ago

Yes, on CPU it worked fine with the same code, also on TF 2.4.

TF 2.5 and TF 2.6 did not work either, but TF 2.7 seems to have fixed it.
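
One common way to run the same script on CPU for this kind of comparison is to hide the GPUs before TensorFlow is imported. A minimal sketch, assuming a standard CUDA setup; `hide_gpus` is a hypothetical helper name, not part of DeepXDE or TensorFlow:

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries that are imported afterwards."""
    # An empty value means no GPU is visible to the CUDA runtime.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


hide_gpus()
# Import TensorFlow / DeepXDE only after this point, otherwise the
# setting may be read too late to take effect.
```

The same effect can be had by setting the variable in the shell before launching the script.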

Thanks!

lululxvi commented 1 year ago

Sounds good. Then I will require TensorFlow>=2.7.0 for the tf.compat.v1 backend.
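
In the meantime, a version guard along these lines could be used in user code. This is only a sketch, not DeepXDE's actual check: `parse_version` and `meets_minimum` are hypothetical helpers, and the string passed in would typically be `tf.__version__`.

```python
import re

# Threshold taken from this thread: the failure was gone with TF 2.7.
MIN_TF_VERSION = (2, 7, 0)


def parse_version(version):
    """Return the leading numeric (major, minor, patch) of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(p) if p is not None else 0 for p in match.groups())


def meets_minimum(version, minimum=MIN_TF_VERSION):
    """True if the version is at least the minimum (suffixes like 'rc1' are ignored)."""
    return parse_version(version) >= minimum


for v in ("2.4.1", "2.6.0", "2.7.0", "2.10.0"):
    print(v, meets_minimum(v))
```

Comparing integer tuples rather than raw strings matters here, since "2.10.0" would sort before "2.7.0" as text.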

lululxvi / deepxde

GPU-Error from apply_feature_transform: Check failed: work_element_count > 0 #937