tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected size[0] in [0, 0], but got 1

GXU-doudou commented 3 years ago

I built a neural network and trained it successfully on other data sets, but I ran into the following problem when I tried to train on your data set.

Step 00000050 L_out= nan Acc= nan --- 477.08 ms/batch
Step 00000100 L_out= nan Acc= nan --- 865.59 ms/batch
Caught a NaN error : 3
Expected size[0] in [0, 0], but got 1
[[{{node loss/softmax_cross_entropy_with_logits/Slice}} = Slice[Index=DT_INT32, T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](loss/softmax_cross_entropy_with_logits/Shape, loss/softmax_cross_entropy_with_logits/Slice_2/size, optimizer/gradients/GatherV2_2_grad/strided_slice/stack)]]
name: "loss/softmax_cross_entropy_with_logits/Slice"
op: "Slice"
input: "loss/softmax_cross_entropy_with_logits/Shape_1"
input: "loss/softmax_cross_entropy_with_logits/Slice/begin"
input: "loss/softmax_cross_entropy_with_logits/Slice/size"
attr { key: "Index" value { type: DT_INT32 } }
attr { key: "T" value { type: DT_INT32 } }

loss/softmax_cross_entropy_with_logits/Slice
['loss/softmax_cross_entropy_with_logits/Shape_1:0', 'loss/softmax_cross_entropy_with_logits/Slice/begin:0', 'loss/softmax_cross_entropy_with_logits/Slice/size:0']
['loss/softmax_cross_entropy_with_logits/Slice:0']
Traceback (most recent call last):
  File "/home/doudou/anaconda3/envs/randlanet/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/doudou/anaconda3/envs/randlanet/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/doudou/anaconda3/envs/randlanet/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected size[0] in [0, 0], but got 1
[[{{node loss/softmax_cross_entropy_with_logits/Slice}} = Slice[Index=DT_INT32, T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](loss/softmax_cross_entropy_with_logits/Shape, loss/softmax_cross_entropy_with_logits/Slice_2/size, optimizer/gradients/GatherV2_2_grad/strided_slice/stack)]]

maskjp commented 3 years ago

Hi, @GXU-doudou,

Thank you for your interest in RELLIS-3D. I am not very familiar with TensorFlow, but it looks like the problem may be in the data-loading step. Are you using the image labels? Can you check the size of your loaded data and give me more information?
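
One way to run that size check, as a minimal sketch assuming the standard SemanticKITTI layout (each `.bin` scan stores four float32 values `x, y, z, remission` per point, and each `.label` file stores one uint32 per point with the semantic class in the lower 16 bits). The function name `load_scan` and the file paths passed to it are hypothetical and not part of the RELLIS-3D tooling:

```python
import numpy as np


def load_scan(bin_path, label_path):
    """Load one SemanticKITTI-style scan and verify points and labels line up."""
    # Each point is four float32 values: x, y, z, remission.
    # reshape raises ValueError if the file is not a whole number of points.
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    # Each label is one uint32; the lower 16 bits hold the semantic class.
    raw = np.fromfile(label_path, dtype=np.uint32)
    if raw.shape[0] != points.shape[0]:
        raise ValueError(
            f"{points.shape[0]} points but {raw.shape[0]} labels"
        )
    semantic = (raw & 0xFFFF).astype(np.int32)
    return points[:, :3], semantic
```

If this raises for any scan, or if a scan yields far fewer points than `num_points`, the loader is the first place to look.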

Thanks!

GXU-doudou commented 3 years ago

Thanks for your reply. I'm using the 'LiDAR Annotation SemanticKITTI Format'. My network parameters are as follows:

```
class ConfigSemanticKITTI:
    k_n = 16  # KNN
    num_layers = 4  # Number of layers
    num_points = 4096 * 11  # Number of input points

    num_classes = 19  # Number of valid classes
    sub_grid_size = 0.06  # preprocess_parameter

    batch_size = 3  # batch_size during training
    val_batch_size = 6  # batch_size during validation and test
    train_steps = 500  # Number of steps per epochs
    val_steps = 100  # Number of validation steps per epoch

    sub_sampling_ratio = [4, 4, 4, 4]  # sampling ratio of random sampling at each layer
    d_out = [16, 64, 128, 256]  # feature dimension
    num_sub_points = [num_points // 4, num_points // 16, num_points // 64, num_points // 256]

    noise_init = 3.5  # noise initial parameter
    max_epoch = 100  # maximum epoch during training
    learning_rate = 0.01  # initial learning rate
    lr_decays = {i: 0.95 for i in range(0, 500)}  # decay rate of learning rate

    train_sum_dir = 'train_log'
    saving = True
    saving_path = None
```

I use the softmax function in the network. I suspect the NaN appears when the loss is computed, but I have not been able to fix it. The other possibility, as you said, is a problem with the input data. Is there any difference between the 'LiDAR Annotation SemanticKITTI Format' and the standard SemanticKITTI format? If you have time, it may help to look at RandLA-Net: our network is built on top of it, which should make this issue easier to understand. Thanks~

maskjp commented 3 years ago

Hi, @GXU-doudou,

The format is the same as SemanticKITTI. We usually use only 14 classes for the point cloud. Please take a look at it here. You can also find how we read the point cloud and labels here.
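
Because the number of classes in a training config may differ from what the labels actually contain, one quick sanity check is to remap the raw label IDs to contiguous training IDs and confirm every value falls in `[0, num_classes)`. The sketch below is illustrative only: `remap_labels`, `check_label_range`, and the mapping in the usage note are hypothetical, and the real mapping should come from the dataset's own label configuration.

```python
import numpy as np


def remap_labels(raw_labels, id_map, ignore_id=0):
    """Map raw label IDs to contiguous training IDs via a lookup table."""
    raw_labels = np.asarray(raw_labels)
    max_id = max(max(id_map), int(raw_labels.max(initial=0)))
    # Raw IDs missing from id_map fall back to ignore_id.
    lut = np.full(max_id + 1, ignore_id, dtype=np.int32)
    for raw_id, train_id in id_map.items():
        lut[raw_id] = train_id
    return lut[raw_labels]


def check_label_range(labels, num_classes):
    """Raise if any label cannot index a logits vector of width num_classes."""
    labels = np.asarray(labels)
    bad = labels[(labels < 0) | (labels >= num_classes)]
    if bad.size:
        raise ValueError(
            f"labels outside [0, {num_classes}): {np.unique(bad).tolist()}"
        )
```

For example, with a made-up mapping `{0: 0, 3: 1, 4: 2, 31: 3}`, calling `check_label_range(remap_labels(raw, mapping), 4)` passes silently when every remapped label is valid and raises otherwise.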

Another possibility I can think of is that some points in our point cloud have coordinates (0, 0, 0), which does not happen in KITTI. So if you divide by the coordinates or by the depth (which is 0 for those points), you might get a NaN problem.
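
A minimal sketch of one way to guard against this, assuming the points are an `(N, 3)` NumPy array with a matching label array. The name `drop_origin_points` and the `eps` threshold are hypothetical choices, not something taken from the RELLIS-3D code:

```python
import numpy as np


def drop_origin_points(xyz, labels, eps=1e-6):
    """Remove points whose range (distance to the sensor) is effectively zero."""
    xyz = np.asarray(xyz, dtype=np.float32)
    labels = np.asarray(labels)
    depth = np.linalg.norm(xyz, axis=1)
    # NaN depths also fail this comparison, so such points are dropped too.
    keep = depth > eps
    return xyz[keep], labels[keep]


pts = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 2.0]], dtype=np.float32)
lbl = np.array([0, 4])
pts, lbl = drop_origin_points(pts, lbl)
# Dividing by the depth is now safe because no zero-range point remains.
unit = pts / np.linalg.norm(pts, axis=1, keepdims=True)
```

Filtering before any normalization keeps the zero-depth points from ever reaching a division, which is simpler than patching each division with an epsilon.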

I hope this information can help.

Best wishes!

GXU-doudou commented 3 years ago

Thank you for your suggestions. I think I understand where the problem is now~ Your data set really helped me a lot.

unmannedlab / RELLIS-3D

tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected size[0] in [0, 0], but got 1 #14