Issue with Prediction after Training

omarhassan91 commented 3 years ago

Hi,

I modified the point_pillars_training.py file to save the model as a .pb file: pillar_net.save('point_pillars1.pb'). I also changed the parameters in config.py to use a batch size of 2 and to train for 20 epochs. After training on an RTX 2080, I ran the point_pillars_prediction.py file and got the following output:

File "/PointPillars/processors.py", line 155, in __getitem__
    lidar = self.data_reader.read_lidar(self.lidar_files[i])
IndexError: list index out of range

Here is what I have installed:
Tensorflow: 2.1.0
Keras: 2.4.3
OpenCV-Python: 4.2.0.34
CUDA: 10.1

I have the save_model.pb saved in the root directory, while the model.h5 is saved under '/logs'. Anyone know what the issue is?

omarhassan91 commented 3 years ago

Here is the full traceback:

Traceback (most recent call last):
  File "point_pillars_prediction.py", line 33, in <module>
    occupancy, position, size, angle, heading, classification = pillar_net.predict(eval_gen,
  File "/home/kingkoopa/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 130, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/kingkoopa/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1569, in predict
    data_handler = data_adapter.DataHandler(
  File "/home/kingkoopa/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 1105, in __init__
    self._adapter = adapter_cls(
  File "/home/kingkoopa/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 909, in __init__
    super(KerasSequenceAdapter, self).__init__(
  File "/home/kingkoopa/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 786, in __init__
    peek, x = self._peek_and_restore(x)
  File "/home/kingkoopa/miniconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py", line 920, in _peek_and_restore
    return x[0], x
  File "/home/kingkoopa/intel/PointPillars_test/test/PointPillars/processors.py", line 155, in __getitem__
    lidar = self.data_reader.read_lidar(self.lidar_files[i])
IndexError: list index out of range

tyagi-iiitv commented 3 years ago

This error can occur when the number of files in the three folders (label_2, lidar and calib) doesn't match. Can you please make sure that all three folders contain the same number of files and that none of them is empty?
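
A quick way to run that check is sketched below. It is only a minimal sketch: the folder names come from the comment above, while the helper names count_files and check_dataset and the example root path are hypothetical and not part of the PointPillars repository.

```python
from pathlib import Path
from typing import Dict, Iterable


def count_files(root: str, subdirs: Iterable[str]) -> Dict[str, int]:
    """Count regular files in each sub-folder of root (0 if the folder is missing)."""
    counts = {}
    for name in subdirs:
        folder = Path(root) / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts


def check_dataset(root: str,
                  subdirs: Iterable[str] = ("label_2", "lidar", "calib")) -> Dict[str, int]:
    """Raise ValueError if any folder is empty/missing or the counts differ."""
    counts = count_files(root, subdirs)
    if min(counts.values()) == 0:
        raise ValueError(f"empty or missing folder: {counts}")
    if len(set(counts.values())) != 1:
        raise ValueError(f"file counts differ: {counts}")
    return counts


# Hypothetical usage; replace the path with your own data root:
# print(check_dataset("/path/to/training"))
```

If this raises, the data generator is most likely being built from an empty or mismatched file list, which would be consistent with the IndexError on the first item.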

omarhassan91 commented 3 years ago

I see, the issue was that I wasn't pointing to the correct data files in my 'training' folder...kind of a stupid mistake to make on my end. I have since fixed this and now I'm seeing the following issue:

2020-10-02 14:04:31.746870: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Loop Range: 1
Scene 1: Box predictions with occupancy > occ_thr: 0
Traceback (most recent call last):
  File "point_pillars_prediction3.py", line 44, in <module>
    nms_boxes = rotational_nms(set_boxes, confidences, occ_threshold=0.7, nms_iou_thr=0.5)
  File "/home/kingkoopa/intel/PointPillars_test/test/PointPillars/inference_utils.py", line 41, in rotational_nms
    assert (isinstance(set_boxes[0][0][0][0], float) or isinstance(set_boxes[0][0][0][0], int)) and \
IndexError: list index out of range

Also, the 'Loop Range' string is part of my own debugging output. My high-level conclusion is that the set_boxes array is not being populated in the point_pillars_prediction.py script. I'm using a model trained with a batch size of 1 and the same reshape dimensions (1x252x252x4). I can also provide the model.h5 file if needed; un-commenting pillar_net.summary() prints all the layers in the model.

In the point_pillars_prediction.py script, the line set_boxes.append(generate_bboxes_from_pred(occupancy[i], position[i], size[i], angle[i], heading[i], classification[i], params.anchor_dims, occ_threshold=0.7)) is returning [ ], which would explain why I'm getting this error.

Any ideas?

tyagi-iiitv commented 3 years ago

This is because your set_boxes is empty. I'll add an additional check for that in the code, but for now you can try reducing the occ_threshold value to get at least some bounding boxes. I'm using 0.3 for my dataset, but you can see what value works best for you. Try starting low (say 0.1) and then increase it depending on the number of bounding boxes the network is detecting.
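
The suggestion above can be tried out with a small sweep over candidate thresholds plus a guard before calling NMS. This is a rough sketch using plain Python lists; count_candidates, has_boxes and the toy occupancy grid are hypothetical stand-ins and not functions or data from the repository.

```python
from typing import Dict, Iterable, List, Sequence


def flatten(values) -> List[float]:
    """Flatten an arbitrarily nested list of numbers into a flat list."""
    flat = []
    for v in values:
        if isinstance(v, (list, tuple)):
            flat.extend(flatten(v))
        else:
            flat.append(float(v))
    return flat


def count_candidates(occupancy, thresholds: Iterable[float]) -> Dict[float, int]:
    """Count occupancy scores strictly above each candidate threshold."""
    scores = flatten(occupancy)
    return {t: sum(1 for s in scores if s > t) for t in thresholds}


def has_boxes(set_boxes: Sequence[Sequence]) -> bool:
    """Return True only if at least one scene contains at least one box."""
    return any(len(scene_boxes) > 0 for scene_boxes in set_boxes)


# Toy occupancy grid standing in for one scene's network output (assumption).
toy_occupancy = [[0.05, 0.2], [0.35, 0.8]]
counts = count_candidates(toy_occupancy, [0.1, 0.3, 0.7])
print(counts)  # number of cells that survive each threshold
```

Calling has_boxes(set_boxes) before rotational_nms and skipping NMS when it returns False would avoid the IndexError until the check lands in the code.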

omarhassan91 commented 3 years ago

Ahh I see, that seems to have worked! So how does the occ_threshold value relate to the set_boxes variable? I see that inference_utils.py contains: assert len(set_boxes) == len(confidences) and 0 < occ_threshold < 1 and 0 < nms_iou_thr < 1, but I'm not sure how that links the two...

Thanks!

tyagi-iiitv commented 3 years ago

If you refer to this post, you'll see that the occupancy is basically a mask between foreground and background objects. The occ_threshold is the value we use to distinguish between foreground and background for our data. A 0.3 threshold means that the values in the occupancy matrix which are > 0.3 are treated as foreground objects (the bounding boxes, in our case), and the rest are ignored. Hence changing this threshold returns a different number of bounding boxes.
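
To make the masking idea concrete, here is a minimal sketch on a tiny 2D grid. The real occupancy output has more dimensions, and foreground_cells is a hypothetical helper written for illustration, not the repository's generate_bboxes_from_pred.

```python
from typing import List, Tuple


def foreground_cells(occupancy: List[List[float]],
                     occ_threshold: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every cell whose score exceeds occ_threshold."""
    return [
        (r, c, score)
        for r, row in enumerate(occupancy)
        for c, score in enumerate(row)
        if score > occ_threshold
    ]


# Toy occupancy matrix (assumption): one value per grid cell.
grid = [
    [0.02, 0.31, 0.10],
    [0.65, 0.05, 0.29],
]

low = foreground_cells(grid, 0.3)   # cells treated as foreground at 0.3
high = foreground_cells(grid, 0.7)  # no cell exceeds 0.7 in this toy grid
print(len(low), len(high))
```

With the stricter threshold no cell survives, which mirrors the empty set_boxes seen earlier in the thread at occ_threshold=0.7.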

tyagi-iiitv / PointPillars

Issue with Prediction after Training #17