ybarancan / BEV_feat_stitch

Official code for Understanding Bird’s-Eye View of Road Semantics using an Onboard Camera - RAL/ICRA 2022
Apache License 2.0

Test Results #6

Open ShangWeize opened 2 years ago

ShangWeize commented 2 years ago

Hi, I used the pretrained model you provided (bev-stitch-nusc). When running test.py, the output includes the line:

```python
temp_string = "Iteration : " + str(iteration) + " : Scene " + str(my_scene_token) + " - j1: " + str(np.mean(temp_res, axis=0))
```

Here j1 represents the accuracy of the 4 static classes. Which entry of j1 corresponds to which result in the paper?

ybarancan commented 2 years ago

That line just logs the results for the current scene inside the loop, so it does not correspond to any result in the paper. The final results are printed after the loop finishes, at L789 in nuscenes_test.py. Since the static classes are not one-hot (a pixel can be both a crosswalk and a drivable area), we report them separately. The object classes are one-hot, so for them we also report a confusion matrix.
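As an illustration of why non-one-hot classes are scored separately, here is a minimal NumPy sketch of independent per-class IoU; the function name, shapes, and threshold are hypothetical, not the repo's actual evaluation code:

```python
import numpy as np

def per_class_iou(pred, gt, threshold=0.5):
    """IoU computed independently for each static class.

    pred, gt: float arrays of shape (H, W, num_classes). The classes
    are not one-hot: a pixel may be both crosswalk and drivable area,
    so each channel is thresholded and scored on its own.
    """
    pred_bin = pred > threshold
    gt_bin = gt > 0.5
    ious = []
    for c in range(pred.shape[-1]):
        inter = np.logical_and(pred_bin[..., c], gt_bin[..., c]).sum()
        union = np.logical_or(pred_bin[..., c], gt_bin[..., c]).sum()
        ious.append(inter / union if union > 0 else np.nan)
    return np.array(ious)  # one score per class, reported separately
```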

ShangWeize commented 2 years ago

Thank you so much for clearing up my confusion!

ShangWeize commented 2 years ago

Hello, I have three questions about the test:

1. I found that the following two lines of code both assign to bev_total_relative_endpoints. What is the difference between them?

```python
bev_total_relative_endpoints = [combined_end]
bev_total_relative_endpoints = [tf.concat([combined_end, bigger_resized_combined_projected_estimates], axis=-1)]
```

2. I don't quite understand the difference between total_input and bev_total_relative_endpoints in mem_net.my_bev_object_decoder. More generally, what role do endpoints play in the network structure? I searched for information on this question but found no relevant answers.

3. For the ablation experiment, I used the 39999 checkpoint you provided and set all outputs of the image-level batch to 0, assuming this removes the image, but the test results did not change significantly. Can the image output be regarded as disabled when I zero it this way?

ybarancan commented 2 years ago

1) The version we used in the paper is:

```python
bev_total_relative_endpoints = [combined_end]
```

This is the version used in nuscenes_test.py and is compatible with the provided checkpoint. The other one,

```python
bev_total_relative_endpoints = [tf.concat([combined_end, bigger_resized_combined_projected_estimates], axis=-1)]
```

was an experimental version. You can train with it and change test.py accordingly, or use the version in nuscenes_test.py to reproduce the results.

2) Endpoints refer to the intermediate representations of the encoder (backbone) that are fed to the decoder to provide low-level but high-resolution information. I recommend the original U-Net paper.

3) If you remove the image, what is the method going to use as input? Setting the batch size to 0 might have been interpreted by TensorFlow as any batch size.
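To make the role of endpoints concrete, here is a minimal TensorFlow sketch of a U-Net-style decoder that consumes encoder endpoints as skip connections; all names and shapes here are hypothetical, and this is not the repo's actual mem_net code:

```python
import tensorflow as tf

def tiny_decoder(bottleneck, endpoints):
    """Toy U-Net-style decoder.

    bottleneck: deepest encoder feature map (low resolution, high level).
    endpoints: list of intermediate encoder features, ordered shallow to
    deep, each with twice the spatial resolution of the next. Each one
    is concatenated into the decoder to restore high-resolution detail.
    """
    x = bottleneck
    for end in reversed(endpoints):  # deepest endpoint first
        x = tf.keras.layers.UpSampling2D(2)(x)  # double the resolution
        x = tf.concat([x, end], axis=-1)        # skip connection
        x = tf.keras.layers.Conv2D(64, 3, padding='same',
                                   activation='relu')(x)
    return x

# For an ablation that disables a feature stream while keeping tensor
# shapes valid, zeroing the tensor itself is one option (hypothetical):
# combined_end = tf.zeros_like(combined_end)
```

Zeroing with tf.zeros_like keeps the graph and all downstream shapes intact, which avoids the batch-size ambiguity mentioned in point 3 above.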