waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

mAP on evaluation server is all 0 #238

Open SM1991CODES opened 3 years ago

SM1991CODES commented 3 years ago

Hi,

I generated predictions on a subset of the validation set and uploaded the file (after doing create_submission) to the eval server. I get all 0s. What could be wrong?

Does the eval server automatically take the corresponding subset of gt frames?

I also tried offline eval using the ground truth downloaded from Google Cloud Storage; it also gives all 0s.

Please help me out.

Best Regards Sambit

peisun1115 commented 3 years ago

It is usually 0 because these fields (context_name, frame_timestamp_micros) are not set properly. We require them to match the ground truth exactly.
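
(For reference, a minimal sketch of how those fields can be copied from a dataset Frame into a prediction, assuming the metrics_pb2 and label_pb2 protos shipped with this repo; the helper name and the box tuple layout are only for illustration.)

```python
# Minimal sketch: copy the identifying fields from the source Frame into each
# prediction so the eval can match it to the ground-truth frame.
from waymo_open_dataset import label_pb2
from waymo_open_dataset.protos import metrics_pb2

def make_prediction(frame, box, score, obj_type=label_pb2.Label.TYPE_VEHICLE):
    """box = (center_x, center_y, center_z, length, width, height, heading)
    in the vehicle frame; make_prediction is a hypothetical helper."""
    o = metrics_pb2.Object()
    o.context_name = frame.context.name                 # must match gt exactly
    o.frame_timestamp_micros = frame.timestamp_micros   # must match gt exactly
    o.object.box.center_x, o.object.box.center_y, o.object.box.center_z = box[:3]
    o.object.box.length, o.object.box.width, o.object.box.height = box[3:6]
    o.object.box.heading = box[6]
    o.object.type = obj_type
    o.score = score
    return o
```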

SM1991CODES commented 3 years ago

Thank you for the quick answer. But I have set these fields. Here is what I do: for every tfrecord, I take every 10th frame (1 s apart), save the context name and timestamp, and run prediction. Then I write a single .bin file containing all detected objects across all frames.

But your other comment about gt.bin suggests that, since I am predicting on only about 500 frames, all the other frames count as false negatives and hence the score is nearly 0. Could that be the case?

Please confirm.

Best Regards Sambit
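
(For reference, one way to sanity-check the subset theory offline is to filter the downloaded ground truth down to only the frames that were actually predicted on before running the metrics tool. A rough sketch, assuming both files are serialized metrics_pb2.Objects; the file names are placeholders.)

```python
# Rough sketch: keep only the ground-truth objects whose frame also appears in
# the prediction file, so offline eval is computed over the predicted subset.
from waymo_open_dataset.protos import metrics_pb2

def load_objects(path):
    objs = metrics_pb2.Objects()
    with open(path, 'rb') as f:
        objs.ParseFromString(f.read())
    return objs

preds = load_objects('preds.bin')  # placeholder file names
gt = load_objects('gt.bin')

predicted_frames = {(o.context_name, o.frame_timestamp_micros) for o in preds.objects}

gt_subset = metrics_pb2.Objects()
for o in gt.objects:
    if (o.context_name, o.frame_timestamp_micros) in predicted_frames:
        gt_subset.objects.append(o)

with open('gt_subset.bin', 'wb') as f:
    f.write(gt_subset.SerializeToString())
```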

peisun1115 commented 3 years ago

I think it shouldn't be exactly 0 if you believe you are predicting anything meaningful for those 1/10 frames. Can you try the following:

  1. Find a frame (identified by context_name, timestamp). Read its ground truth directly from python.
  2. Open the .bin file you produced and print the predictions. Make sure "context_name, timestamp" match the ground truth, and make sure there is at least one predicted box with IoU > 0.7.

If you still see AP = 0 after these 2 steps are validated, can you send me the "context_name, timestamp" you checked, along with the ground truth boxes and your predictions for that frame? A sketch of these two checks follows below.
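
(A rough sketch of the two checks in Python, assuming the prediction file is a serialized metrics_pb2.Objects and the ground truth is read straight from the corresponding tfrecord; the function names are only illustrative.)

```python
# Sketch: pull the ground truth for one frame from the tfrecord, then collect
# the predictions stored for the same (context_name, timestamp) in the .bin.
import tensorflow as tf
from waymo_open_dataset import dataset_pb2
from waymo_open_dataset.protos import metrics_pb2

def ground_truth_for_frame(tfrecord_path, timestamp_micros):
    for data in tf.data.TFRecordDataset(tfrecord_path, compression_type=''):
        frame = dataset_pb2.Frame()
        frame.ParseFromString(bytes(data.numpy()))
        if frame.timestamp_micros == timestamp_micros:
            return frame.context.name, list(frame.laser_labels)  # vehicle frame
    return None, []

def predictions_for_frame(bin_path, context_name, timestamp_micros):
    objs = metrics_pb2.Objects()
    with open(bin_path, 'rb') as f:
        objs.ParseFromString(f.read())
    return [o for o in objs.objects
            if o.context_name == context_name
            and o.frame_timestamp_micros == timestamp_micros]

# Eyeball the printed boxes: at least one prediction should overlap a ground
# truth box with IoU > 0.7 if the detector is producing anything meaningful.
```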

SM1991CODES commented 3 years ago

Hi,

Thank you for that advice. Just one important point: I am predicting 3D objects using only the top lidar, so I am writing labels as (x, y, z, l, w, h, heading). I hope that is not a problem. Do I need to convert them to camera coordinates?

I just read something about gt in camera frame and wanted to make sure.

Best Regards Sambit


peisun1115 commented 3 years ago

No, that shouldn't be a problem. You don't need to convert to the camera frame. All predictions should be in the vehicle frame, which is what we use for the ground truth.
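
(This is only relevant if the predicted boxes come out in the top-lidar sensor frame rather than the vehicle frame; in that case, a rough sketch of applying the TOP lidar extrinsic stored in the frame context might look like the following. The helper names are illustrative, and the heading update assumes the extrinsic rotation is essentially a yaw about z.)

```python
# Rough sketch: move a box from the TOP lidar sensor frame into the vehicle
# frame using the laser extrinsic stored in the frame context.
import numpy as np
from waymo_open_dataset import dataset_pb2

def top_lidar_extrinsic(frame):
    """4x4 sensor-to-vehicle transform for the TOP lidar."""
    for calib in frame.context.laser_calibrations:
        if calib.name == dataset_pb2.LaserName.TOP:
            return np.array(calib.extrinsic.transform).reshape(4, 4)
    raise ValueError('TOP lidar calibration not found')

def sensor_box_to_vehicle(box, extrinsic):
    """box = (x, y, z, l, w, h, heading) in the sensor frame."""
    x, y, z, l, w, h, heading = box
    cx, cy, cz, _ = extrinsic @ np.array([x, y, z, 1.0])
    # Approximation: assumes the extrinsic rotation is essentially a yaw about z.
    yaw = np.arctan2(extrinsic[1, 0], extrinsic[0, 0])
    return (cx, cy, cz, l, w, h, heading + yaw)
```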

SM1991CODES commented 3 years ago

Okay, thank you. I am just struggling a lot with several topics. I am trying to detect objects using range images.

Some of the topics I am struggling with are:

  1. 8 GB of GPU memory on my workstation laptop is not enough to process even the (2, 1, 64, 2650) range images produced by the Waymo code.
  2. I could not yet figure out how to download the data using gsutil: the GCP compute engine gives an authentication error (no access for bucket.objects.list). I tried several times and referred to several posts, but with no result.
  3. I tried splitting each range image into 10 sections of (64, 256), thereby ignoring the last 90 columns (see the sketch below). This is the setup I was able to train and predict on the validation set with. The visual results served as a proof of concept, but then I hit the eval problem.
  4. Now I think splitting range images into several sections is not really a good idea for the network. I also plan to use the intensity channel.
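
(A small sketch of the column-wise split described in point 3, assuming the (64, 2650) range channel as a NumPy array; the function name is illustrative.)

```python
# Sketch: split a (64, 2650) range image column-wise into 10 crops of (64, 256),
# dropping the last 2650 - 2560 = 90 columns.
import numpy as np

def split_range_image(ri, num_sections=10, width=256):
    crops = [ri[:, i * width:(i + 1) * width] for i in range(num_sections)]
    return np.stack(crops)  # shape (10, 64, 256)
```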

In summary, I just wanted to inform you of all the problems I am facing right now. I totally understand that you cannot help me with everything.

Best Regards Sambit
