zubair-irshad / CenterSnap

Pytorch code for ICRA'22 paper: "Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation"
https://zubair-irshad.github.io/projects/CenterSnap.html

How to generate evaluation results #10

Closed · sky23249 closed this issue 2 years ago

sky23249 commented 2 years ago

Thank you for the great work! I can see the function compute_mAP in nocs_eval_utils.py, but it is not called anywhere in the code. Should I call this function to generate the evaluation results? If so, how can I construct the pred_results parameter? Looking forward to your reply! Thank you!

https://github.com/zubair-irshad/CenterSnap/blob/c2afd120428b0a07c88894da23311995b72bbbfd/utils/nocs_eval_utils.py#L564

zubair-irshad commented 2 years ago

Thanks @sky23249 for your interest in our work. To keep our evaluation consistent with all the baselines reported in the paper, we use the same evaluation script as object-deformnet and NOCS.

We currently only provide the inference script, but you can certainly call the compute_mAP function to get the evaluation results. For the pred_results parameter, pass a list of per-image dictionaries containing the GT class ids, poses, and sizes along with the predicted class ids, poses, sizes, and scores, as follows:

    result = {}
    result['gt_class_ids'] = gts['class_ids']
    result['gt_bboxes'] = gts['bboxes']
    result['gt_RTs'] = gts['poses']
    result['gt_scales'] = gts['size']
    result['gt_handle_visibility'] = gts['handle_visibility']

    result['pred_class_ids'] = class_ids
    result['pred_scores'] = scores
    result['pred_RTs'] = f_sRT
    result['pred_scales'] = f_size

    pred_results.append(result)
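
After accumulating one such result per image into pred_results, you can call compute_mAP on the whole list. A minimal sketch, assuming compute_mAP keeps the same signature as the object-deformnet version it appears to be adapted from (the threshold lists below are illustrative, not necessarily the ones used in the paper):

    from utils.nocs_eval_utils import compute_mAP

    # Illustrative evaluation thresholds (degrees, cm, 3D IoU).
    degree_thres_list = list(range(0, 61, 1))
    shift_thres_list = [i / 2 for i in range(21)]
    iou_thres_list = [i / 100 for i in range(101)]

    iou_aps, pose_aps, iou_acc, pose_acc = compute_mAP(
        pred_results, out_dir,
        degree_thres_list, shift_thres_list, iou_thres_list,
        iou_pose_thres=0.1, use_matches_for_pose=True)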

The GT label file is the same as provided by NOCS. Please see this script (provided by object-deformnet) to download a label file for each image. You can load the gts label file as follows:

    with open(img_full_path + '_label.pkl', 'rb') as f:
        gts = cPickle.load(f)

For further clarification, please also see this evaluation script provided by object-deformnet, where compute_mAP is called here.

Hope it helps!

sky23249 commented 2 years ago

Thanks for your help!

I successfully generated the pred_result and called compute_mAP in nocs_eval_utils.py, but got a very low mAP value. I mainly have two questions:

  1. The meaning of keys in result

    First I generated the ground-truth results by running detect_eval (provided by NOCS). Then I generated the pred_result by modifying the inference script: I modified these lines to get the segmentation output and saved pred_cls_ids by selecting indices from pose_output: https://github.com/zubair-irshad/CenterSnap/blob/c2afd120428b0a07c88894da23311995b72bbbfd/inference/inference_real.py#L61 https://github.com/zubair-irshad/CenterSnap/blob/c2afd120428b0a07c88894da23311995b72bbbfd/inference/inference_real.py#L62

    Modified code:

    seg_output, depth_output, small_depth_output, pose_output = model.forward(input)
    latent_emb_outputs, abs_pose_outputs, img_output, scores, indices = pose_output.compute_pointclouds_and_poses(
                        min_confidence, is_target=False)
    
    # Per-pixel class map from the segmentation head.
    seg_pred = seg_output.get_prediction()
    seg_pred = np.argmax(seg_pred, axis=0).astype(np.uint8)
    # Class id at each detected object's peak location.
    pred_cls_ids = []
    for index in indices:
        pred_cls_ids.append(seg_pred[index[0], index[1]])
    pred_scores = scores

    By doing this, I got result['pred_class_ids'] and result['pred_scores']. Then result['pred_RTs'][i, :, :] = abs_pose_outputs[i].camera_T_object, and result['pred_scales'] = size (as returned by get_gt_pointclouds, not abs_pose_outputs[i].scale_matrix, because I think result['pred_scales'] represents the bounding-box size according to get_3d_bbox). I'm not sure if I understand the meaning of the keys in result correctly (see the sketch after this list). I also called draw_detections to visualize these results as follows (green: ground truth), but nothing seems to be wrong. [screenshot attached]

  2. Code in nocs_eval_utils.py

    I downloaded the predictions of NOCS provided by object-deformnet and used them as pred_result to call compute_mAP in nocs_eval_utils.py, but I still got a low mAP value. Then I used the same NOCS predictions as pred_result with the evaluation script (provided by object-deformnet) and got a reasonable result, so I'm confused. Furthermore, nocs_eval_utils.py comments out the code related to result['pred_scores']; is there a problem with the scores or with the code?
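
For reference, the assembly described in point 1 would look roughly like the sketch below; the attribute names and shapes are assumptions taken from the description above and not verified against the codebase (sizes[i] stands for the per-object size returned by get_gt_pointclouds):

    import numpy as np

    num_pred = len(abs_pose_outputs)
    pred_RTs = np.zeros((num_pred, 4, 4))
    pred_scales = np.zeros((num_pred, 3))
    for i, pose in enumerate(abs_pose_outputs):
        pred_RTs[i, :, :] = pose.camera_T_object  # 4x4 object pose in camera frame
        pred_scales[i, :] = sizes[i]              # 3D bbox extents from get_gt_pointclouds

    result['pred_class_ids'] = np.array(pred_cls_ids)
    result['pred_scores'] = np.array(pred_scores)
    result['pred_RTs'] = pred_RTs
    result['pred_scales'] = pred_scales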

I sincerely appreciate your help and look forward to your reply!

zubair-irshad commented 2 years ago

Hi @sky23249, your understanding is correct and the visualization you show looks accurate. I have a question: are you training the model from scratch for evaluation? Which checkpoints are you using, and what training strategy did you employ, i.e. did you train on synthetic data first for 22-25 epochs and then fine-tune on the real dataset for a few epochs? Since we currently only release the inference checkpoint for visualization purposes, I would recommend training from scratch using the provided training strategy to get the results reported in the paper.

Regarding your second point, there is a slight discrepancy in how you are evaluating. Our method, CenterSnap, is currently class-agnostic. As we mention in the paper (see the Table 1 caption), for a fair comparison with the other baselines we evaluate using the class predictions given by NOCS, because 1. our method is class-agnostic, i.e. the architecture described in the paper does not predict a class id or mask, and 2. although we do release a segmentation prediction in our codebase, our class id/mask predictions are trained only on the provided NOCS dataset, whereas the other approaches use a Mask R-CNN pre-trained on other large datasets. Hence, a slight modification to your code to match the evaluation of all the other baselines would be to use the Mask R-CNN class ids as provided by object-deformnet here: you just need to find the closest class_id for each peak in the output_indices our model provides. Happy to provide a helper function for it separately; a rough sketch follows.
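
For illustration, a minimal sketch of such a helper, assuming indices holds the (y, x) peak locations returned by compute_pointclouds_and_poses and mrcnn_result is object-deformnet's pickled Mask R-CNN output with rois stored as [y1, x1, y2, x2] boxes; all names here are assumptions, not the exact helper in the codebase:

    import numpy as np

    def closest_mrcnn_class_ids(indices, mrcnn_result):
        # Center (y, x) of each Mask R-CNN box [y1, x1, y2, x2].
        rois = mrcnn_result['rois'].astype(np.float32)
        box_centers = np.stack([(rois[:, 0] + rois[:, 2]) / 2.0,
                                (rois[:, 1] + rois[:, 3]) / 2.0], axis=1)
        class_ids = []
        for y, x in indices:
            # If peaks live on a downsampled heatmap, rescale (y, x) to
            # image resolution before measuring distances.
            dists = np.linalg.norm(box_centers - np.array([y, x]), axis=1)
            class_ids.append(mrcnn_result['class_ids'][np.argmin(dists)])
        return np.array(class_ids)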

Hope it helps!

sky23249 commented 2 years ago

Thanks sincerely for your detailed answer, @zubair-irshad. I generated the pred_result and the visualization by running inference with the checkpoint you provided in nocs_test_subset.tar.gz, which gives a low mAP on the NOCS synthetic dataset but a more reasonable mAP on the NOCS real dataset. Based on your answer, I now suspect the problem is caused by the pred_class_ids, so I will try again generating pred_class_ids from the Mask R-CNN class ids. I would be very grateful if you could provide a helper function or a full evaluation script.

In addition, I'm now following the README to train the model from scratch. You said the inference checkpoint is for visualization purposes, so is it not the best checkpoint used in the paper? If so, for comparison purposes, would it be possible to release the best checkpoint?

Thanks again for your help!

zubair-irshad commented 2 years ago

Hi @sky23249, correct: the checkpoint released with nocs_test_subset.tar.gz is only for visualization and should be used with the real dataset only, as it was fine-tuned on the real dataset. For evaluation on CAMERA (the synthetic dataset), we will not be releasing the checkpoint; please re-train the network as described here on synthetic data to obtain the best checkpoint on synthetic data. You can then fine-tune that same checkpoint on real data for a few epochs to obtain the best checkpoint on real data.

I will check on my end whether it is possible to release the best checkpoint on real data (this was not part of our plan earlier, so it might take some time). In the meantime, I highly recommend training the network from scratch to reproduce the evaluation results in the paper.

Please see the helper function here to obtain class_ids the same way as all the baselines. Please also see this for how to access mrcnn_result; a sketch is below.
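
For reference, object-deformnet ships its Mask R-CNN results as one pickle file per image; a minimal sketch of loading one, with the path pattern assumed from object-deformnet's evaluate.py (adjust data and img_path to your layout):

    import os
    import _pickle as cPickle

    img_path_parsing = img_path.split('/')
    mrcnn_path = os.path.join(
        'results/mrcnn_results', data,
        'results_{}_{}_{}.pkl'.format(data.split('_')[-1],
                                      img_path_parsing[-2], img_path_parsing[-1]))
    with open(mrcnn_path, 'rb') as f:
        # Expected keys (per object-deformnet): 'class_ids', 'scores', 'rois', 'masks'.
        mrcnn_result = cPickle.load(f)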

Hope it helps!

sky23249 commented 2 years ago

Thank you for your help and patience sincerely! I'll try it again.

ran894694447 commented 1 year ago

> I successfully generated the pred_result and called compute_mAP in nocs_eval_utils.py, but got a very low mAP value. [...] First I generated the ground-truth results by running detect_eval (provided by NOCS). [...]

Hello, could you please tell me how to run detect_eval to generate the ground-truth results? I tried running detect_eval but got nothing. Should I configure the environment the same way as NOCS?

zubair-irshad commented 1 year ago

Hi @ran894694447,

Do you mean ground truth for validation and testing? Please see this answer.

Pasting the relevant parts from the answer again here:

The GT label file is the same as provided by NOCS. Please see this script (provided by object-deformnet) to create a label file for each image. You can load the gts label file as follows:

    with open(img_full_path + '_label.pkl', 'rb') as f:
        gts = cPickle.load(f)

You can then use these label files for evaluation with the compute_mAP function as follows:

    result = {}
    result['gt_class_ids'] = gts['class_ids']
    result['gt_bboxes'] = gts['bboxes']
    result['gt_RTs'] = gts['poses']
    result['gt_scales'] = gts['size']
    result['gt_handle_visibility'] = gts['handle_visibility']

    result['pred_class_ids'] = class_ids
    result['pred_scores'] = scores
    result['pred_RTs'] = f_sRT
    result['pred_scales'] = f_size

    pred_results.append(result)

Please let me know if this answers your question.

ran894694447 commented 1 year ago

@zubair-irshad Thanks for your answer, I will try it.

ran894694447 commented 1 year ago

@zubair-irshad Thank you for your patient answer and great work, I have successfully generated the evaluation results.

Trulli99 commented 1 year ago

> I successfully generated the pred_result and called compute_mAP in nocs_eval_utils.py, but got a very low mAP value. [...] I also called draw_detections to visualize these results as follows (green: ground truth), but nothing seems to be wrong. [...]

Did you change anything in the draw_detections function? My bounding boxes are really bad (see attached image), and I think I did everything right to get the pred_results.

submagr commented 1 year ago

@Trulli99 uncommenting this code and commenting out this code made it work for me.

Trulli99 commented 1 year ago

@submagr It worked, thank you so much!