hustvl / MapTR

[ICLR'23 Spotlight & IJCV'24] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction
MIT License

Some detailed questions about the Work #7

Closed Irvingao closed 2 years ago

Irvingao commented 2 years ago

Hi! First of all, thanks for your excellent work! It's a really impressive and elegant modeling method for HD map elements. I'm also trying to follow your work and have some questions about it. I'd appreciate it if you have any time to answer:

  1. The way instance queries q{inst} and shared point queries q{pts} are fused into hierarchical query embeddings q{hie}. Here is my guess: (1) initialize the two query embeddings separately; (2) use a linear layer to reduce the q{pts} dims from 256 to 1; (3) merge the context and position of q{pts} into each q{inst}; (4) only q{hie} is used in the decoder attention module.

  2. The initialization of q{inst} and q{pts}. If we just fuse them into q{hie} in the above way, how do we make sure the point-level information is really learned by q{pts}?

  3. A cost question in instance-level matching. The paper says the point2point cost is used as the position matching cost, which is similar to the point-level matching cost. But in the model forward pass, q{hie} is not matched against any order or permutation of the point sets. This means that when the Hungarian algorithm is used to find the optimal instance-level assignment, the unmatched order and permutation will lead to erroneous assignments between predictions and GTs, which will cause fitting problems.

  4. Questions about the evaluation.

    • As the paper mentions, the Chamfer distance D{chamfer} is used as the evaluation metric. But how do you pair predicted and GT instances? Do we need to calculate D{chamfer} between every prediction and GT, then choose the lowest D{chamfer} to pair each instance?
    • Also, I think the Chamfer distance D{chamfer} is not an accurate way to evaluate point sets. A low distance cost may occur when only two points of each point set are close while the other points are far from their correspondences. What is your take on this situation?
    • Is the confidence of the predicted instances used in evaluation to filter low-confidence instances?

These are all my questions. Thanks for your great work!

LegendBC commented 2 years ago

Hi @Irvingao , thanks for your interest in our work and good questions!

  1. The way instance queries q{inst} and shared point queries q{pts} are fused into hierarchical query embeddings q{hie}. Here is my guess: (1) initialize the two query embeddings separately; (2) use a linear layer to reduce the q{pts} dims from 256 to 1; (3) merge the context and position of q{pts} into each q{inst}; (4) only q{hie} is used in the decoder attention module.
  2. The initialization of q{inst} and q{pts}. If we just fuse them into q{hie} in the above way, how do we make sure the point-level information is really learned by q{pts}?

We do not use a linear layer to reduce $q^{\rm{pt}}$. We broadcast $q^{\rm{inst}}$ and $q^{\rm{pt}}$ and directly add them to get $q^{\rm{hie}}$. So the point-level embeddings $q^{\rm{pt}}$ can be optimized, and point-level information can be fused and refined with the reference points of $q^{\rm{hie}}$.
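A minimal sketch of this broadcast-and-add, with NumPy standing in for torch tensors; the sizes (50 instance queries, 20 shared point queries, 256-dim embeddings) are illustrative assumptions, not necessarily the exact MapTR configuration:

```python
import numpy as np

# Illustrative sizes, not the exact MapTR config.
num_inst, num_pts, embed_dim = 50, 20, 256

rng = np.random.default_rng(0)
q_inst = rng.standard_normal((num_inst, embed_dim))  # instance-level queries
q_pts = rng.standard_normal((num_pts, embed_dim))    # shared point-level queries

# Broadcast and add: every instance query is combined with every shared
# point query, giving one hierarchical query per (instance, point) pair.
q_hie = q_inst[:, None, :] + q_pts[None, :, :]       # (50, 20, 256)

# Flatten so the decoder attention treats each point-level query separately.
q_hie = q_hie.reshape(num_inst * num_pts, embed_dim)  # (1000, 256)
```

Because `q_pts` appears in every instance's hierarchical queries, its gradients accumulate across all instances, which is how the shared point-level embedding gets optimized.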

  3. A cost question in Instance-level Matching.

In instance-level matching, we only consider the distance between the predicted point set and the GT point set, and we address the order issue in point-level matching. With the optimal instance-level and point-level assignments, we can then build the point2point loss, which supervises the model to predict map elements in the order that fits the model, instead of an artificially imposed fixed order.

In the ablation of the position matching cost, we investigate the situation where predicted point set A may lie closer to the GT, yet forcing A into the GT's shape under the optimal point-level assignment during training may be harder than for prediction B, which lies further away but has a better order and is easier to optimize. So we adopt the point2point cost as the position matching cost term, which takes the order into consideration in advance during instance-level matching.
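A simplified sketch of the idea, assuming only the forward and reversed orderings of a polyline as the equivalent permutation set (MapTR's full permutation set also covers cyclic shifts for polygons); `point2point_cost` and `instance_level_match` are hypothetical helper names:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def point2point_cost(pred, gt):
    """Minimum summed L1 distance over the equivalent GT orderings
    (here just forward and reversed, a simplification)."""
    return min(np.abs(pred - perm).sum() for perm in (gt, gt[::-1]))

def instance_level_match(preds, gts):
    """Hungarian assignment between predicted and GT point sets, using the
    permutation-aware point2point cost as the position cost term."""
    cost = np.array([[point2point_cost(p, g) for g in gts] for p in preds])
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]
```

Because the position cost already minimizes over equivalent orderings, a prediction that traces a GT element in reverse is not penalized during instance-level matching, which is the "order taken into consideration in advance" behavior described above.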

As the paper mentions, the Chamfer distance D{chamfer} is used as the evaluation metric. But how do you pair predicted and GT instances? Do we need to calculate D{chamfer} between every prediction and GT, then choose the lowest D{chamfer} to pair each instance? Is the confidence of the predicted instances used in evaluation to filter low-confidence instances?

We calculate AP the same way as in object detection; the only difference is that we use $D{\rm{chamfer}}$ instead of $D{\rm{iou}}$. So the confidence of predicted instances is used in the evaluation. Our evaluation code mainly refers to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/evaluation/mean_ap.py
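A rough sketch of this detection-style matching step, with Chamfer distance in place of IoU; the threshold value, the greedy traversal order, and the helper names are illustrative assumptions rather than the exact `mean_ap.py` logic:

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N,2) and b (M,2)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def match_tp(preds, scores, gts, thr=1.0):
    """Greedy TP/FP marking as in detection AP: visit predictions in
    descending confidence, match each to its nearest unmatched GT if the
    Chamfer distance is under the (illustrative) threshold."""
    order = np.argsort(-np.asarray(scores))
    matched, tp = set(), np.zeros(len(preds), dtype=bool)
    for i in order:
        dists = [chamfer(preds[i], g) for g in gts]
        j = int(np.argmin(dists))
        if dists[j] < thr and j not in matched:
            matched.add(j)
            tp[i] = True
    return tp
```

The TP/FP flags and scores then feed a standard precision-recall / AP computation, which is why prediction confidence necessarily enters the evaluation.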

Also, I think the Chamfer distance D{chamfer} is not an accurate way to evaluate point sets. A low distance cost may occur when only two points of each point set are close while the other points are far from their correspondences. What is your take on this situation?

$D{\rm{chamfer}}$ is calculated between the predicted point set and the GT point set. So when only two points of each point set are close while the other points are far from their correspondences, the distance cost will still be large, because it aggregates over all the point pairs. A better way to evaluate the distance between predicted and GT map elements may be Boundary IoU. We have implemented it by replacing $D{\rm{chamfer}}$ with $D{\rm{boundaryIoU}}$ and found that the resulting AP shows the same trend as $D{\rm{chamfer}}$. So we still use $D_{\rm{chamfer}}$ to calculate AP for a fair comparison with previous methods.
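A small numeric illustration of that point, using a symmetric Chamfer distance that averages nearest-neighbor distances in both directions (one common convention; the exact aggregation in the evaluation code may differ):

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N,2) and b (M,2)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
# The two endpoints coincide with the GT, but the interior points drift far.
pred = np.array([[0.0, 0.0], [1.0, 5.0], [2.0, 5.0], [3.0, 0.0]])
# Because the distance aggregates over ALL point pairs, the two far interior
# points dominate and the Chamfer distance stays large (3.0 here), even
# though two points match the GT exactly.
```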

Irvingao commented 2 years ago

@LegendBC Awesome reply, and thanks for your patience! There is a further question:

LegendBC commented 2 years ago

How do you predict or distinguish polygons from the reg head?

Polygons and polylines are predicted in a unified manner, without any difference. They differ only in annotation format: the first point and the $20^{th}$ point of a polygon are at the same location. So when predicting a polygon, the model automatically predicts the first and last points at the same or a similar location.
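A hypothetical sketch of how such an annotation could be produced; `as_fixed_point_set` is an assumed helper name, the arc-length resampling is one plausible scheme, and 20 points per element follows the paper's setting:

```python
import numpy as np

def as_fixed_point_set(element, closed, num_pts=20):
    """Resample a map element to a fixed number of points. For a closed
    polygon, the first vertex is appended again before resampling, so the
    first and last (20th) output points share the same location."""
    pts = np.asarray(element, dtype=float)
    if closed:
        pts = np.vstack([pts, pts[:1]])  # close the loop
    # Linear resampling at equal arc-length intervals.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    s = np.linspace(0.0, t[-1], num_pts)
    x = np.interp(s, t, pts[:, 0])
    y = np.interp(s, t, pts[:, 1])
    return np.stack([x, y], axis=1)
```

With annotations in this form, the regression head needs no polygon-specific branch: supervision alone teaches it to emit coincident first and last points for closed elements.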

Will the project be released this month?

Thanks for your appreciation and concern. The code release depends on the acceptance of MapTR. If all goes well, the code should be released around early November. In any case, I promise to release the code, so please stay tuned to this project.

LegendBC commented 2 years ago

We have released an initial version of MapTR; you can refer to the code for more details, so I'm closing this issue. Let us know if you have further questions.