yosungho / LineTR

Line as a Visual Sentence: Context-aware Line Descriptor for Visual Localization (Line Transformer)

Line Signature Networks #7

Closed Zhuyawen7 closed 2 years ago

Zhuyawen7 commented 2 years ago

Hi, thanks for this very nice work. I read the paper, but I have some questions:

  1. Is your "LineTR_weight.pth" trained on ScanNet like SuperGlue? As far as I know, ScanNet does not provide ground-truth line annotations.
  2. I noticed that in Fig. 2 a single image is fed into the transformer and the line signature networks. Does this mean that training the transformer and the line signature networks does not require ScanNet ground truth, only the lines and points that have been detected?
yosungho commented 2 years ago

Hello, here are the answers.

  1. Yes, it is trained on the ScanNet dataset in a similar way to SuperGlue. The dataset does not provide ground-truth line correspondences, but they can be generated using the camera poses, RGB images, and depth images. I used the LSD detector to build the training set; a rough sketch of the idea is shown after this list. Please refer to the Experiment B (indoor) section for more details.

  2. Figure 2 represents the inference process: the network takes an image and line segments and produces line descriptors. As mentioned above, the network uses ground-truth data for ScanNet training.
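
In case a sketch helps: the following is a minimal illustration (not the repository's actual pipeline; the function names and the implied pixel threshold are only for illustration) of how an LSD endpoint detected in image A can be validated against image B using the depth map and the relative camera pose, which is the basic ingredient for generating pseudo ground-truth line correspondences.

```python
# Minimal sketch of endpoint validation between two ScanNet views.
# Not the repository's code; K is the 3x3 intrinsic matrix, depth_a the
# depth map of view A, and T_b_a the 4x4 pose taking camera-A coordinates
# to camera-B coordinates.
import numpy as np

def backproject(pt, depth, K):
    """Unproject a pixel (u, v) with its depth value into 3D camera coordinates."""
    u, v = pt
    z = depth[int(v), int(u)]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.array([x, y, z])

def project(P, K):
    """Project a 3D point in camera coordinates back to pixel coordinates."""
    uvw = K @ P
    return uvw[:2] / uvw[2]

def endpoint_reprojection_error(pt_a, depth_a, K, T_b_a, pt_b):
    """Reprojection error of a line endpoint from view A into view B."""
    P_a = backproject(pt_a, depth_a, K)
    P_b = (T_b_a @ np.append(P_a, 1.0))[:3]
    return np.linalg.norm(project(P_b, K) - np.asarray(pt_b))
```

A segment pair can then be accepted as a ground-truth correspondence when both endpoint errors stay below a pixel threshold; missing or noisy depth makes some true pairs fail this check, which is the false-negative issue discussed later in this thread.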

Zhuyawen7 commented 2 years ago

Thank you for your reply. Two follow-up questions:

  1. Your paper says: "Because depth maps are not sufficiently accurate to find all ground-truth line pairs, some correct line correspondences may be counted as a false negative. Therefore, we mitigate this issue by checking scene depth quality, as well as observing the ratio between the numbers of all extracted line segments and the validated line segments during projection/unprojection procedures." I don't fully understand this sentence. What is your specific solution to this problem? And why don't you use the Wireframe dataset, which has ground truth?
  2. As I understand it, your line signature networks use self-attention. Does this mean they don't need ground truth? Or, when training the line signature networks module, would point-only weights (as in SuperGlue, without lines) also work?

yosungho commented 2 years ago

Hi, here are my answers:

  1. First of all, the camera poses and depths are mostly correct in the ScanNet dataset, so the comment in my paper refers to dropping the small number of scenes that have incorrect information. I assumed that line correspondences should exist with a certain ratio when two images overlap. Therefore, if the ratio of matched lines among all detected line segments is below a threshold, I drop the scene from the dataset (see the sketch after this list). As far as I know, the Wireframe dataset does not have 3D geometric information, so the networks would be limited in learning line descriptions and their correspondences in the 3D world.
  2. The ground truth is still needed: the self-attention networks have to learn where to attend.
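
For illustration only (the actual threshold and bookkeeping may differ from what was used for training), the scene-filtering rule in answer 1 amounts to something like:

```python
def keep_scene(num_detected_lines, num_validated_lines, min_ratio=0.3):
    """Drop a scene pair when too few detected LSD segments survive the
    projection/unprojection validation, which usually indicates bad depth or pose.
    The 0.3 threshold is illustrative only."""
    if num_detected_lines == 0:
        return False
    return (num_validated_lines / num_detected_lines) >= min_ratio
```
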
Zhuyawen7 commented 2 years ago

Hi, thank you for your answer! I noticed that you shared the training code, but it doesn't seem to work for ScanNet out of the box. Can the weights trained with this code be as good as "LineTR_weight.pth"? Also, how can I obtain the 3D line map for the ScanNet dataset?

yosungho commented 2 years ago

It should work properly in many cases, as with SuperPoint. For the ScanNet dataset, you may need to build your own training data. Please refer to my paper and to SuperGlue for more detailed procedures.

Zhuyawen7 commented 2 years ago

Hi, thank you for your answer! I noticed that there is no AUC evaluation in the code. How did you obtain the homography AUC results? For example, how do you generate a gt.txt (intrinsics, cam0_T_w, cam1_T_w) for the Oxford dataset, as SuperGlue does for its evaluation dataset?
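
For reference, the SuperGlue evaluation reports AUC by integrating the fraction of image pairs whose error stays below each threshold; a generic sketch of that computation (the thresholds here are illustrative) looks like this:

```python
import numpy as np

def error_auc(errors, thresholds=(5, 10, 20)):
    """Area under the cumulative error curve up to each threshold,
    in the style of the SuperGlue evaluation scripts."""
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    # prepend the origin so the curve starts at (0, 0)
    errors = np.concatenate(([0.0], errors))
    recall = np.concatenate(([0.0], recall))
    aucs = []
    for t in thresholds:
        last = np.searchsorted(errors, t)
        r = np.concatenate((recall[:last], [recall[last - 1]]))
        e = np.concatenate((errors[:last], [t]))
        aucs.append(np.trapz(r, x=e) / t)
    return aucs
```

The errors fed into this are the per-pair homography (or pose) errors, which is why a gt.txt with intrinsics and camera poses is needed in the first place.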