htcr / sam_road

Segment Anything Model for large-scale, vectorized road network extraction from aerial imagery. CVPRW 2024
https://arxiv.org/pdf/2403.16051.pdf
MIT License

Didn't find code about processing masks generated by Geometry Decoder for Topological Decoder #3

Closed Norman-Ou closed 1 month ago

Norman-Ou commented 3 months ago

In the 3rd paragraph of Section 3.3 of the paper, you write (I quote):

After acquiring the masks, the graph vertices are extracted from them. This process converts the dense mask images into a set of sparse vertices, with roughly the same interval d_v in between. It's implemented with simple non-maximum suppression: we first drop the pixels under a probability threshold t, then traverse them by a descending order of their probability.
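For reference, the quoted procedure can be sketched roughly as below. This is only an illustration, not the code from the repository: the function name `extract_vertices` and its parameters are hypothetical, and the suppression rule (keep a pixel only if no already-kept vertex lies within d_v of it) is an assumption, since the quoted passage stops before describing that step.

```python
import numpy as np


def extract_vertices(prob_mask, threshold=0.5, min_dist=4.0):
    """Greedy non-maximum suppression on a 2D probability mask.

    Hypothetical helper: returns an (N, 2) array of (row, col) vertices.
    `threshold` plays the role of t and `min_dist` the role of d_v.
    """
    # Step 1: drop the pixels under the probability threshold t.
    rows, cols = np.nonzero(prob_mask >= threshold)
    scores = prob_mask[rows, cols]

    # Step 2: traverse the remaining pixels in descending order of probability.
    order = np.argsort(-scores, kind="stable")
    candidates = np.stack([rows, cols], axis=1)[order].astype(np.float64)

    # Step 3 (assumed): keep a pixel only if it is at least d_v away
    # from every vertex kept so far.
    kept = []
    for point in candidates:
        if all(np.linalg.norm(point - k) >= min_dist for k in kept):
            kept.append(point)
    return np.array(kept, dtype=np.float64).reshape(-1, 2)
```

The inner loop is quadratic in the number of kept vertices, which is fine for a sketch; a spatial index would be the usual choice for large masks.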

In my understanding, the whole process of generating vertices from masks (Algorithm 1 in your paper) should take place online, during both model training and inference.

However, in the code you provide at model.py#L405-L448, the vertices (the variable 'graph_points') are passed to the topology module by the predefined dataset, not produced by Algorithm 1.

If I simply overlooked the relevant code, please correct me!

Sincerely, Ruizhe

htcr commented 3 months ago

Hi Ruizhe, the logic you mentioned is in inference.py; model.py mainly contains training code. As mentioned in the paper, training uses "teacher forcing": noise is added to the ground-truth vertices to emulate the predicted vertices.
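As a rough illustration of that idea (not the actual code in model.py), perturbing ground-truth vertices could look like the sketch below. The function name `jitter_gt_vertices`, the Gaussian noise model, the default `noise_std`, and the clipping to image bounds are all assumptions made for the example; the thread does not state which noise distribution the repository uses.

```python
import numpy as np


def jitter_gt_vertices(gt_points, noise_std=1.0, image_size=512, rng=None):
    """Add random noise to ground-truth vertices (hypothetical helper).

    gt_points: (N, 2) array of vertex coordinates in pixels.
    Returns an (N, 2) float array that stands in for predicted vertices
    during training.
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(gt_points, dtype=np.float64)

    # Assumed noise model: independent Gaussian offsets per coordinate.
    noisy = points + rng.normal(0.0, noise_std, size=points.shape)

    # Keep the perturbed vertices inside the image (also an assumption).
    return np.clip(noisy, 0, image_size - 1)
```

With this setup the topology module only ever sees dataset vertices plus noise during training, while at inference time the vertices would come from the mask-based extraction instead.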

htcr commented 1 month ago

Closing. Feel free to reopen if you need further discussion.