You can access the output of the radar backbone from the variable named x_other. See https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/lib/my_model/radiant_fcos3d_network.py#L267
Yes, the functions freeze_subnet and freeze_cam_heads prevent changes to the monocular weights. See https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/scripts/train_radiant_fcos3d.py#L137C1-L138C28
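For reference, this is a minimal sketch of how such freezing typically works in PyTorch; the module names in the comments are hypothetical, not the exact attributes used in the repo:

```python
import torch.nn as nn

def freeze_module(module: nn.Module):
    """Freeze a sub-network: stop gradient updates and fix BatchNorm statistics."""
    for p in module.parameters():
        p.requires_grad = False
    module.eval()  # keeps BatchNorm running stats / Dropout deterministic

# Hypothetical usage: freeze the monocular (camera) backbone and heads before
# training the radar branch, analogous to freeze_subnet / freeze_cam_heads.
# freeze_module(model.backbone)
# freeze_module(model.bbox_head)
```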
May I ask where you call the extract_feat function and concatenate the image and radar features, like in the image below (the concatenation part)? Thank you.
OK, thanks! In addition, I would like to ask about training the radar branch: how do you obtain the ground truth for calculating the offset to the object center? I don't quite understand how it is done in the paper. Thanks!
Also, if I comment out the functions freeze_subnet and freeze_cam_heads, the model in the camera branch will change, right? Will this change affect the radar branch training or the detection results? Thank you.
We associate radar points with GT boxes and compute 2D offsets from the radar points to the corresponding GT centers on the image, as well as depth offsets. (see https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/lib/my_model/radiant_fcos3d_network.py#L2414)
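As a rough illustration of that association step (a minimal sketch with hypothetical shapes and a simple nearest-center matching, not the exact logic at the link above):

```python
import numpy as np

def radar_gt_offsets(radar_uv, radar_depth, gt_centers_uv, gt_depths, max_pixel_dist=50.0):
    """For each radar point (pixel coords + measured depth), find the nearest GT box
    center on the image and return the 2D pixel offset and the depth offset to it.

    radar_uv:      (N, 2) radar points projected to image pixels
    radar_depth:   (N,)   radar depth along the camera axis
    gt_centers_uv: (M, 2) projected GT 3D box centers
    gt_depths:     (M,)   GT center depths
    """
    # Pairwise pixel distances between radar points and GT centers
    d = np.linalg.norm(radar_uv[:, None, :] - gt_centers_uv[None, :, :], axis=-1)  # (N, M)
    nearest = d.argmin(axis=1)
    valid = d[np.arange(len(radar_uv)), nearest] < max_pixel_dist  # drop unmatched points

    offset_uv = gt_centers_uv[nearest] - radar_uv        # 2D offset targets
    offset_depth = gt_depths[nearest] - radar_depth      # depth offset targets
    return offset_uv, offset_depth, valid
```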
Yes, the camera weights will change and may not preserve optimal monocular detection performance if you do not freeze them.
Hello, thank you very much for your response. I also want to ask:
- I do not make that assumption. The radar points are typically not at object centers.
- For radar inputs, see https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/lib/my_pipelines.py#L197 (a rough sketch of the idea follows this list)
- No.
- Use the GT depths of the objects.
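To make the radar-input idea concrete, here is a loose sketch of projecting radar points into the image and scattering their features into a dense map; the function name and the channel contents are assumptions for illustration, not the repo's exact format:

```python
import numpy as np

def build_radar_map(points_cam, features, img_h, img_w, K):
    """Project radar points (in camera coordinates) into the image and scatter their
    features into a dense (C, H, W) map aligned with the camera image; empty pixels stay zero.

    points_cam: (N, 3) radar points in the camera frame
    features:   (N, C) per-point features, e.g. depth, radial velocity, RCS
    K:          (3, 3) camera intrinsics
    """
    radar_map = np.zeros((features.shape[1], img_h, img_w), dtype=np.float32)
    z = points_cam[:, 2]
    keep = z > 0.1                               # keep points in front of the camera
    uvw = (K @ points_cam[keep].T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    radar_map[:, v[inside], u[inside]] = features[keep][inside].T
    return radar_map
```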
Outputs can be visualized from bounding boxes (camera branch) or from predicted offsets to object centers from radar points (radar branch):
- camera branch outputs: https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/lib/my_model/radiant_fcos3d_network.py#L959
- radar branch outputs: https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/lib/my_model/radiant_fcos3d_network.py#L1002
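A minimal way to eyeball the radar-branch output could look like the following; the variable names are placeholders, and the actual tensors would come from the lines linked above:

```python
import matplotlib.pyplot as plt

def show_radar_offsets(img, radar_uv, pred_offset_uv):
    """Draw each radar point and an arrow to its predicted object center on the image."""
    plt.imshow(img)
    plt.scatter(radar_uv[:, 0], radar_uv[:, 1], c='cyan', s=8, label='radar points')
    plt.quiver(radar_uv[:, 0], radar_uv[:, 1],
               pred_offset_uv[:, 0], pred_offset_uv[:, 1],
               angles='xy', scale_units='xy', scale=1, color='yellow', width=0.002)
    plt.legend()
    plt.show()
```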
Thanks for your reply!
print(radar_map.shape)
It shows [1, 10, 928, 1600]. Does that mean batch size 1, 10 channels, image height 928 and image width 1600? I found that the ResNet-18 input needs to be [3, 224, 224]?
- The radar points can be seen as object candidates, i.e. object locations with initial but inaccurate object center predictions. The task of the model is to refine these initial predictions based on neighboring radar/camera information.
- The input size of convolutional networks is not necessarily fixed. You can run the resnet (https://github.com/longyunf/radiant/blob/cf5355396d42ef17940e29ef8f9e3cabfd8035c3/lib/my_model/resnet.py#L297) with different input sizes, although the relative resolution between input and output may be fixed due to the constant down-sampling (see the sketch after this list).
- DWN uses some raw radar information directly from the radar measurements, such as Doppler velocity.
- DWN compares the object depth estimates from the camera head and the radar head.
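Here is the sketch referred to above: torchvision's ResNet-18 with its first conv widened to 10 channels stands in for the repo's own backbone, purely to show that the convolutional stages accept any input size and only the down-sampling factor (÷32) is fixed:

```python
import torch
from torchvision.models import resnet18

# Standard torchvision ResNet-18, with the first conv swapped for a 10-channel input
# (illustrative only; the repo uses its own resnet.py, not torchvision's).
net = resnet18(weights=None)
net.conv1 = torch.nn.Conv2d(10, 64, kernel_size=7, stride=2, padding=3, bias=False)

def backbone_features(x):
    # Run only the convolutional stages, skipping the avgpool/fc classifier head.
    x = net.maxpool(net.relu(net.bn1(net.conv1(x))))
    x = net.layer1(x); x = net.layer2(x); x = net.layer3(x); x = net.layer4(x)
    return x

for h, w in [(224, 224), (928, 1600)]:
    with torch.no_grad():
        y = backbone_features(torch.zeros(1, 10, h, w))
    print((h, w), '->', tuple(y.shape))   # spatial size is always input / 32
```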
Thank you very much for your answer. I would also like to ask:
- Original radar measurements may offer some information on the confidence of the radar head output, e.g. a higher RCS may indicate a stronger radar signal and higher confidence.
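Putting those answers together (DWN compares the two depth estimates, and raw radar cues such as RCS can hint at confidence), a loose sketch of the weighting idea might look like this; it is not the paper's exact architecture or feature set:

```python
import torch
import torch.nn as nn

class TinyDepthWeighting(nn.Module):
    """Loose sketch of the depth-weighting idea: predict how much to trust the radar
    depth vs. the camera depth from both estimates plus raw radar cues (e.g. RCS,
    Doppler velocity). Not the paper's exact architecture or inputs."""
    def __init__(self, in_dim=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, d_cam, d_radar, radar_cues):
        # radar_cues: (N, in_dim - 2) raw radar features such as RCS / Doppler velocity
        x = torch.cat([d_cam[:, None], d_radar[:, None], radar_cues], dim=1)
        w = torch.sigmoid(self.mlp(x)).squeeze(1)   # per-object trust in the radar depth
        return w * d_radar + (1.0 - w) * d_cam      # fused depth estimate
```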
I'm really sorry to bother you with so many questions.
Hello, I would like to ask: according to your code, where can I get the feature-map output of the radar backbone? And if I comment out the freeze calls, does it mean that the camera branch and the radar branch are trained at the same time? Thanks