Closed virusapex closed 3 years ago
Thanks for your bug report. Could you please create a PR to fix it? BTW, does this bug affect any models on SUNRGBD?
Yes, I have created a PR. Hopefully, it was correct, since it's the first time for me. I've trained the VoteNet model and got a similar accuracy to the one you posted in Readme.md, but that was without this fix. I'm not sure, if changing this value will change the accuracy.
As far as I know, VoteNet uses only 3D point clouds without 2D images, while `K` and `Rt` are used for 3D-to-2D projection. So this issue won't affect the performance of VoteNet. But it might affect ImVoteNet, which uses the projection and `K`. Maybe we can train an ImVoteNet later after the fix to check its performance?
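For reference, the role of `K` and `Rt` in that 3D-to-2D projection can be sketched as follows (the matrix values below are made up for illustration; this is not the actual mmdet3d projection code):

```python
import numpy as np

# Illustrative camera intrinsics K and extrinsic rotation Rt, standing in
# for the two matrices read from a SUN RGB-D calib file.
K = np.array([[529.5, 0.0, 365.0],
              [0.0, 529.5, 265.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)
Rt = np.eye(3, dtype=np.float32)  # identity rotation, for simplicity

def project_to_2d(points_3d, K, Rt):
    """Project Nx3 camera-frame points to pixel coordinates."""
    cam = points_3d @ Rt.T          # apply extrinsics
    uvw = cam @ K.T                 # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:]  # perspective divide

pts = np.array([[0.5, 0.2, 2.0]], dtype=np.float32)
print(project_to_2d(pts, K, Rt))
```

With the bug, `K` silently holds a copy of `Rt`, so every projected pixel coordinate computed this way would be wrong, while a pure point-cloud model like VoteNet never touches this path.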
Yes, this bug will affect the performance of ImVoteNet, but this fix should solve the problem. I believe the root cause of #448 is also this bug.
Hello again! Sorry for re-opening the issue, but as per your suggestion, I was able to re-train the ImVoteNet model after the fix, training both the first and second stages myself. I got 61.99 AP@0.25, which is in the same ballpark as your 64.04, albeit with a noticeable gap. It seems the model didn't suffer much from the bug.
Hi, I think 61.99 is a bit low. Do you mean 61.99 is achieved with the correct code?
Yeah, the model just finished training. I re-generated the dataset 2 days ago. Although I'm not entirely sure about training the 2nd stage: am I supposed to link in the config the weights I got from the first stage, or should I have just used yours? In any case, I would get warnings like `missing keys in source state_dict: pts_backbone.SA_modules.0.mlps.0.layer0.conv.weight, pts_backbone.SA_modules.0.mlps.0.layer0.bn.weight, pts_backbone.SA_modules.0.mlps.0.layer0.bn.bias, pts_backbone.SA_modules.0.mlps.0.layer0.bn.running_mean, pts_backbone.SA_modules.0.mlps.0.layer0.bn.running_var,` and so on. Which is understandable, since we are only using a 2D network to train a 3D one and they have different layers. Or maybe I'm doing something wrong here.
The missing keys warning is normal because we did not load the 3D backbone. You can either link the weights you trained or the ones provided by us. How much did you get from the first stage?
There is also a probability that the model is simply unlucky. As SUN RGB-D is not a very big dataset, some fluctuation is expected.
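The "missing keys" behavior described above can be reproduced with plain PyTorch: `load_state_dict(strict=False)` loads whatever parameter names match and merely reports the rest, instead of raising an error (the toy module names below are illustrative, not the actual ImVoteNet modules):

```python
import torch.nn as nn

# Toy stand-ins: a pretrained single-branch model and a larger full model.
img_branch = nn.Linear(4, 4)
full_model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Pretend the checkpoint only covers the first layer of the full model,
# stored under the prefix ('0.') that nn.Sequential expects for it.
ckpt = {f'0.{k}': v for k, v in img_branch.state_dict().items()}

# strict=False loads the matching keys and reports the rest as missing.
result = full_model.load_state_dict(ckpt, strict=False)
print(result.missing_keys)  # the second layer's weight/bias are "missing"
```

This is exactly why training a 3D branch from a checkpoint that only contains 2D weights prints a long list of missing keys without failing.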
Ok, makes sense then. I got mAP@0.5 = 61.08, which is, again, lower than your 62.7. I could be unlucky =)
Oh, since I forgot to mention you, @THU17cyz, you probably didn't see my previous message. BTW, an unrelated question, sorry for bothering you, but I don't completely understand the sizes of the model weights: the first stage is 330 MB, but the 2nd stage is just 190 MB, even though the ImVoteNet model only grows in complexity with the fusion layers, does it not?
@THU17cyz Could you have a look at this comment?
Hi @virusapex ,
Terribly sorry that I forgot to reply to you. I was busy with graduation-related things the past two weeks.
The odd difference in the model weight sizes is probably because in the first stage the img branch isn't frozen, while in the second stage it is.
As for the performance, mAP@0.5 = 61.08 in stage 1 is also reasonable.
Hey, @THU17cyz ,
Don't sweat it! Wishing you the best!
Yeah, I can see that the image branch is frozen, hence the `freeze_img_branch=True` statement in the config, but how exactly does that reduce the size of the final model? All the weights are still included, no? It seems there is some kind of issue, because:
Your pretrained ImVoteNet: first-stage model = 158 MB, second stage = 166 MB; the one I'm getting: first-stage model = 315 MB, second stage = 181 MB.
Is it possible that your saved checkpoint includes state_dict of optimizer while ours doesn't?
I think @Wuziyi616 got the point. The model ckpts in the model zoo are processed through this script, so the optimizer state_dict is deleted from the ckpt file.
And since in the second stage the img branch parameters are frozen, there is less related data in the optimizer state_dict.
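The effect of such a publish script can be sketched in a few lines (a minimal sketch, not the actual script in the repo — only the idea of dropping the optimizer state before release):

```python
import torch

def strip_optimizer(in_path, out_path):
    """Remove the optimizer state from a checkpoint, keeping only weights.

    The optimizer state_dict (e.g. Adam's momentum buffers) can roughly
    double a checkpoint's size, and is only stored for trainable params,
    which is why frozen branches produce smaller raw checkpoints.
    """
    ckpt = torch.load(in_path, map_location='cpu')
    ckpt.pop('optimizer', None)  # drop the optimizer state if present
    torch.save(ckpt, out_path)
```

This also explains the numbers above: the published checkpoints had the optimizer state stripped, while locally saved ones still carry it.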
Oh, I didn't know about that. Thank you for the clarification, I gotta study PyTorch more =) I believe we can close the issue.
Happy to be helpful :-) Closing this issue now.
Thanks for your error report and we appreciate it a lot.
Checklist
Describe the bug
Whenever calibration files from the SUN RGB-D dataset are read, MMDetection3D stores the Rt values in both the Rt and K fields of the infos.pkl files.
Reproduction
What command or script did you run?
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd --out-dir ./data/sunrgbd --extra-tag sunrgbd
Did you make any modifications on the code or config? Did you understand what you have modified?
Nothing was changed.
What dataset did you use?
SUN RGB-D
Environment
python mmdet3d/utils/collect_env.py
to collect necessary environment information and paste it here.
TorchVision: 0.9.0.dev20210103+cu101
OpenCV: 4.4.0
MMCV: 1.3.1
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.1
MMDetection: 2.11.0
MMDetection3D: 0.12.0+e21e61e
```python
def get_calibration(self, idx):
    calib_filepath = osp.join(self.calib_dir, f'{idx:06d}.txt')
    lines = [line.rstrip() for line in open(calib_filepath)]
    Rt = np.array([float(x) for x in lines[0].split(' ')])
    Rt = np.reshape(Rt, (3, 3), order='F').astype(np.float32)
    K = np.array([float(x) for x in lines[1].split(' ')])
    K = np.reshape(Rt, (3, 3), order='F').astype(np.float32)  # bug: reshapes Rt instead of K
    return K, Rt
```
```python
def get_calibration(self, idx):
    calib_filepath = osp.join(self.calib_dir, f'{idx:06d}.txt')
    lines = [line.rstrip() for line in open(calib_filepath)]
    Rt = np.array([float(x) for x in lines[0].split(' ')])
    Rt = np.reshape(Rt, (3, 3), order='F').astype(np.float32)
    K = np.array([float(x) for x in lines[1].split(' ')])
    K = np.reshape(K, (3, 3), order='F').astype(np.float32)  # fixed: reshape K's own data
    return K, Rt
```
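The consequence of the typo is easy to see in isolation (the array values below are made up): with the original line, `K` is built by reshaping `Rt`'s data again, so the returned `K` silently duplicates `Rt`:

```python
import numpy as np

rt_vals = np.arange(9, dtype=np.float32)       # stand-in for lines[0]
k_vals = np.arange(9, 18, dtype=np.float32)    # stand-in for lines[1]

Rt = np.reshape(rt_vals, (3, 3), order='F')

K_buggy = np.reshape(rt_vals, (3, 3), order='F')  # original: reshapes Rt's data
K_fixed = np.reshape(k_vals, (3, 3), order='F')   # fix: reshapes K's own data

print(np.array_equal(K_buggy, Rt))  # K duplicates Rt under the bug
print(np.array_equal(K_fixed, Rt))  # distinct intrinsics after the fix
```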