Training implementation issues

ruc-aimc-lab / GeoFormer

GeoFormer for Homography Estimation


Training implementation issues #2

Open STUDYHARD2113 opened 4 months ago

STUDYHARD2113 commented 4 months ago

Thanks for sharing your excellent work! I have the following questions:

1. Which version of pytorch lightning does the code require? I got errors when I installed pytorch lightning without specifying a version number.
2. The following error message appears during training. What is the problem? RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4800, 256]], which is output 0 of AsStridedBackward, is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
3. Your paper states that a total of 10 epochs were run, but max_epoch is not set in the code. How many steps or epochs were actually run?

Looking forward to your answer, thank you very much!
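
For readers hitting the same RuntimeError, the hint about anomaly detection in the message above can be tried on a toy case first. The sketch below is not taken from the GeoFormer code: the function name `try_backward` and the tiny tensors are hypothetical, and it only shows how an in-place edit of a tensor that autograd saved for the backward pass produces this class of error, and how the out-of-place form avoids it.

```python
import torch


def try_backward(use_inplace: bool, detect_anomaly: bool = False):
    """Return the error message raised by backward(), or None on success.

    Hypothetical helper for illustration; not part of GeoFormer.
    """
    # set_detect_anomaly can be used as a context manager; when enabled,
    # autograd also reports the forward-pass call that created the failing node.
    with torch.autograd.set_detect_anomaly(detect_anomaly):
        a = torch.randn(4, 3, requires_grad=True)
        b = a.exp()          # exp() saves its output for the backward pass
        if use_inplace:
            b.add_(1)        # in-place edit changes the saved output
        else:
            b = b + 1        # out-of-place: the saved output stays untouched
        try:
            b.sum().backward()
        except RuntimeError as err:
            return str(err)
    return None


if __name__ == "__main__":
    print(try_backward(use_inplace=True))    # message about an inplace operation
    print(try_backward(use_inplace=False))   # None
```

Calling `try_backward(use_inplace=True, detect_anomaly=True)` additionally emits a warning pointing at the forward call whose gradient failed, which is usually the quickest way to locate the offending line in a real model.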

lyh200095 commented 4 months ago

You can use pytorch_lightning=1.3.5; this version works with the code, and it won't cause this problem. In my experiments, 20 epochs is enough.
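
As a quick sanity check before training, the installed version can be compared against the one suggested above. This is a minimal sketch using only the standard library; the helper names `installed_version` and `matches_suggested` are hypothetical, and `pytorch-lightning` is assumed to be the distribution name the package was installed under.

```python
from importlib.metadata import PackageNotFoundError, version

SUGGESTED = "1.3.5"  # version suggested in this thread


def installed_version(dist_name: str = "pytorch-lightning"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_suggested(found, wanted: str = SUGGESTED) -> bool:
    """Compare version strings, ignoring any local suffix such as '+cu111'."""
    return found is not None and found.split("+")[0] == wanted


if __name__ == "__main__":
    found = installed_version()
    if matches_suggested(found):
        print("pytorch-lightning matches the suggested version")
    else:
        print(f"found {found!r}, thread suggests {SUGGESTED}")
```

On the epoch count: PyTorch Lightning's `Trainer` accepts a `max_epochs` argument, so a run can be capped at 20 epochs that way. Whether the repository's training script exposes that option is not stated in this thread.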

STUDYHARD2113 commented 3 months ago

Thank you for your reply! I have a few more questions:

1. This error is reported when batchsize>1. What is the cause? GeoFormer/utils/homography.py", line 98, in warp_points_batch warped_points = torch.bmm(homographies, points.permute(0, 2, 1)) # batch dot RuntimeError: self dim 0 must match batch2 dim 0
2. This error is still reported occasionally with pytorch_lightning=1.3.5. Have you ever encountered it? RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4800, 256]], which is output 0 of AsStridedBackward, is at version 4; expected version 3 instead.
3. During training, I often encounter this warning. Is it critical? | WARNING | model.loftr_src.loftr.utils.supervision:spvs_coarse:99 - No groundtruth coarse match found for:

Looking forward to your answer, thank you very much!

lyh200095 commented 3 months ago

For the first problem, I think the batch size must be set to 1; changing it causes this error. Second, I haven't encountered this error; maybe you can try pl==1.6.0. Third, you can ignore this warning; it won't influence accuracy.
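
To illustrate why the batch size matters for the `torch.bmm` call in the traceback: `torch.bmm` does not broadcast, so both operands need the same leading batch dimension. The sketch below is an assumption about the general shape contract, not the repository's `warp_points_batch`; the function `warp_points_sketch` and its tensor layouts are hypothetical.

```python
import torch


def warp_points_sketch(homographies: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """Warp 2D points with per-sample homographies (illustrative only).

    homographies: (B, 3, 3), points: (B, N, 2) -> warped points (B, N, 2).
    """
    ones = torch.ones_like(points[..., :1])
    homogeneous = torch.cat([points, ones], dim=-1)                  # (B, N, 3)
    # bmm requires both operands to share the same batch size B.
    warped = torch.bmm(homographies, homogeneous.permute(0, 2, 1))   # (B, 3, N)
    warped = warped.permute(0, 2, 1)                                 # (B, N, 3)
    return warped[..., :2] / warped[..., 2:]                         # dehomogenize


if __name__ == "__main__":
    pts = torch.rand(2, 5, 2)

    # Matching batch sizes: one homography per sample.
    identity = torch.eye(3).repeat(2, 1, 1)
    print(torch.allclose(warp_points_sketch(identity, pts), pts))

    # Mismatched batch sizes: one homography for two samples fails in bmm.
    try:
        warp_points_sketch(torch.eye(3).unsqueeze(0), pts)
    except RuntimeError as err:
        print("RuntimeError:", err)
```

If one operand in the training code is built with a fixed batch dimension of 1, any larger batch would fail in this way, which would be consistent with the advice to keep the batch size at 1.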

TimLeeCV commented 3 months ago

@STUDYHARD2113 @lhz123 Sorry to disturb you. When I train this model only on the Oxford-Paris dataset, I get very bad results compared to the provided pre-trained model. May I ask what results you got after training this model? I look forward to your kind reply!