htcr / sam_road

Segment Anything Model for large-scale, vectorized road network extraction from aerial imagery. CVPRW 2024
https://arxiv.org/pdf/2403.16051.pdf
MIT License

Train on your own 512 x 512 size image #32

Open zts12 opened 1 month ago

zts12 commented 1 month ago

For 512*512 images, do I need to modify the config settings to train on my own dataset? (I am using the SpaceNet configuration.) Also, do you have any advice on the training batch size? My results after 30 epochs are not ideal. Thank you very much for your work on the SAM-Road project; I look forward to your answer.

htcr commented 1 month ago

I think the CityScale setup defaults to a 512x512 patch size. Can you try that? Batch size depends on your GPU memory; I'd start with the largest batch size you can get away with, then tune the LR properly to make sure it converges. It may need some trial and error.
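As a rough starting point for the batch-size/LR interplay mentioned above, the linear scaling rule is a common heuristic: when the batch size grows, scale the learning rate proportionally, then fine-tune from there. A minimal sketch (an illustrative helper, not part of the SAM-Road codebase):

```python
# Hypothetical helper (not from the SAM-Road repo): linear scaling rule
# for adapting the learning rate when the batch size changes.
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate linearly with the batch size."""
    return base_lr * new_batch / base_batch

# If 1e-4 converged at batch size 16, try around 4e-4 at batch size 64,
# then adjust from there by trial and error.
print(scaled_lr(1e-4, 16, 64))
```

This only gives a starting point; the final LR still needs tuning per dataset.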

zts12 commented 1 month ago

Thank you very much for your answer. This is my current road-extraction result on my own 0.5 m resolution imagery using the SpaceNet configuration, and it is not very good. Do you have any suggestions? (attachment: bj000035_mask_road_mask.png; the upload did not complete)

zts12 commented 1 month ago

(attachments: bj000036_mask_road_mask, bj000041_mask_road_mask, bj000046_mask_road_mask, bj000049_mask_road_mask, bj000056_mask_road_mask)

htcr commented 1 month ago

I think our released model takes 1.0m/pixel images. Can you try resizing your images to that resolution? Also, have you fine-tuned on your own dataset? How large was your dataset?
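Resampling 0.5 m/pixel imagery to 1.0 m/pixel just means halving the image dimensions. A minimal pure-Python sketch using 2x2 block averaging (in practice you would use a library resizer such as `PIL.Image.resize` or `cv2.resize` with a suitable interpolation filter):

```python
def downsample_2x(img):
    """Average 2x2 pixel blocks: a 512x512 image at 0.5 m/px becomes
    256x256 at 1.0 m/px. `img` is a list of rows of grayscale values."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]

# A 2x2 block averages down to a single pixel.
print(downsample_2x([[0, 0], [0, 4]]))  # [[1.0]]
```

Note that after downsampling, 512x512 crops at 0.5 m/px become 256x256, so you would crop 1024x1024 tiles from the source imagery to end up with 512x512 patches at 1.0 m/px.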

htcr commented 1 month ago

Also, did you correctly load the pre-trained SAM ckpts?

zts12 commented 1 month ago

Thank you again for your suggestion. I cropped my 0.5 m resolution imagery to 512*512 patches, loaded the pre-trained SAM ckpt, adjusted the learning rate, and re-trained, but the results are nowhere near the good results reported on the two datasets in the original paper. The dataset has 3065 images in total: 2453 for training, 459 for testing, and 153 for validation, split following the SpaceNet ratios. Each image is 512 and comes with the corresponding required graph data. If I resample the imagery to 1 m resolution, will the final result improve? I am also not sure whether the keypoint and road thresholds should be modified. The test output looks like this:

======= Finding best thresholds ======
======= keypoint ======
Best threshold 0.01090240478515625, P=0.0 R=0.0 F1=nan
======= road ======
Best threshold 0.0965576171875, P=0.0 R=0.0 F1=nan
======= topo ======
Best threshold 0.0965576171875, P=0.0 R=0.0 F1=nan
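The F1=nan values in that log follow directly from P=0 and R=0: F1 is the harmonic mean 2PR/(P+R), which divides by zero when both precision and recall are zero, i.e. the predictions match nothing in the ground truth at any threshold. A quick sketch of the arithmetic (illustrative, not the repo's metric code):

```python
import math

def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; NaN when both are zero."""
    return float("nan") if p + r == 0 else 2 * p * r / (p + r)

print(f1(0.0, 0.0))  # nan -> the situation in the log above
print(f1(0.5, 0.5))  # 0.5
```

So F1=nan here is not a metric bug by itself; it is a symptom that the model's predictions have zero overlap with the labels.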

htcr commented 1 month ago

Hi, I think if you are fine-tuning from the original SAM ckpt (not the ones I released), resolution is less crucial. How do the images look in general? The numbers you've shown suggest the model did not converge at all. The size of the dataset sounds reasonable. Can you try the following options:

1) Debug the label-generation logic. Do the GT masks look reasonable?
2) See if the model can overfit just one example. If not, some hyperparameters are probably wrong.
3) Try different batch sizes / learning rates.
4) Apply some data augmentation. In the SAM-Road paper, random cropping and rotation were applied.
5) Try zeroing some loss terms to find which one is exploding.
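For the first suggestion (debugging label generation), one quick sanity check is to rasterize the road graph's edges into a binary mask yourself and confirm it is non-empty and lines up with the generated labels. A minimal sketch with illustrative names (the actual SAM-Road label pipeline differs):

```python
def rasterize_edges(edges, size):
    """Draw road-graph edges as 1-pixel lines on a size x size grid.
    edges: list of ((x0, y0), (x1, y1)) endpoints in pixel coordinates."""
    mask = [[0] * size for _ in range(size)]
    for (x0, y0), (x1, y1) in edges:
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            mask[y][x] = 1
    return mask

gt = rasterize_edges([((0, 0), (7, 7))], 8)
# An all-zero GT mask here would mean the label pipeline is broken
# (e.g. a coordinate-order or scaling mismatch with the imagery).
assert sum(map(sum, gt)) > 0
```

Overlaying such a mask on the RGB patch quickly reveals coordinate-order (x/y swap) or scaling bugs, which are the most common causes of a model that never converges.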

Good luck with your experiments!

zts12 commented 1 month ago

Sorry for the late reply, and thank you for your suggestions; I will run the experiments as you advised. I am a graduate student at a university, and my current research direction is road extraction from high-resolution remote-sensing images. Thank you for the discussion. Could we connect on WeChat? My WeChat account is 18837621961; I would be honored.

EchoQiHeng commented 1 month ago

I trained SAM on the DeepGlobe dataset and the results were convincing, so I believe SAM is robust. Please carefully check your code.

zts12 commented 1 month ago

Thank you for sharing your work. I also used DeepGlobe for training and testing: I cropped it to 512*512 images, trained, and tested, but the result is not very good; even clearly visible roads are extracted incompletely. Could I see your config settings and your dataset-split rules? Or is there some other modification or configuration work that I have missed? Thanks for your answer.


EchoQiHeng commented 1 month ago

I have shown visualization results on the DeepGlobe validation set, and I believe the model has converged and is functioning as expected. I did not make any DeepGlobe-specific configuration changes. Modifications to the SatMapDataset were necessary, though; my process primarily involved cropping and augmentation. Please carefully check your RGB images and the corresponding GT masks. I have also displayed the IoU during training. Please provide more details and results from your experiments to facilitate further debugging. (attachments: iou, pred, rgb)