htcr / sam_road

Segment Anything Model for large-scale, vectorized road network extraction from aerial imagery. CVPRW 2024
https://arxiv.org/pdf/2403.16051.pdf
MIT License

train_topo_loss is nan.0 #35

Open Teassassin opened 1 month ago

Teassassin commented 1 month ago

I tried to train this model (ViT-B) on SpaceNet with a single GPU, but during training I got the following log:

Epoch 0:  21%|██        | 1108/5292 [05:18<20:01,  3.48it/s, v_num=ej4y, train_mask_loss=0.605, train_topo_loss=nan.0, train_loss=nan.0]
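When a run like this diverges, it can help to flag exactly which logged metrics went non-finite. Below is a minimal, stdlib-only sketch that scans a progress-bar line in the format shown above. The helper name `nonfinite_metrics` is hypothetical and is not part of sam_road or PyTorch Lightning. It assumes metrics appear as `name=value` pairs separated by commas.

```python
from __future__ import annotations

import math
import re


def nonfinite_metrics(progress_line: str) -> list[str]:
    """Return the names of metrics whose logged value is NaN or infinite.

    Hypothetical helper: assumes `name=value` pairs separated by commas,
    as in the progress-bar postfix quoted in this issue.
    """
    bad = []
    for name, raw in re.findall(r"(\w+)=([^,\]\s]+)", progress_line):
        token = raw.lower()
        # The bar in this issue prints NaN as "nan.0", which float() rejects,
        # so match the nan/inf prefixes explicitly before parsing.
        if token.startswith(("nan", "inf", "-inf")):
            bad.append(name)
            continue
        try:
            value = float(token)
        except ValueError:
            continue  # non-numeric fields such as the run id in v_num
        if not math.isfinite(value):
            bad.append(name)
    return bad
```

Applied to the log line above, it reports `train_topo_loss` and `train_loss`. Because `train_mask_loss` stays finite, the NaN appears to enter through the topology loss term.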

I lowered the default BATCH_SIZE to fit my 16 GB GPU and kept all other settings unchanged. Here is the YAML config:

DATASET: 'spacenet'

# IN1k + MAE only
NO_SAM: False

SAM_VERSION: 'vit_b'
SAM_CKPT_PATH: 'sam_ckpts/sam_vit_b_01ec64.pth'
PATCH_SIZE: 256
BATCH_SIZE: 16
DATA_WORKER_NUM: 1
TRAIN_EPOCHS: 30
BASE_LR: 0.001
FREEZE_ENCODER: False
ENCODER_LR_FACTOR: 0.1
ENCODER_LORA: False
FOCAL_LOSS: False
USE_SAM_DECODER: False

# TOPONET
# sample per patch
TOPO_SAMPLE_NUM: 128
TOPONET_VERSION: 'normal'

# Inference
INFER_BATCH_SIZE: 2
SAMPLE_MARGIN: 0
INFER_PATCHES_PER_EDGE: 16

# ======= keypoint ======
# Best threshold 0.1949462890625, P=0.34380707144737244 R=0.326823890209198 F1=0.3351004719734192
# ======= road ======
# Best threshold 0.3408203125, P=0.6585257053375244 R=0.7146456837654114 F1=0.6854389309883118
# ======= topo ======
# Best threshold 0.705078125, P=0.9746968150138855 R=0.9701263904571533 F1=0.9724062085151672

ITSC_THRESHOLD: 0.195
ROAD_THRESHOLD: 0.341
TOPO_THRESHOLD: 0.705
# pixels
ITSC_NMS_RADIUS: 8
ROAD_NMS_RADIUS: 16
NEIGHBOR_RADIUS: 64
MAX_NEIGHBOR_QUERIES: 16
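The commented threshold-search results in the config report precision (P), recall (R) and F1 for the keypoint, road and topo outputs. This sketch assumes F1 is the usual harmonic mean, F1 = 2PR / (P + R), and recomputes it from the reported P and R as a quick sanity check. `f1_score` is a hypothetical helper written for illustration and is not a function from the sam_road repo.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, F1) copied from the commented threshold-search results in the config.
reported = {
    "keypoint": (0.34380707144737244, 0.326823890209198, 0.3351004719734192),
    "road": (0.6585257053375244, 0.7146456837654114, 0.6854389309883118),
    "topo": (0.9746968150138855, 0.9701263904571533, 0.9724062085151672),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1={f1_score(p, r):.4f}, reported F1={f1:.4f}")
```

The recomputed values agree with the reported F1 to within float32 rounding, which is consistent with F1 being the standard harmonic mean here.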

Looking forward to your reply. Thanks!