Yes, it's normal. The two metrics should start from 0. Check here for details.
Does your instance loss keep increasing during training? I just modified the batch size to 8.
Although the instance loss keeps increasing, the IoU also keeps increasing. Is that normal?
Actually, I didn't monitor these intermediate metrics, as they are not involved in the loss backpropagation. I may need some time to read the code details, since they are inherited from RITM. How is your testing performance?
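For reference, here is a minimal, hypothetical sketch (not the actual SimpleClick/RITM training loop) of why a monitoring metric can rise during training without affecting the gradients: only the loss is backpropagated, while the metric is computed under `torch.no_grad()` purely for logging. The function and variable names below are placeholders, not names from the repository.

```python
# Hypothetical sketch: only `loss` is backpropagated; the IoU-style metric is
# computed under torch.no_grad() and logged only, so it never affects training.
import torch

def training_step(model, optimizer, criterion, images, targets):
    logits = model(images)
    loss = criterion(logits, targets)   # the only quantity that is backpropagated

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    with torch.no_grad():               # logging-only metric, no gradient flow
        preds = (torch.sigmoid(logits) > 0.5).float()
        intersection = (preds * targets).sum()
        union = ((preds + targets) > 0).float().sum().clamp(min=1)
        iou = intersection / union

    return loss.item(), iou.item()
```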
I trained the model sbd_plainvit_base448 on a single 2080 Ti and only modified the batch size to 8, training for 55 epochs.
It's comparable to my results, as shown in Tab. 2 of the paper. My batch size for ViT-B was 140 and I used 4 A6000 GPUs. Since your batch size is 8, your number of training iterations is 140/8 times higher than mine.
From your comment, do I need to train for more epochs than you did?
No, I didn't mean that. I meant that your training ran for many more iterations than mine but produced slightly worse results. We both trained the ViT-B model for 55 epochs but used different batch sizes (140 vs. 8). Therefore, we have different numbers of training iterations (N×55/140 vs. N×55/8, where N is the number of training images).
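To make the comparison concrete, here is a small sketch of the iteration counting above. It assumes iterations per epoch = ceil(N / batch_size); the value of N used here (8498 SBD training images) is an assumption for illustration only.

```python
# Sketch of the iteration-count comparison: same epochs, different batch sizes.
# N = 8498 is an assumed SBD training-set size, used only for illustration.
import math

N = 8498
epochs = 55

for batch_size in (140, 8):
    iters_per_epoch = math.ceil(N / batch_size)
    total_iters = iters_per_epoch * epochs
    print(f"batch_size={batch_size:3d}: "
          f"{iters_per_epoch} iterations/epoch, {total_iters} total")

# With batch size 8, the total iteration count is roughly 140/8 = 17.5x that of
# batch size 140, even though both runs see the data for the same 55 epochs.
```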
Is it possible to get performance similar to what is reported in the paper when training on a single 2080 Ti?
Your results are very close to mine given the differences in our training environments.
I would like to ask about the training loss. Is it normal that instance_loss_k and instance_loss_m increase during training?