wgcban / ChangeFormer

[IGARSS'22]: A Transformer-Based Siamese Network for Change Detection
https://www.wgcban.com/research#h.e51z61ujhqim
MIT License

Visualizing the inference results #80

Closed hamzagorgulu closed 10 months ago

hamzagorgulu commented 11 months ago

Hi. I am training on a custom dataset with this repo. Here is how the logs look for the 3rd and 4th epochs, as an example:

lr: 0.0000857

Is_training: True. [3,19][1,244], imps: 210.41, est: 1.40h, G_loss: 1.00696, running_mf1: 0.50000
Is_training: True. [3,19][101,244], imps: 210.60, est: 1.37h, G_loss: 0.93353, running_mf1: 0.50000
Is_training: True. [3,19][201,244], imps: 210.59, est: 1.33h, G_loss: 0.86562, running_mf1: 0.50000
Is_training: True. Epoch 3 / 19, epoch_mF1= 0.50000
acc: 1.00000 miou: 0.50000 mf1: 0.50000 iou_0: 1.00000 iou_1: 0.00000 F1_0: 1.00000 F1_1: 0.00000 precision_0: 1.00000 precision_1: 0.00000 recall_0: 1.00000 recall_1: 0.00000 

Begin evaluation...
Is_training: False. [3,19][1,9], imps: 158.39, est: 1.86h, G_loss: 0.84020, running_mf1: 0.50000
Is_training: False. Epoch 3 / 19, epoch_mF1= 0.50000
acc: 1.00000 miou: 0.50000 mf1: 0.50000 iou_0: 1.00000 iou_1: 0.00000 F1_0: 1.00000 F1_1: 0.00000 precision_0: 1.00000 precision_1: 0.00000 recall_0: 1.00000 recall_1: 0.00000 

Lastest model updated. Epoch_acc=0.5000, Historical_best_acc=0.5000 (at epoch 0)

lr: 0.0000810

Is_training: True. [4,19][1,244], imps: 209.94, est: 1.32h, G_loss: 0.80973, running_mf1: 0.50000
Is_training: True. [4,19][101,244], imps: 209.91, est: 1.29h, G_loss: 0.75695, running_mf1: 0.50000
Is_training: True. [4,19][201,244], imps: 209.86, est: 1.25h, G_loss: 0.68211, running_mf1: 0.50000
Is_training: True. Epoch 4 / 19, epoch_mF1= 0.50000
acc: 1.00000 miou: 0.50000 mf1: 0.50000 iou_0: 1.00000 iou_1: 0.00000 F1_0: 1.00000 F1_1: 0.00000 precision_0: 1.00000 precision_1: 0.00000 recall_0: 1.00000 recall_1: 0.00000 

Begin evaluation...
Is_training: False. [4,19][1,9], imps: 168.24, est: 1.65h, G_loss: 0.69269, running_mf1: 0.50000
Is_training: False. Epoch 4 / 19, epoch_mF1= 0.50000
acc: 1.00000 miou: 0.50000 mf1: 0.50000 iou_0: 1.00000 iou_1: 0.00000 F1_0: 1.00000 F1_1: 0.00000 precision_0: 1.00000 precision_1: 0.00000 recall_0: 1.00000 recall_1: 0.00000 

Lastest model updated. Epoch_acc=0.5000, Historical_best_acc=0.5000 (at epoch 0)

It has been training for a few hours at near-full GPU utilization. I tried to visualize the predictions as follows:

import os
from torchvision.utils import save_image

model.eval()  # evaluation mode: freezes BatchNorm/Dropout behavior
for batch in dataloader:
    model._forward_pass(batch)
    pred = model._visualize_pred()
    save_image(pred / 255.0, os.path.join("inference_results", f"{batch['name'][0]}.jpg"))

However, the results I get are very strange for every sample (see attached image).

I thought the model was simply not learning, but when I tried the LEVIR dataset with the pretrained weights, I got very similar strange results (see attached image).

So, am I missing something when visualizing the predictions?
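For reference, one common cause of garbled saved masks is a value-range mismatch. Below is a minimal, self-contained sketch (plain NumPy; the `pred` array is a stand-in for the model output, which is an assumption, since the exact range returned by `_visualize_pred` is repo-specific) showing the mapping from a {0, 1} mask to displayable {0, 255} values:

```python
import numpy as np

def mask_to_uint8(pred):
    """Map a {0, 1} binary change mask to displayable 0/255 grayscale."""
    return (np.asarray(pred) > 0).astype(np.uint8) * 255

# Dummy prediction standing in for the model output (hypothetical).
pred = np.zeros((8, 8), dtype=np.int64)
pred[2:6, 2:6] = 1  # a small "changed" region
img = mask_to_uint8(pred)
print(sorted(set(img.ravel().tolist())))  # → [0, 255]
```

If the prediction is already in {0, 255}, applying a further division or scaling step on top of this will wash the image out to near-black or near-white, which can look like the artifacts described.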

wgcban commented 11 months ago

@hamzagorgulu It seems there is a problem with your custom data / dataset class / dataloader. First, check how the ground-truth masks are read: are the values {0, 1} or {0, 255}? Make sure they are loading correctly.
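A small sketch of this sanity check. The synthetic `mask` array stands in for one loaded from disk (an assumption); the point is simply to print the unique values before and after binarization:

```python
import numpy as np

# Synthetic mask standing in for one read from disk (hypothetical);
# many change-detection datasets store labels as {0, 255} PNGs.
mask = np.zeros((16, 16), dtype=np.uint8)
mask[4:10, 4:10] = 255

print("values on disk:", np.unique(mask))           # [  0 255]
# The loss expects class indices {0, 1}: binarize before training.
label = (mask > 127).astype(np.int64)
print("values fed to the loss:", np.unique(label))  # [0 1]
```

If the dataloader skips the binarization step, class "255" falls outside the two-class label space and both the loss and the metrics degenerate.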

hamzagorgulu commented 11 months ago

My masks are stored in {0, 255} format, but I checked the dataloader and the loaded values are in {0, 1}. I still cannot find the problem.

Another thing I notice when training on the custom dataset is that the metrics become meaningless after a few epochs:

Begin evaluation...
Is_training: False. [1,36],  running_mf1: 0.50000
Is_training: False. [11,36],  running_mf1: 0.50000
Is_training: False. [21,36],  running_mf1: 0.50000
Is_training: False. [31,36],  running_mf1: 0.50000
acc: 1.00000 miou: 0.50000 mf1: 0.50000 iou_0: 1.00000 iou_1: 0.00000 F1_0: 1.00000 F1_1: 0.00000 precision_0: 1.00000 precision_1: 0.00000 recall_0: 1.00000 recall_1: 0.00000
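For context, these numbers (acc = 1.0, mF1 = 0.5, every class-1 score zero) are exactly what standard per-class metrics report when both the predictions and the ground-truth labels contain only the background class, which points to the labels loading as all zeros rather than the model "breaking". A self-contained sketch (assuming plain per-class F1 averaged over the two classes, which is how an `mf1` like this is commonly computed; the repo's exact metric code may differ):

```python
import numpy as np

def per_class_f1(pred, gt, n_classes=2):
    """Per-class F1 from flat integer label arrays."""
    f1 = []
    for c in range(n_classes):
        tp = int(np.sum((pred == c) & (gt == c)))
        fp = int(np.sum((pred == c) & (gt != c)))
        fn = int(np.sum((pred != c) & (gt == c)))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return f1

# All-background predictions AND all-background ground truth:
pred = np.zeros(100, dtype=int)
gt = np.zeros(100, dtype=int)
f1 = per_class_f1(pred, gt)
print(f1, "mF1 =", sum(f1) / 2)  # [1.0, 0.0] mF1 = 0.5
```

Class 0 gets a perfect F1 (everything is correctly "no change") while class 1 gets zero (no true positives), so the mean lands at exactly 0.5 and accuracy at 1.0, matching the log above.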
wgcban commented 11 months ago

@hamzagorgulu Hi, looking at your training statistics, it seems there is a problem with your data. Are the labels loading correctly during training?