Closed Njuod closed 3 years ago
Hello. I believe that is because your image size 500 is not divisible by 16, so when you downsampled the images, the pixels shifted a bit, the saved sampling indices would not match the size of upsampled features.
An easy fix is to resize the input images to the size that is divisible by 16, like 256 as an example.
Let me know whether that solves your issue.
Hi! Thank you for your quick answer. I've padded the images to 512 to be divisible by 16 and it seems that solves this issue. Thanks!
Hi again :),
My dataset is very large which caused the following error:
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 5.79 GiB total capacity; 4.48 GiB already allocated; 10.31 MiB free; 25.81 MiB cached)
I see the model uses only 1 GPU. So, I'm trying to train the model with 2 GPUs by wrapping the model in nn.DataParallel
as follow:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.DataParallel(SegNet())
SegNet_MTAN = model.to(device)
But I got this error:
Standard training strategy without data augmentation.
Traceback (most recent call last):
File "model_segnet_mtan.py", line 230, in <module>
5)
File "/home/models/mtan/im2im_pred/utils.py", line 152, in multi_task_trainer
conf_mat = ConfMatrix(multi_task_model.class_nb)
File "/home/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 591, in __getattr__
type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'class_nb'
could you help me to solve it?
I think after you wrapped the model in a multi-gpu setting, the model class becomes model.module. So you need to access the class_nb by multi_task_model.module.class_nb.
I would suggest to check out pytorch documentation for these questions.
Best,
Thanks, that solves the error.
I'm using the model to do three semantic tasks with a different number of classes. So, instead of semantic, depth and normal, I have semantic label, semantic label_2 and semantic label_3. I've modified themodel_segnet_mtan.py
and utils.py
files, but when I'm using the semantic label_2 I got nan
value:
Parameter Space: ABS: 44234471.0, REL: 1.7707
LOSS FORMAT: SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-2_LOSS MEAN_IOU PIX_ACC
Standard training strategy without data augmentation.
Epoch: 0000 | TRAIN: 1.8811 0.0339 0.5995 | 1.5766 0.0374 0.7284 | 2.6665 0.0122 0.6375 ||TEST: 1.0568 0.0376 0.7893 | 1.0827 0.0376 0.7893 | 1.5529 nan 0.7953
Epoch: 0001 | TRAIN: 0.9069 0.0390 0.8198 | 0.9206 0.0390 0.8198 | 1.2247 0.0142 0.8263 ||TEST: 1.0012 0.0376 0.7893 | 1.0153 0.0376 0.7893 | 1.3278 nan 0.7953
Epoch: 0002 | TRAIN: 0.8436 0.0390 0.8198 | 0.8585 0.0390 0.8198 | 1.0681 0.0142 0.8263 ||TEST: 0.9553 0.0376 0.7893 | 0.9683 0.0376 0.7893 | 1.2469 nan 0.7953
Epoch: 0003 | TRAIN: 0.8302 0.0390 0.8198 | 0.8416 0.0390 0.8198 | 1.0364 0.0142 0.8263 ||TEST: 0.9744 0.0376 0.7891 | 0.9848 0.0377 0.7884 | 1.2454 nan 0.7953
Epoch: 0004 | TRAIN: 0.8391 0.0391 0.8198 | 0.8458 0.0392 0.8197 | 1.0312 0.0142 0.8263 ||TEST: 1.9153 0.0377 0.7890 | 1.6111 0.0386 0.7873 | 2.0755 nan 0.7953
While when I'm using semantic label_3 I got the following error:
Parameter Space: ABS: 44237786.0, REL: 1.7709
LOSS FORMAT: SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-1_LOSS MEAN_IOU PIX_ACC | SEMANTIC-3_LOSS MEAN_IOU PIX_ACC
Standard training strategy without data augmentation.
Traceback (most recent call last):
File "model_segnet_mtan.py", line 230, in <module>
10)
File "/mtan/im2im_pred/utils.py", line 165, in multi_task_trainer
model_fit(train_pred[2], train_label3, 'semantic')]
File "/mtan/im2im_pred/utils.py", line 22, in model_fit
loss = F.nll_loss(x_pred, x_output, ignore_index=-1)
File "/home/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1873, in nll_loss
ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /pytorch/aten/src/THNN/generic/SpatialClassNLLCriterion.c:109
Could you give me some advice, thank you.
For 3 segmentation tasks, you probably need to make sure the prediction classes for each classifer head is correct... It looks like you probably predict a larger number of classes compared to a pre-defined ground-truth classes. And you need to define confusion matrices for each semantic class so the miou score would be computed correctly.
Thanks for your reply!
Can you clarify this part more "And you need to define confusion matrices for each semantic class so the miou score would be computed correctly."
This is how I modified utils.py
def multi_task_trainer(train_loader, test_loader, multi_task_model, device, optimizer, scheduler, opt, total_epoch=200):
train_batch = len(train_loader)
test_batch = len(test_loader)
T = opt.temp
avg_cost = np.zeros([total_epoch, 19], dtype=np.float32)
lambda_weight = np.ones([3, total_epoch])
for index in range(total_epoch):
cost = np.zeros(19, dtype=np.float32)
# apply Dynamic Weight Average
if opt.weight == 'dwa':
if index == 0 or index == 1:
lambda_weight[:, index] = 1.0
else:
w_1 = avg_cost[index - 1, 0] / avg_cost[index - 2, 0]
w_2 = avg_cost[index - 1, 3] / avg_cost[index - 2, 3]
w_3 = avg_cost[index - 1, 6] / avg_cost[index - 2, 6]
lambda_weight[0, index] = 3 * np.exp(w_1 / T) / (np.exp(w_1 / T) + np.exp(w_2 / T) + np.exp(w_3 / T))
lambda_weight[1, index] = 3 * np.exp(w_2 / T) / (np.exp(w_1 / T) + np.exp(w_2 / T) + np.exp(w_3 / T))
lambda_weight[2, index] = 3 * np.exp(w_3 / T) / (np.exp(w_1 / T) + np.exp(w_2 / T) + np.exp(w_3 / T))
# iteration for all batches
multi_task_model.train()
train_dataset = iter(train_loader)
conf_mat = ConfMatrix(multi_task_model.module.class_nb)
conf_mat2 = ConfMatrix(multi_task_model.module.class_nb2)
conf_mat3 = ConfMatrix(multi_task_model.module.class_nb3)
for k in range(train_batch):
train_data, train_label, train_label2, train_label3 = train_dataset.next()
train_data, train_label = train_data.to(device), train_label.long().to(device)
train_label2, train_label3 = train_label2.long().to(device), train_label3.long().to(device)
train_pred, logsigma = multi_task_model(train_data)
optimizer.zero_grad()
train_loss = [model_fit(train_pred[0], train_label, 'semantic'),
model_fit(train_pred[1], train_label2, 'semantic'),
model_fit(train_pred[2], train_label3, 'semantic')]
if opt.weight == 'equal' or opt.weight == 'dwa':
loss = sum([lambda_weight[i, index] * train_loss[i] for i in range(3)])
else:
loss = sum(1 / (2 * torch.exp(logsigma[i])) * train_loss[i] + logsigma[i] / 2 for i in range(3))
loss.backward()
optimizer.step()
# accumulate label prediction for every pixel in training images
conf_mat.update(train_pred[0].argmax(1).flatten(), train_label.flatten())
conf_mat2.update(train_pred[1].argmax(1).flatten(), train_label2.flatten())
conf_mat3.update(train_pred[2].argmax(1).flatten(), train_label3.flatten())
cost[0] = train_loss[0].item()
cost[3] = train_loss[1].item()
cost[6] = train_loss[2].item()
avg_cost[index, :9] += cost[:9] / train_batch
# compute mIoU and acc
avg_cost[index, 1:3] = conf_mat.get_metrics()
avg_cost[index, 4:6] = conf_mat2.get_metrics()
avg_cost[index, 7:9] = conf_mat3.get_metrics()
# evaluating test data
multi_task_model.eval()
conf_mat = ConfMatrix(multi_task_model.module.class_nb)
conf_mat2 = ConfMatrix(multi_task_model.module.class_nb2)
conf_mat3 = ConfMatrix(multi_task_model.module.class_nb3)
with torch.no_grad(): # operations inside don't track history
test_dataset = iter(test_loader)
for k in range(test_batch):
test_data, test_label, test_label2, test_label3 = test_dataset.next()
test_data, test_label = test_data.to(device), test_label.long().to(device)
test_label2, test_label3 = test_label2.long().to(device), test_label3.long().to(device)
test_pred, _ = multi_task_model(test_data)
test_loss = [model_fit(test_pred[0], test_label, 'semantic'),
model_fit(test_pred[1], test_label2, 'semantic'),
model_fit(test_pred[2], test_label3, 'semantic')]
conf_mat.update(test_pred[0].argmax(1).flatten(), test_label.flatten())
conf_mat2.update(test_pred[1].argmax(1).flatten(), test_label2.flatten())
conf_mat3.update(test_pred[2].argmax(1).flatten(), test_label3.flatten())
cost[9] = test_loss[0].item()
cost[12] = test_loss[1].item()
cost[15] = test_loss[2].item()
avg_cost[index, 9:] += cost[9:] / test_batch
# compute mIoU and acc
avg_cost[index, 10:12] = conf_mat.get_metrics()
avg_cost[index, 13:15] = conf_mat2.get_metrics()
avg_cost[index, 16:18] = conf_mat3.get_metrics()
scheduler.step()
print('Epoch: {:04d} | TRAIN: {:.4f} {:.4f} {:.4f} | {:.4f} {:.4f} {:.4f} | {:.4f} {:.4f} {:.4f} ||'
'TEST: {:.4f} {:.4f} {:.4f} | {:.4f} {:.4f} {:.4f} | {:.4f} {:.4f} {:.4f} '
.format(index, avg_cost[index, 0], avg_cost[index, 1], avg_cost[index, 2], avg_cost[index, 3],
avg_cost[index, 4], avg_cost[index, 5], avg_cost[index, 6], avg_cost[index, 7], avg_cost[index, 8],
avg_cost[index, 9], avg_cost[index, 10], avg_cost[index, 11], avg_cost[index, 12], avg_cost[index, 13],
avg_cost[index, 14], avg_cost[index, 15], avg_cost[index, 16], avg_cost[index, 17], avg_cost[index, 18]))
Is that what you meant or do you mean to define confusion matrices for each task?
Yes. This modification looks correct to me.
Hello, sorry to bother you. I still get nan value, even when I'm using model_segnet_single.py. Do you think it is related to the linear layer? the number of classes that I have is 58.
Maybe try to remove F.log_softmax in prediction and change the loss function to F.cross_entropy(pred, gt).
If you still have the issue, then it's definitely from your dataset or your implementation.
OK, I modify it like the following:
t1_pred = F.cross_entropy(self.pred_task1(atten_decoder[0][-1][-1]), GT)
t2_pred = F.cross_entropy(self.pred_task2(atten_decoder[1][-1][-1]), Gt)
t3_pred = F.cross_entropy(self.pred_task3(atten_decoder[2][-1][-1]), GT)
return [t1_pred, t2_pred, t3_pred], self.logsigma
Sorry, Could you let me know what should I pass to the GTs?
Sorry What I meant is, do: t1_pred = self.pred_task1(atten_decoder[0][-1][-1]) t2_pred = self.pred_task2(atten_decoder[1][-1][-1]) ...
and modify the loss function in utils.py: https://github.com/lorenmt/mtan/blob/268c5c1796bf972d1e0850dcf1e2f2e6598cc589/im2im_pred/utils.py#L22 into F.cross_entropy(x_pred, x_output, ignore_index=-1)
-1 represents the index you wish to ignore. You probably need to modify that as well according to the configuration in your dataset.
Oh!!! That was a silly mistake! I thought that my dataset is 0 indexed but I found that it is 1 indexed!! So sorry to take your time. Thank you very much for the help!
Hi, sorry to bother you.
I would like to modify the mtan model to do two tasks instead of three. Depending on the stan model I modify the code as the following:
for j in range(2):
if j < 2:
self.encoder_att.append(nn.ModuleList([self.att_layer([filter[0], filter[0], filter[0]])]))
self.decoder_att.append(nn.ModuleList([self.att_layer([2 * filter[0], filter[0], filter[0]])]))
for i in range(4):
self.encoder_att[j].append(self.att_layer([2 * filter[i + 1], filter[i + 1], filter[i + 1]]))
self.decoder_att[j].append(self.att_layer([filter[i + 1] + filter[i], filter[i], filter[i]]))
# define task dependent attention module
for i in range(2):
for j in range(5):
if j == 0:
self.logsigma = nn.Parameter(torch.FloatTensor([-0.5, -0.5]))
return [t1_pred, t2_pred], self.logsigma
So, I'm wondering if I did that correctly, especially for steps 1 and 3.
also, why is the total parameters divided by 24981069?
Thanks!
# Multi-task Attention Network:
class MTANSegNet(nn.Module):
def __init__(self, tasks=['segmentation', 'depth', 'normal'],
out_channels={'segmentation': 13, 'depth': 1, 'normal': 3}):
super(MTANSegNet, self).__init__()
# initialise network parameters
filter = [64, 128, 256, 512, 512]
self.tasks = tasks
self.num_tasks = len(tasks)
# define encoder decoder layers
self.encoder_block = nn.ModuleList([self.conv_layer([3, filter[0]])])
self.decoder_block = nn.ModuleList([self.conv_layer([filter[0], filter[0]])])
for i in range(4):
self.encoder_block.append(self.conv_layer([filter[i], filter[i + 1]]))
self.decoder_block.append(self.conv_layer([filter[i + 1], filter[i]]))
# define convolution layer
self.conv_block_enc = nn.ModuleList([self.conv_layer([filter[0], filter[0]])])
self.conv_block_dec = nn.ModuleList([self.conv_layer([filter[0], filter[0]])])
for i in range(4):
if i == 0:
self.conv_block_enc.append(self.conv_layer([filter[i + 1], filter[i + 1]]))
self.conv_block_dec.append(self.conv_layer([filter[i], filter[i]]))
else:
self.conv_block_enc.append(nn.Sequential(self.conv_layer([filter[i + 1], filter[i + 1]]),
self.conv_layer([filter[i + 1], filter[i + 1]])))
self.conv_block_dec.append(nn.Sequential(self.conv_layer([filter[i], filter[i]]),
self.conv_layer([filter[i], filter[i]])))
# define task attention layers
self.encoder_att = nn.ModuleList([nn.ModuleList([self.att_layer([filter[0], filter[0], filter[0]])])])
self.decoder_att = nn.ModuleList([nn.ModuleList([self.att_layer([2 * filter[0], filter[0], filter[0]])])])
self.encoder_block_att = nn.ModuleList([self.conv_layer([filter[0], filter[1]])])
self.decoder_block_att = nn.ModuleList([self.conv_layer([filter[0], filter[0]])])
for j in range(self.num_tasks):
if j < (self.num_tasks - 1):
self.encoder_att.append(nn.ModuleList([self.att_layer([filter[0], filter[0], filter[0]])]))
self.decoder_att.append(nn.ModuleList([self.att_layer([2 * filter[0], filter[0], filter[0]])]))
for i in range(4):
self.encoder_att[j].append(self.att_layer([2 * filter[i + 1], filter[i + 1], filter[i + 1]]))
self.decoder_att[j].append(self.att_layer([filter[i + 1] + filter[i], filter[i], filter[i]]))
for i in range(4):
if i < 3:
self.encoder_block_att.append(self.conv_layer([filter[i + 1], filter[i + 2]]))
self.decoder_block_att.append(self.conv_layer([filter[i + 1], filter[i]]))
else:
self.encoder_block_att.append(self.conv_layer([filter[i + 1], filter[i + 1]]))
self.decoder_block_att.append(self.conv_layer([filter[i + 1], filter[i + 1]]))
self.pred_task = nn.ModuleList([self.conv_layer([filter[0], out_channels[t]], pred=True) for t in tasks])
# define pooling and unpooling functions
self.down_sampling = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
self.up_sampling = nn.MaxUnpool2d(kernel_size=2, stride=2)
def conv_layer(self, channel, pred=False):
if not pred:
conv_block = nn.Sequential(
nn.Conv2d(in_channels=channel[0], out_channels=channel[1], kernel_size=3, padding=1),
nn.BatchNorm2d(num_features=channel[1]),
nn.ReLU(inplace=True),
)
else:
conv_block = nn.Sequential(
nn.Conv2d(in_channels=channel[0], out_channels=channel[0], kernel_size=3, padding=1),
nn.Conv2d(in_channels=channel[0], out_channels=channel[1], kernel_size=1, padding=0),
)
return conv_block
def att_layer(self, channel):
att_block = nn.Sequential(
nn.Conv2d(in_channels=channel[0], out_channels=channel[1], kernel_size=1, padding=0),
nn.BatchNorm2d(channel[1]),
nn.ReLU(inplace=True),
nn.Conv2d(in_channels=channel[1], out_channels=channel[2], kernel_size=1, padding=0),
nn.BatchNorm2d(channel[2]),
nn.Sigmoid(),
)
return att_block
def forward(self, x):
g_encoder, g_decoder, g_maxpool, g_upsampl, indices = ([0] * 5 for _ in range(5))
for i in range(5):
g_encoder[i], g_decoder[-i - 1] = ([0] * 2 for _ in range(2))
# define attention list for tasks
atten_encoder, atten_decoder = ([0] * self.num_tasks for _ in range(2))
for i in range(self.num_tasks):
atten_encoder[i], atten_decoder[i] = ([0] * 5 for _ in range(2))
for i in range(self.num_tasks):
for j in range(5):
atten_encoder[i][j], atten_decoder[i][j] = ([0] * 3 for _ in range(2))
# define global shared network
for i in range(5):
if i == 0:
g_encoder[i][0] = self.encoder_block[i](x)
g_encoder[i][1] = self.conv_block_enc[i](g_encoder[i][0])
g_maxpool[i], indices[i] = self.down_sampling(g_encoder[i][1])
else:
g_encoder[i][0] = self.encoder_block[i](g_maxpool[i - 1])
g_encoder[i][1] = self.conv_block_enc[i](g_encoder[i][0])
g_maxpool[i], indices[i] = self.down_sampling(g_encoder[i][1])
for i in range(5):
if i == 0:
g_upsampl[i] = self.up_sampling(g_maxpool[-1], indices[-i - 1])
g_decoder[i][0] = self.decoder_block[-i - 1](g_upsampl[i])
g_decoder[i][1] = self.conv_block_dec[-i - 1](g_decoder[i][0])
else:
g_upsampl[i] = self.up_sampling(g_decoder[i - 1][-1], indices[-i - 1])
g_decoder[i][0] = self.decoder_block[-i - 1](g_upsampl[i])
g_decoder[i][1] = self.conv_block_dec[-i - 1](g_decoder[i][0])
# define task dependent attention module
for i in range(self.num_tasks):
for j in range(5):
if j == 0:
atten_encoder[i][j][0] = self.encoder_att[i][j](g_encoder[j][0])
atten_encoder[i][j][1] = (atten_encoder[i][j][0]) * g_encoder[j][1]
atten_encoder[i][j][2] = self.encoder_block_att[j](atten_encoder[i][j][1])
atten_encoder[i][j][2] = F.max_pool2d(atten_encoder[i][j][2], kernel_size=2, stride=2)
else:
atten_encoder[i][j][0] = self.encoder_att[i][j](torch.cat((g_encoder[j][0], atten_encoder[i][j - 1][2]), dim=1))
atten_encoder[i][j][1] = (atten_encoder[i][j][0]) * g_encoder[j][1]
atten_encoder[i][j][2] = self.encoder_block_att[j](atten_encoder[i][j][1])
atten_encoder[i][j][2] = F.max_pool2d(atten_encoder[i][j][2], kernel_size=2, stride=2)
for j in range(5):
if j == 0:
atten_decoder[i][j][0] = F.interpolate(atten_encoder[i][-1][-1], scale_factor=2, mode='bilinear', align_corners=True)
atten_decoder[i][j][0] = self.decoder_block_att[-j - 1](atten_decoder[i][j][0])
atten_decoder[i][j][1] = self.decoder_att[i][-j - 1](torch.cat((g_upsampl[j], atten_decoder[i][j][0]), dim=1))
atten_decoder[i][j][2] = (atten_decoder[i][j][1]) * g_decoder[j][-1]
else:
atten_decoder[i][j][0] = F.interpolate(atten_decoder[i][j - 1][2], scale_factor=2, mode='bilinear', align_corners=True)
atten_decoder[i][j][0] = self.decoder_block_att[-j - 1](atten_decoder[i][j][0])
atten_decoder[i][j][1] = self.decoder_att[i][-j - 1](torch.cat((g_upsampl[j], atten_decoder[i][j][0]), dim=1))
atten_decoder[i][j][2] = (atten_decoder[i][j][1]) * g_decoder[j][-1]
# define task prediction layers
out = [0 for _ in self.tasks]
for i, t in enumerate(self.tasks):
out[i] = self.pred_task[i](atten_decoder[i][-1][-1])
if t == 'normal':
out[i] = out[i] / torch.norm(out[i], p=2, dim=1, keepdim=True)
return out
You can follow this implementation which should be cleaner in terms of scaling tasks.
And 24981069 represents the parameter size of single-task learning. So it's easier to compare the parameter size relatively in different networks.
Thank you, your reply is much appreciated.
Hi lorenmt!
Thank you for sharing your code. I would like to train the
model_segnet_mtan.py
with my own dataset. I have prepared the data to have the same size (500x500) and to be in the same format as increate_dataset.py
. However, when I've started the training I got the following error:It seems the error because of the input shape. Do I need to resize the images or can the model accept images with size 500x500?, how can I modify the indices shape? Could you help me to solve the error, I'm new to PyTorch?.