Open Mindbooom opened 5 years ago
Hi, I have another question. I am trying to train lstm1 on 4 GPUs by inserting this line: net = torch.nn.DataParallel(net, device_ids=[0, 1, 2, 3]). The input images are split across the GPUs, but the hidden state h and the cell state c are not split to GPUs 1, 2 and 3. I cannot find a solution for this. Do you have any advice?
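For context, `torch.nn.DataParallel` replicates the module on each device and splits the tensors passed to `forward` along dimension 0; state kept as a plain attribute on the module is not split that way. The sketch below only emulates that split on the CPU with `torch.chunk`, to show that h and c would be divided together with the images once they are passed as `forward` arguments. `scatter_like_dataparallel` is a hypothetical helper written for this illustration, not a PyTorch API, and the tensor sizes are assumptions.

```python
import torch


def scatter_like_dataparallel(tensors, num_replicas):
    """Split every tensor along dim 0, one chunk per replica (CPU-only emulation)."""
    chunks = [torch.chunk(t, num_replicas, dim=0) for t in tensors]
    # Regroup so that each entry holds the (x, h, c) slice for one replica.
    return list(zip(*chunks))


batch, hidden_channels = 8, 256          # illustrative sizes
x = torch.randn(batch, 1024, 10, 10)     # feature map entering the LSTM layer
h = torch.zeros(batch, hidden_channels, 10, 10)
c = torch.zeros(batch, hidden_channels, 10, 10)

per_replica = scatter_like_dataparallel((x, h, c), num_replicas=4)
for rank, (xr, hr, cr) in enumerate(per_replica):
    print(rank, tuple(xr.shape), tuple(hr.shape), tuple(cr.shape))
```

Each emulated replica receives a batch of 2 for all three tensors, which is why the state has to travel through `forward` rather than live on the module.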
Thanks @Mindbooom for pointing out the difference in the ConvLSTM definition I used. The ConvLSTM I used came from some other papers on ConvLSTM. I have now updated the ConvLSTM layer to follow this paper's definition.
For multi-GPU training, I will update the repo after 23rd November, as I am currently occupied with other work. In the meantime, you can have a look at the asynchronous gradient descent training for multiple GPUs used in this paper and try to implement it.
Hi @vikrant7, I have written code for multi-GPU training by changing how h and c are initialized and updated. But training is extremely slow and I don't know why. I'll share the code here.
'''mvod_bottleneck_lstm1_multigpu.py'''
#!/usr/bin/python3
"""Script for creating basenet with one Bottleneck LSTM layer after conv 13.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from typing import List, Tuple
from utils import box_utils
from collections import namedtuple
from collections import OrderedDict
from torch.autograd import Variable
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import numpy as np
import logging
def SeperableConv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0):
    """Replace Conv2d with a depthwise Conv2d and Pointwise Conv2d.
    Arguments:
        in_channels : number of channels of input
        out_channels : number of channels of output
        kernel_size : kernel size for depthwise convolution
        stride : stride for depthwise convolution
        padding : padding for depthwise convolution
    Returns:
        object of class torch.nn.Sequential
    """
    return nn.Sequential(
        nn.Conv2d(in_channels=int(in_channels), out_channels=int(in_channels), kernel_size=kernel_size,
                  groups=int(in_channels), stride=stride, padding=padding),
        nn.ReLU6(),
        nn.Conv2d(in_channels=int(in_channels), out_channels=int(out_channels), kernel_size=1),
    )


def conv_bn(inp, oup, stride):
    """3x3 conv with batchnorm and relu
    Arguments:
        inp : number of channels of input
        oup : number of channels of output
        stride : stride for depthwise convolution
    Returns:
        object of class torch.nn.Sequential
    """
    return nn.Sequential(
        nn.Conv2d(int(inp), int(oup), 3, stride, 1, bias=False),
        nn.BatchNorm2d(int(oup)),
        nn.ReLU6(inplace=True)
    )


def conv_dw(inp, oup, stride):
    """Replace Conv2d with a depthwise Conv2d and Pointwise Conv2d having batchnorm and relu layers in between.
    Here kernel size is fixed at 3.
    Arguments:
        inp : number of channels of input
        oup : number of channels of output
        stride : stride for depthwise convolution
    Returns:
        object of class torch.nn.Sequential
    """
    return nn.Sequential(
        nn.Conv2d(int(inp), int(inp), 3, stride, 1, groups=int(inp), bias=False),
        nn.BatchNorm2d(int(inp)),
        nn.ReLU6(inplace=True),
        nn.Conv2d(int(inp), int(oup), 1, 1, 0, bias=False),
        nn.BatchNorm2d(int(oup)),
        nn.ReLU6(inplace=True),
    )


class MatchPrior(object):
    """Matches priors based on the SSD prior config
    Arguments:
        center_form_priors : priors generated based on specs and image size in config file
        center_variance : a float used to change the scale of center
        size_variance : a float used to change the scale of size
        iou_threshold : a float value of the IOU threshold
    """
    def __init__(self, center_form_priors, center_variance, size_variance, iou_threshold):
        self.center_form_priors = center_form_priors
        self.corner_form_priors = box_utils.center_form_to_corner_form(center_form_priors)
        self.center_variance = center_variance
        self.size_variance = size_variance
        self.iou_threshold = iou_threshold

    def __call__(self, gt_boxes, gt_labels):
        """
        Arguments:
            gt_boxes : ground truth boxes
            gt_labels : ground truth labels
        Returns:
            locations of form (batch_size, num_priors, 4) and labels
        """
        if type(gt_boxes) is np.ndarray:
            gt_boxes = torch.from_numpy(gt_boxes)
        if type(gt_labels) is np.ndarray:
            gt_labels = torch.from_numpy(gt_labels)
        boxes, labels = box_utils.assign_priors(gt_boxes, gt_labels,
                                                self.corner_form_priors, self.iou_threshold)
        boxes = box_utils.corner_form_to_center_form(boxes)
        locations = box_utils.convert_boxes_to_locations(boxes, self.center_form_priors, self.center_variance,
                                                         self.size_variance)
        return locations, labels


'''
class BottleneckLSTMCell(nn.Module):
""" Creates a LSTM layer cell
Arguments:
input_channels : variable used to contain value of number of channels in input
hidden_channels : variable used to contain value of number of channels in the hidden state of LSTM cell
"""
def __init__(self, input_channels, hidden_channels):
super(BottleneckLSTMCell, self).__init__()
assert hidden_channels % 2 == 0
self.input_channels = int(input_channels)
self.hidden_channels = int(hidden_channels)
self.num_features = 4
self.W = nn.Conv2d(in_channels=self.input_channels, out_channels=self.input_channels, kernel_size=3,
groups=self.input_channels, stride=1, padding=1)
self.Wy = nn.Conv2d(int(self.input_channels + self.hidden_channels), self.hidden_channels, kernel_size=1)
self.Wi = nn.Conv2d(self.hidden_channels, self.hidden_channels, 3, 1, 1, groups=self.hidden_channels,
bias=False)
self.Wbi = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
self.Wbf = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
self.Wbc = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
self.Wbo = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
self.relu = nn.ReLU6()
logging.info("Initializing weights of lstm")
self._initialize_weights()
def _initialize_weights(self):
"""
Returns:
initialized weights of the model
"""
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.xavier_uniform_(m.weight)
if m.bias is not None:
m.bias.data.zero_()
elif isinstance(m, nn.BatchNorm2d):
m.weight.data.fill_(1)
m.bias.data.zero_()
def forward(self, x, h,
c): # implemented as mentioned in paper here the only difference is Wbi, Wbf, Wbc & Wbo are commuted all together in paper
"""
Arguments:
x : input tensor
h : hidden state tensor
c : cell state tensor
Returns:
output tensor after LSTM cell
"""
print('The size of x is', x.size())
print('The size of h is', h.size())
print('The size of c is', c.size())
x = self.W(x)
y = torch.cat((x, h), 1) # concatenate input and hidden layers
i = self.Wy(y) # reduce to hidden layer size,the bottleneck
b = self.Wi(i) # depth wise 3*3 need a pointwise
ci = torch.sigmoid(self.Wbi(b))
cf = torch.sigmoid(self.Wbf(b))
print('The device of cf is',cf.device)
#print('The device of c is',cc.device)
print('The device of ci is',ci.device)
print('The device of b is',b.device)
print('The device of x is',x.device)
print('The device of y is',y.device)
print('The device of i is',i.device)
print('The device of h is',h.device)
print('The device of c is',c.device)
cc = cf * c + ci * self.relu(self.Wbc(b))
co = torch.sigmoid(self.Wbo(b))
ch = co * self.relu(cc)
# print('Wci is ',self.Wci)
# print('Wcf is ', self.Wcf)
# print('Wco is ', self.Wco)
return ch, cc
def init_hidden(self, batch_size, hidden, shape):
"""
Arguments:
batch_size : an int variable having value of batch size while training
hidden : an int variable having value of number of channels in hidden state
shape : an array containing shape of the hidden and cell state
Returns:
cell state and hidden state
"""
return (Variable(torch.zeros(batch_size, hidden, shape[0], shape[1])).cuda(),
Variable(torch.zeros(batch_size, hidden, shape[0], shape[1])).cuda()
)
class BottleneckLSTM(nn.Module):
def __init__(self, input_channels, hidden_channels, height, width, batch_size):
""" Creates Bottleneck LSTM layer
Arguments:
input_channels : variable having value of number of channels of input to this layer
hidden_channels : variable having value of number of channels of hidden state of this layer
height : an int variable having value of height of the input
width : an int variable having value of width of the input
batch_size : an int variable having value of batch_size of the input
Returns:
Output tensor of LSTM layer
"""
super(BottleneckLSTM, self).__init__()
self.input_channels = int(input_channels)
self.hidden_channels = int(hidden_channels)
self.cell = BottleneckLSTMCell(self.input_channels, self.hidden_channels)
(h, c) = self.cell.init_hidden(batch_size, hidden=self.hidden_channels, shape=(height, width))
self.hidden_state = h
self.cell_state = c
def forward(self, input):
new_h, new_c = self.cell(input, self.hidden_state, self.cell_state)
self.hidden_state = new_h
self.cell_state = new_c
return self.hidden_state
'''
class BottleneckLSTM(nn.Module):
    def __init__(self, input_channels, hidden_channels, height, width, batch_size):
        """ Creates Bottleneck LSTM layer
        Arguments:
            input_channels : variable having value of number of channels of input to this layer
            hidden_channels : variable having value of number of channels of hidden state of this layer
            height : an int variable having value of height of the input
            width : an int variable having value of width of the input
            batch_size : an int variable having value of batch_size of the input
        """
        super(BottleneckLSTM, self).__init__()
        self.input_channels = int(input_channels)
        self.hidden_channels = int(hidden_channels)
        self.batch_size = batch_size
        self.shape = (height, width)
        self.num_features = 4
        self.W = nn.Conv2d(in_channels=self.input_channels, out_channels=self.input_channels, kernel_size=3,
                           groups=self.input_channels, stride=1, padding=1)
        self.Wy = nn.Conv2d(int(self.input_channels + self.hidden_channels), self.hidden_channels, kernel_size=1)
        self.Wi = nn.Conv2d(self.hidden_channels, self.hidden_channels, 3, 1, 1, groups=self.hidden_channels,
                            bias=False)
        self.Wbi = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
        self.Wbf = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
        self.Wbc = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
        self.Wbo = nn.Conv2d(self.hidden_channels, self.hidden_channels, 1, 1, 0, bias=False)
        self.relu = nn.ReLU6()
        logging.info("Initializing weights of lstm")
        self._initialize_weights()
        #self.cell = self.int_BottleneckLSTMCell(self.input_channels, self.hidden_channels)
        #(h, c) = self.cell.init_hidden(batch_size, hidden=self.hidden_channels, shape=(height, width))
        #self.hidden_state = h
        #self.cell_state = c

    def forward(self, x, h, c):
        """
        Implemented as in the paper; the only difference is that Wbi, Wbf, Wbc & Wbo
        are computed all together in the paper.
        Arguments:
            x : input tensor
            h : hidden state tensor
            c : cell state tensor
        Returns:
            output tensor, new hidden state and new cell state of the LSTM cell
        """
        # print('The size of x is', x.size())
        # print('The size of h is', h.size())
        # print('The size of c is', c.size())
        x = self.W(x)
        y = torch.cat((x, h), 1)  # concatenate input and hidden layers
        i = self.Wy(y)  # reduce to hidden layer size, the bottleneck
        b = self.Wi(i)  # depthwise 3x3, needs a pointwise conv after it
        ci = torch.sigmoid(self.Wbi(b))
        cf = torch.sigmoid(self.Wbf(b))
        cc = cf * c + ci * self.relu(self.Wbc(b))
        co = torch.sigmoid(self.Wbo(b))
        ch = co * self.relu(cc)
        # print('Wci is ',self.Wci)
        # print('Wcf is ', self.Wcf)
        # print('Wco is ', self.Wco)
        return ch, ch, cc

    def _initialize_weights(self):
        """
        Returns:
            initialized weights of the model
        """
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def init_hidden(self):
        """
        Returns:
            zero-initialized hidden state and cell state of shape
            (batch_size, hidden_channels, height, width), on the same device as the layer weights
        """
        device = self.W.weight.device  # follow the layer instead of hard-coding .cuda()
        size = (self.batch_size, self.hidden_channels, self.shape[0], self.shape[1])
        return torch.zeros(size, device=device), torch.zeros(size, device=device)


def crop_like(x, target):
    """
    Arguments:
        x : a tensor whose shape has to be cropped
        target : a tensor whose spatial shape x should match
    Returns:
        x having same spatial shape as target
    """
    if x.size()[2:] == target.size()[2:]:
        return x
    else:
        height = target.size()[2]
        width = target.size()[3]
        crop_h = torch.FloatTensor([x.size()[2]]).sub(height).div(-2)
        crop_w = torch.FloatTensor([x.size()[3]]).sub(width).div(-2)
        # fixed indexing for PyTorch 0.4
        return F.pad(x, [int(crop_w.ceil()[0]), int(crop_w.floor()[0]), int(crop_h.ceil()[0]), int(crop_h.floor()[0])])


class MobileNetV1(nn.Module):
    def __init__(self, num_classes=1024, alpha=1):
        """torch.nn.Module for MobileNetV1 up to conv12
        Arguments:
            num_classes : an int variable having value of total number of classes
            alpha : a float used as width multiplier for channels of model
        """
        super(MobileNetV1, self).__init__()
        # up to conv 12
        self.model = nn.Sequential(
            conv_bn(3, 32 * alpha, 2),
            conv_dw(32 * alpha, 64 * alpha, 1),
            conv_dw(64 * alpha, 128 * alpha, 2),
            conv_dw(128 * alpha, 128 * alpha, 1),
            conv_dw(128 * alpha, 256 * alpha, 2),
            conv_dw(256 * alpha, 256 * alpha, 1),
            conv_dw(256 * alpha, 512 * alpha, 2),
            conv_dw(512 * alpha, 512 * alpha, 1),
            conv_dw(512 * alpha, 512 * alpha, 1),
            conv_dw(512 * alpha, 512 * alpha, 1),
            conv_dw(512 * alpha, 512 * alpha, 1),
            conv_dw(512 * alpha, 512 * alpha, 1),
        )
        logging.info("Initializing weights of base net")
        self._initialize_weights()
        # self.fc = nn.Linear(1024, num_classes)

    def _initialize_weights(self):
        """
        Returns:
            initialized weights of the model
        """
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def forward(self, x):
        """
        Arguments:
            x : a tensor which is used as input for the model
        Returns:
            a tensor which is output of the model
        """
        x = self.model(x)
        return x


class SSD(nn.Module):
    def __init__(self, num_classes, batch_size, alpha=1, is_test=False, config=None, device=None):
        """
        Arguments:
            num_classes : an int variable having value of total number of classes
            batch_size : an int variable having value of batch size
            alpha : a float used as width multiplier for channels of model
            is_test : a bool used to make model ready for testing
            config : a dict containing all the configuration parameters
        """
        super(SSD, self).__init__()
        # Decoder
        self.is_test = is_test
        self.config = config
        self.num_classes = num_classes
        if device:
            self.device = device
        else:
            self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        if is_test:
            self.config = config
            self.priors = config.priors.to(self.device)
        self.conv13 = conv_dw(512 * alpha, 1024 * alpha, 2)  # not using conv14 as mentioned in paper
        self.bottleneck_lstm1 = BottleneckLSTM(input_channels=1024 * alpha, hidden_channels=256 * alpha, height=10,
                                               width=10, batch_size=batch_size)
        self.fmaps_1 = nn.Sequential(
            nn.Conv2d(in_channels=int(256 * alpha), out_channels=int(128 * alpha), kernel_size=1),
            nn.ReLU6(inplace=True),
            SeperableConv2d(in_channels=128 * alpha, out_channels=256 * alpha, kernel_size=3, stride=2, padding=1),
        )
        self.fmaps_2 = nn.Sequential(
            nn.Conv2d(in_channels=int(256 * alpha), out_channels=int(64 * alpha), kernel_size=1),
            nn.ReLU6(inplace=True),
            SeperableConv2d(in_channels=64 * alpha, out_channels=128 * alpha, kernel_size=3, stride=2, padding=1),
        )
        self.fmaps_3 = nn.Sequential(
            nn.Conv2d(in_channels=int(128 * alpha), out_channels=int(64 * alpha), kernel_size=1),
            nn.ReLU6(inplace=True),
            SeperableConv2d(in_channels=64 * alpha, out_channels=128 * alpha, kernel_size=3, stride=2, padding=1),
        )
        self.fmaps_4 = nn.Sequential(
            nn.Conv2d(in_channels=int(128 * alpha), out_channels=int(32 * alpha), kernel_size=1),
            nn.ReLU6(inplace=True),
            SeperableConv2d(in_channels=32 * alpha, out_channels=64 * alpha, kernel_size=3, stride=2, padding=1),
        )
        self.regression_headers = nn.ModuleList([
            SeperableConv2d(in_channels=512 * alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=256 * alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=256 * alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=128 * alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=128 * alpha, out_channels=6 * 4, kernel_size=3, padding=1),
            nn.Conv2d(in_channels=int(64 * alpha), out_channels=6 * 4, kernel_size=1),
        ])
        self.classification_headers = nn.ModuleList([
            SeperableConv2d(in_channels=512 * alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=256 * alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=256 * alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=128 * alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            SeperableConv2d(in_channels=128 * alpha, out_channels=6 * num_classes, kernel_size=3, padding=1),
            nn.Conv2d(in_channels=int(64 * alpha), out_channels=6 * num_classes, kernel_size=1),
        ])
        logging.info("Initializing weights of SSD")
        self._initialize_weights()

    def _initialize_weights(self):
        """
        Returns:
            initialized weights of the model
        """
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def compute_header(self, i, x):  # ssd method to calculate headers
        """
        Arguments:
            i : an int used to use particular classification and regression layer
            x : a tensor used as input to layers
        Returns:
            locations and confidences of the predictions
        """
        confidence = self.classification_headers[i](x)
        confidence = confidence.permute(0, 2, 3, 1).contiguous()
        confidence = confidence.view(confidence.size(0), -1, self.num_classes)
        location = self.regression_headers[i](x)
        location = location.permute(0, 2, 3, 1).contiguous()
        location = location.view(location.size(0), -1, 4)
        return confidence, location

    def forward(self, x, h, c):
        """
        Arguments:
            x : a tensor which is used as input for the model
            h : hidden state tensor of the Bottleneck LSTM layer
            c : cell state tensor of the Bottleneck LSTM layer
        Returns:
            confidences and locations of predictions made by model during training
            or
            confidences and boxes of predictions made by model during testing,
            followed in both cases by the updated hidden state and cell state
        """
        confidences = []
        locations = []
        header_index = 0
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.conv13(x)
        #x = self.bottleneck_lstm1(x)
        #h, c = self.bottleneck_lstm1.init_hidden()
        x, h, c = self.bottleneck_lstm1(x, h, c)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.fmaps_1(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.fmaps_2(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.fmaps_3(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        x = self.fmaps_4(x)
        confidence, location = self.compute_header(header_index, x)
        header_index += 1
        confidences.append(confidence)
        locations.append(location)
        confidences = torch.cat(confidences, 1)
        locations = torch.cat(locations, 1)
        if self.is_test:  # while testing convert locations to boxes
            confidences = F.softmax(confidences, dim=2)
            boxes = box_utils.convert_locations_to_boxes(
                locations, self.priors, self.config.center_variance, self.config.size_variance
            )
            boxes = box_utils.center_form_to_corner_form(boxes)
            return confidences, boxes, h, c
        else:
            return confidences, locations, h, c


class MobileVOD(nn.Module):
    """
    Module to join encoder and decoder of predictor model
    """
    def __init__(self, pred_enc, pred_dec):
        """
        Arguments:
            pred_enc : an object of MobilenetV1 class
            pred_dec : an object of SSD class
        """
        super(MobileVOD, self).__init__()
        self.pred_encoder = pred_enc
        self.pred_decoder = pred_dec

    def forward(self, seq, h, c):
        """
        Arguments:
            seq : a tensor used as input to the model
            h : hidden state tensor passed to the decoder
            c : cell state tensor passed to the decoder
        Returns:
            confidences and locations of predictions made by model,
            followed by the updated hidden state and cell state
        """
        x = self.pred_encoder(seq)
        confidences, locations, h, c = self.pred_decoder(x, h, c)
        return confidences, locations, h, c

    def detach_hidden(self, h, c):
        """
        Detaches the given hidden state and cell state from the graph, in place
        """
        h.detach_()
        c.detach_()
'''train_mvod_lstm1_multigpu.py'''
#!/usr/bin/python3
"""Script for training the MobileVOD with one Bottleneck LSTM layer. As in mobilenet, here we use depthwise separable convolutions
for reducing the computation without affecting accuracy much. Model is trained on Imagenet VID 2015 dataset.
Here we unroll the LSTM for 10 steps and give 10 consecutive frames of video as input.
Few global variables defined here are explained:
Global Variables
----------------
args : dict
Has all the options for changing various variables of the model as well as hyper-parameters for training.
dataset : VIDDataset (torch.utils.data.Dataset, For more info see datasets/vid_dataset.py)
optimizer : optim.RMSprop
scheduler : CosineAnnealingLR, MultiStepLR (torch.optim.lr_scheduler)
config : mobilenetv1_ssd_config (See config/mobilenetv1_ssd_config.py for more info, where you can change input size and ssd priors)
loss : MultiboxLoss (See network/multibox_loss.py for more info)
how to run: python train_mvod_lstm1_multigpu.py --datasets /home/ILSVRC2015 --cache_path=../cache --batch_size 10 --num_epochs 30 --pretrained ./models/basenet/WM-1.0-Epoch-3-Loss-5.234554548599229.pth --width_mult 1 --freeze_net
"""
import argparse
import os
import logging
import sys
import itertools
import torch
from torch.utils.data import DataLoader, ConcatDataset
from torch.optim.lr_scheduler import CosineAnnealingLR, MultiStepLR
from utils.misc import str2bool, Timer, store_labels
from network.mvod_bottleneck_lstm1_multigpu import MobileVOD, SSD, MobileNetV1, MatchPrior
from datasets.vid_dataset_new import VIDDataset
from network.multibox_loss import MultiboxLoss
from config import mobilenetv1_ssd_config
from dataloaders.data_preprocessing import TrainAugmentation, TestTransform
parser = argparse.ArgumentParser(
description='Mobile Video Object Detection (Bottleneck LSTM) Training With Pytorch')
parser.add_argument('--datasets', help='Dataset directory path')
parser.add_argument('--cache_path', help='Cache directory path')
parser.add_argument('--freeze_net', action='store_true',
help="Freeze all the layers except the prediction head.")
parser.add_argument('--width_mult', default=1.0, type=float,
                    help='Width multiplier')
# Params for SGD
parser.add_argument('--lr', '--learning-rate', default=0.0003, type=float,
help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float,
help='Momentum value for optim')
parser.add_argument('--weight_decay', default=5e-4, type=float,
help='Weight decay for SGD')
parser.add_argument('--gamma', default=0.1, type=float,
help='Gamma update for SGD')
parser.add_argument('--base_net_lr', default=None, type=float,
help='initial learning rate for base net.')
parser.add_argument('--ssd_lr', default=None, type=float,
help='initial learning rate for the layers not in base net and prediction heads.')
# Params for loading pretrained basenet or checkpoints.
parser.add_argument('--pretrained', help='Pre-trained model')
parser.add_argument('--resume', default=None, type=str,
help='Checkpoint state_dict file to resume training from')
# Scheduler
parser.add_argument('--scheduler', default="multi-step", type=str,
                    help="Scheduler for SGD. It can be one of multi-step and cosine")
# Params for Multi-step Scheduler
parser.add_argument('--milestones', default="80,100", type=str,
help="milestones for MultiStepLR")
# Params for Cosine Annealing
parser.add_argument('--t_max', default=120, type=float,
help='T_max value for Cosine Annealing Scheduler.')
# Train params
parser.add_argument('--batch_size', default=1, type=int,
help='Batch size for training')
parser.add_argument('--num_epochs', default=200, type=int,
                    help='Number of training epochs')
parser.add_argument('--num_workers', default=4, type=int,
                    help='Number of workers used in dataloading')
parser.add_argument('--validation_epochs', default=1, type=int,
                    help='Number of epochs between validation runs')
parser.add_argument('--debug_steps', default=100, type=int,
help='Set the debug log output frequency.')
parser.add_argument('--sequence_length', default=10, type=int,
help='sequence_length of video to unfold')
parser.add_argument('--use_cuda', default=True, type=str2bool,
help='Use CUDA to train model')
parser.add_argument('--checkpoint_folder', default='models/',
help='Directory for saving checkpoint models')
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
args = parser.parse_args()
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() and args.use_cuda else "cpu")
if args.use_cuda and torch.cuda.is_available():
torch.backends.cudnn.benchmark = True
logging.info("Use Cuda.")
def train(loader, net, criterion, optimizer, device, hidden_state, cell_state,
          debug_steps=100, epoch=-1, sequence_length=10):
    """ Train model
    Arguments:
        net : object of MobileVOD class
        loader : training data loader object
        criterion : Loss function to use
        device : device on which computation is done
        optimizer : optimizer to optimize model
        hidden_state : initial hidden state of the LSTM layer
        cell_state : initial cell state of the LSTM layer
        debug_steps : number of steps between debug log outputs
        sequence_length : unroll length of model
        epoch : current epoch number
    """
    net.train(True)
    running_loss = 0.0
    running_regression_loss = 0.0
    running_classification_loss = 0.0
    for i, data in enumerate(loader):
        images, boxes, labels = data
        for image, box, label in zip(images, boxes, labels):
            image = image.to(device)
            box = box.to(device)
            label = label.to(device)
            optimizer.zero_grad()
            confidence, locations, h, c = net(image, hidden_state, cell_state)
            regression_loss, classification_loss = criterion(confidence, locations, label, box)  # TODO CHANGE BOXES
            loss = regression_loss + classification_loss
            loss.backward(retain_graph=True)
            optimizer.step()
            running_loss += loss.item()
            running_regression_loss += regression_loss.item()
            running_classification_loss += classification_loss.item()
            # carry the state over to the next frame of the sequence
            hidden_state = h
            cell_state = c
        net.detach_hidden(hidden_state, cell_state)
        if i and i % debug_steps == 0:
            avg_loss = running_loss / (debug_steps * sequence_length)
            avg_reg_loss = running_regression_loss / (debug_steps * sequence_length)
            avg_clf_loss = running_classification_loss / (debug_steps * sequence_length)
            logging.info(
                f"Epoch: {epoch}, Step: {i}, " +
                f"Average Loss: {avg_loss:.4f}, " +
                f"Average Regression Loss {avg_reg_loss:.4f}, " +
                f"Average Classification Loss: {avg_clf_loss:.4f}"
            )
            running_loss = 0.0
            running_regression_loss = 0.0
            running_classification_loss = 0.0
    # detach_hidden() takes the state tensors explicitly in this version
    net.detach_hidden(hidden_state, cell_state)


def val(loader, net, criterion, device, hidden_state, cell_state):
    """ Validate model
    Arguments:
        net : object of MobileVOD class
        loader : validation data loader object
        criterion : Loss function to use
        device : device on which computation is done
        hidden_state : initial hidden state of the LSTM layer
        cell_state : initial cell state of the LSTM layer
    Returns:
        loss, regression loss, classification loss
    """
    net.eval()
    running_loss = 0.0
    running_regression_loss = 0.0
    running_classification_loss = 0.0
    num = 0
    for _, data in enumerate(loader):
        images, boxes, labels = data
        for image, box, label in zip(images, boxes, labels):
            image = image.to(device)
            box = box.to(device)
            label = label.to(device)
            num += 1
            with torch.no_grad():
                # the model now takes and returns the LSTM state explicitly
                confidence, locations, hidden_state, cell_state = net(image, hidden_state, cell_state)
                regression_loss, classification_loss = criterion(confidence, locations, label, box)
                loss = regression_loss + classification_loss
            running_loss += loss.item()
            running_regression_loss += regression_loss.item()
            running_classification_loss += classification_loss.item()
        net.detach_hidden(hidden_state, cell_state)
    return running_loss / num, running_regression_loss / num, running_classification_loss / num


def initialize_model(net):
    """ Loads learned weights from pretrained checkpoint model
    Arguments:
        net : object of MobileVOD
    """
    if args.pretrained:
        logging.info("Loading weights from pretrained network")
        pretrained_net_dict = torch.load(args.pretrained)
        model_dict = net.state_dict()
        # 1. filter out unnecessary keys
        pretrained_dict = {k: v for k, v in pretrained_net_dict.items() if
                           k in model_dict and model_dict[k].shape == pretrained_net_dict[k].shape}
        # 2. overwrite entries in the existing state dict
        model_dict.update(pretrained_dict)
        net.load_state_dict(model_dict)


if __name__ == '__main__':
    timer = Timer()
    logging.info(args)
    config = mobilenetv1_ssd_config  # config file for priors etc.
    train_transform = TrainAugmentation(config.image_size, config.image_mean, config.image_std)
    target_transform = MatchPrior(config.priors, config.center_variance,
                                  config.size_variance, 0.5)
    test_transform = TestTransform(config.image_size, config.image_mean, config.image_std)
    logging.info("Prepare training datasets.")
    train_dataset = VIDDataset(args.datasets, args.cache_path, transform=train_transform,
                               target_transform=target_transform, batch_size=args.batch_size)
    label_file = os.path.join("models/", "vid-model-labels.txt")
    store_labels(label_file, train_dataset._classes_names)
    num_classes = len(train_dataset._classes_names)
    logging.info(f"Stored labels into file {label_file}.")
    logging.info("Train dataset size: {}".format(len(train_dataset)))
    train_loader = DataLoader(train_dataset, args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=True)
    # logging.info("Prepare Validation datasets.")
    # val_dataset = VIDDataset(args.datasets, args.cache_path, transform=test_transform,
    #                          target_transform=target_transform, is_val=True)
    # logging.info(val_dataset)
    # logging.info("validation dataset size: {}".format(len(val_dataset)))
    # val_loader = DataLoader(val_dataset, args.batch_size,
    #                         num_workers=args.num_workers,
    #                         shuffle=False)
    # num_classes = 30
    logging.info("Build network.")
    pred_enc = MobileNetV1(num_classes=num_classes, alpha=args.width_mult)
    pred_dec = SSD(num_classes=num_classes, batch_size=args.batch_size, alpha=args.width_mult, is_test=False)
    if args.resume is None:
        net = MobileVOD(pred_enc, pred_dec)
        initialize_model(net)
    else:
        net = MobileVOD(pred_enc, pred_dec)
        print("Updating weights from resume model")
        net.load_state_dict(
            torch.load(args.resume,
                       map_location=lambda storage, loc: storage))
    min_loss = -10000.0
    last_epoch = -1
    base_net_lr = args.base_net_lr if args.base_net_lr is not None else args.lr
    ssd_lr = args.ssd_lr if args.ssd_lr is not None else args.lr
    # multi-GPU
    if args.freeze_net:
        logging.info("Freeze net.")
        for param in pred_enc.parameters():
            param.requires_grad = False
        # Setting requires_grad on the module object has no effect; freeze its parameters instead.
        for param in net.pred_decoder.conv13.parameters():
            param.requires_grad = False
    criterion = MultiboxLoss(config.priors, iou_threshold=0.5, neg_pos_ratio=10,
                             center_variance=0.1, size_variance=0.2, device=DEVICE)
    optimizer = torch.optim.RMSprop(
        [{'params': [param for name, param in net.pred_encoder.named_parameters()], 'lr': base_net_lr},
         {'params': [param for name, param in net.pred_decoder.named_parameters()], 'lr': ssd_lr}, ], lr=args.lr,
        weight_decay=args.weight_decay, momentum=args.momentum)
    logging.info(f"Learning rate: {args.lr}, Base net learning rate: {base_net_lr}, "
                 + f"Extra Layers learning rate: {ssd_lr}.")
    # if args.scheduler == 'multi-step':
    #     logging.info("Uses MultiStepLR scheduler.")
    #     milestones = [int(v.strip()) for v in args.milestones.split(",")]
    #     scheduler = MultiStepLR(optimizer, milestones=milestones,
    #                             gamma=0.1, last_epoch=last_epoch)
    # elif args.scheduler == 'cosine':
    #     logging.info("Uses CosineAnnealingLR scheduler.")
    #     scheduler = CosineAnnealingLR(optimizer, args.t_max, last_epoch=last_epoch)
    # else:
    #     logging.fatal(f"Unsupported Scheduler: {args.scheduler}.")
    #     parser.print_help(sys.stderr)
    #     sys.exit(1)
    #net = torch.nn.DataParallel(net, device_ids=[0, 1, 2, 3]).cuda()
    net.to(DEVICE)
    output_path = os.path.join(args.checkpoint_folder, "lstm1_multigpu")
    if not os.path.exists(output_path):
        os.makedirs(output_path)
    logging.info(f"Start training from epoch {last_epoch + 1}.")
    for epoch in range(last_epoch + 1, args.num_epochs):
        # scheduler.step()
        h, c = net.pred_decoder.bottleneck_lstm1.init_hidden()
        train(train_loader, net, criterion, optimizer,
              device=DEVICE, debug_steps=args.debug_steps, epoch=epoch, sequence_length=args.sequence_length,
              hidden_state=h, cell_state=c)
        if epoch % args.validation_epochs == 0 or epoch == args.num_epochs - 1:
            # val_loss, val_regression_loss, val_classification_loss = val(val_loader, net, criterion, DEVICE)
            # logging.info(
            #     f"Epoch: {epoch}, " +
            #     f"Validation Loss: {val_loss:.4f}, " +
            #     f"Validation Regression Loss {val_regression_loss:.4f}, " +
            #     f"Validation Classification Loss: {val_classification_loss:.4f}"
            # )
            model_path = os.path.join(output_path, f"WM-{args.width_mult}-Epoch-{epoch}.pth")
            torch.save(net.state_dict(), model_path)
            logging.info(f"Saved model {model_path}")
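The state handling in the two scripts above comes down to one pattern: the model keeps no state of its own, `forward` takes `h` and `c` and returns the updated pair, and the caller cuts them from the graph between steps. Below is a minimal CPU sketch of that pattern. `TinyCell`, its gating and all sizes are hypothetical stand-ins chosen for illustration, not the network from the scripts.

```python
import torch
import torch.nn as nn


class TinyCell(nn.Module):
    """Hypothetical stand-in for the detector: state goes in, updated state comes out."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x, h, c):
        g = torch.sigmoid(self.gate(torch.cat((x, h), dim=1)))
        c = g * c + (1 - g) * x
        h = torch.tanh(c)
        return h, h, c  # output, hidden state, cell state


torch.manual_seed(0)
cell = TinyCell(channels=4)
optimizer = torch.optim.SGD(cell.parameters(), lr=0.1)
h = torch.zeros(2, 4, 5, 5)
c = torch.zeros(2, 4, 5, 5)
for step in range(3):
    x = torch.randn(2, 4, 5, 5)
    optimizer.zero_grad()
    out, h, c = cell(x, h, c)
    loss = out.pow(2).mean()
    loss.backward()                # graph covers the current step only
    optimizer.step()
    h, c = h.detach(), c.detach()  # cut the graph before the next step
```

Detaching after every step limits backpropagation to a single frame; whether that matches the intended unroll over 10 frames is a separate design question.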
Hi @Mindbooom, @vikrant7, have you trained the basenet? I am wondering whether the models provided in the repo were fully trained. Also, does your multi-GPU code work now?
@samanthawyf I haven't trained the basenet; I just used the epoch 3 basenet provided by @vikrant7. The mAP from evaluate.py is 43%, which is 10% lower than in the paper. I may try to train the basenet once I know how to use multiple GPUs efficiently. The multi-GPU code posted above works, but at 1/8 of the training speed of a single GPU. I don't know why, because on one GPU the code runs at the same speed as the original code. If you are interested, please try this code on your own multi-GPU machine and help us improve the speed.
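One way to narrow down a slowdown like this is to time a fixed number of training steps in each configuration. The helper below is a sketch using only the standard library; `time_steps` and `fake_step` are hypothetical names, and the placeholder workload stands in for one forward/backward/optimizer step. On a GPU, work is queued asynchronously, so a synchronization callable such as `torch.cuda.synchronize` would be passed as `sync` to make the clock readings meaningful.

```python
import time


def time_steps(step_fn, num_steps, sync=None, warmup=2):
    """Return the average wall-clock seconds per call of step_fn.

    sync is an optional zero-argument callable run before each clock read,
    so that asynchronously queued work is included in the measurement.
    """
    for _ in range(warmup):
        step_fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / num_steps


def fake_step():
    # Placeholder workload standing in for one training step.
    sum(i * i for i in range(10_000))


seconds = time_steps(fake_step, num_steps=20)
print(f"{seconds * 1e3:.3f} ms per step")
```

Timing the data loading, the forward pass and the backward pass separately with the same helper would show which part grows when more GPUs are added.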
Hi @Mindbooom, Thanks for sharing the multi-GPU scripts. Now I am free to actively work on this project.
@vikrant7 Bro, I have a question about the evaluation part. Running evaluate.py with the basenet of epoch2, I got an mAP of 43%. But there are 2 strange results. Firstly, when evaluating the uploaded lstm1 of epoch2, the mAP is almost 0.
Secondly, after that poor result, I trained an lstm1 for epoch0 using the uploaded basenet, and the mAP is 21%. It seems that adding lstm1 weakens the training result of the basenet. Can you help me solve these questions? Thank you very much. P.S. the multi-GPU script above leads to a Dataloader error and I'm trying to fix it.
@Mindbooom I got almost the same results after training lstm1 with the basenet provided. Do you know why we got those results?
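One thing that may be worth ruling out (a possibility, not a diagnosis) is that only a few checkpoint weights are actually loaded: `initialize_model` keeps just the keys whose name and shape match and drops the rest silently, and a model saved while wrapped in `nn.DataParallel` has every key prefixed with `module.`. The sketch below reports what would load. `match_state_dict` and the parameter names are hypothetical; with real models the shape dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def match_state_dict(pretrained_shapes, model_shapes, prefix="module."):
    """Compare checkpoint and model entries by name and shape.

    Both arguments map parameter names to shape tuples. A leading prefix
    (as added by nn.DataParallel) is stripped from checkpoint keys first.
    Returns (loaded, skipped, missing) as sorted lists of names.
    """
    cleaned = {(k[len(prefix):] if k.startswith(prefix) else k): v
               for k, v in pretrained_shapes.items()}
    loaded = sorted(k for k, v in cleaned.items() if model_shapes.get(k) == v)
    skipped = sorted(k for k in cleaned if k not in loaded)
    missing = sorted(k for k in model_shapes if k not in loaded)
    return loaded, skipped, missing


# Made-up names and shapes, for illustration only.
pretrained = {
    "module.pred_encoder.conv.weight": (32, 3, 3, 3),
    "module.pred_decoder.Wy.weight": (256, 1280, 1, 1),
}
model = {
    "pred_encoder.conv.weight": (32, 3, 3, 3),
    "pred_decoder.Wy.weight": (256, 1024, 1, 1),
    "pred_decoder.Wi.weight": (256, 1, 3, 3),
}
loaded, skipped, missing = match_state_dict(pretrained, model)
print("loaded:", loaded)
print("skipped:", skipped)
print("missing:", missing)
```

A long `missing` list for the LSTM or header layers would mean those layers are evaluated with freshly initialized weights.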
Hi, I noticed that in the paper the forward part is this: (equation image not preserved). But in your code, this part is: (code image not preserved). Why do you add c*self.wci and self.wcf in the code, putting c_{t-1} into the gate functions? You also involve cc in the calculation of co, which is different from the paper as well. What is the meaning of that? Thank you very much!
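For reference, the `BottleneckLSTM.forward` posted in this thread computes the following, with no terms in $c_{t-1}$ inside the gates. The notation is an assumption chosen to mirror the variable names in that code: $*$ is convolution, $\circ$ the element-wise product, $\sigma$ the sigmoid, $\phi$ ReLU6, and $[\cdot,\cdot]$ channel-wise concatenation.

```latex
\begin{aligned}
b_t &= W_i * \bigl(W_y * [\,W * x_t,\; h_{t-1}\,]\bigr) \\
i_t &= \sigma(W_{bi} * b_t) \\
f_t &= \sigma(W_{bf} * b_t) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \phi(W_{bc} * b_t) \\
o_t &= \sigma(W_{bo} * b_t) \\
h_t &= o_t \circ \phi(c_t)
\end{aligned}
```

In the code these correspond to `b`, `ci`, `cf`, `cc`, `co` and `ch`, in that order.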