guotaowang / STVS


where is the makegri #4

Open clelouch opened 3 years ago

clelouch commented 3 years ago

Thanks for your code and paper! I noticed that makegri is not available in the repo. Can I use `torchvision.utils.make_grid` to replace the make_grid function?
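For context, the replacement I have in mind would look roughly like this (a sketch; the batch shape is an illustrative assumption):

```python
import torch
from torchvision.utils import make_grid

batch = torch.rand(8, 3, 256, 256)  # hypothetical mini-batch of 8 RGB frames

# torchvision's built-in version; note it defaults to normalize=False
grid = make_grid(batch, nrow=8, padding=2, normalize=True)
print(grid.shape)  # torch.Size([3, 260, 2066])
```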

guotaowang commented 3 years ago

makegri.py

```python
import math

import torch
import cv2  # imported in the original file, though unused below

irange = range


def make_grid(tensor, nrow=8, padding=2, normalize=True, range=None,
              scale_each=False, pad_value=0):
    """Make a grid of images.

    Args:
        tensor (Tensor or list): 4D mini-batch Tensor of shape (B x C x H x W)
            or a list of images all of the same size.
        nrow (int, optional): Number of images displayed in each row of the grid.
            The final grid size is (B / nrow, nrow). Default is 8.
        padding (int, optional): amount of padding. Default is 2.
        normalize (bool, optional): If True, shift the image to the range (0, 1),
            by subtracting the minimum and dividing by the maximum pixel value.
        range (tuple, optional): tuple (min, max) where min and max are numbers,
            then these numbers are used to normalize the image. By default, min
            and max are computed from the tensor.
        scale_each (bool, optional): If True, scale each image in the batch of
            images separately rather than the (min, max) over all images.
        pad_value (float, optional): Value for the padded pixels.

    Example:
        See this notebook `here <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>`_
    """
    if not (torch.is_tensor(tensor) or
            (isinstance(tensor, list) and all(torch.is_tensor(t) for t in tensor))):
        raise TypeError('tensor or list of tensors expected, got {}'.format(type(tensor)))

    # if list of tensors, convert to a 4D mini-batch Tensor
    if isinstance(tensor, list):
        tensor = torch.stack(tensor, dim=0)

    if tensor.dim() == 2:  # single image H x W
        tensor = tensor.view(1, tensor.size(0), tensor.size(1))
    if tensor.dim() == 3:  # single image
        if tensor.size(0) == 1:  # if single-channel, convert to 3-channel
            tensor = torch.cat((tensor, tensor, tensor), 0)
        tensor = tensor.view(1, tensor.size(0), tensor.size(1), tensor.size(2))

    if tensor.dim() == 4 and tensor.size(1) == 1:  # single-channel images
        tensor = torch.cat((tensor, tensor, tensor), 1)

    if normalize is True:
        tensor = tensor.clone()  # avoid modifying tensor in-place
        if range is not None:
            assert isinstance(range, tuple), \
                "range has to be a tuple (min, max) if specified. min and max are numbers"

        def norm_ip(img, min, max):
            img.clamp_(min=min, max=max)
            img.add_(-min).div_(max - min + 1e-5)

        def norm_range(t, range):
            if range is not None:
                norm_ip(t, range[0], range[1])
            else:
                norm_ip(t, float(t.min()), float(t.max()))

        if scale_each is True:
            for t in tensor:  # loop over mini-batch dimension
                norm_range(t, range)
        else:
            norm_range(tensor, range)

    if tensor.size(0) == 1:  # single image: return it directly, without padding
        return tensor.squeeze()

    # make the mini-batch of images into a grid
    nmaps = tensor.size(0)
    xmaps = min(nrow, nmaps)
    ymaps = int(math.ceil(float(nmaps) / xmaps))
    height, width = int(tensor.size(2) + padding), int(tensor.size(3) + padding)
    grid = tensor.new(3, height * ymaps + padding, width * xmaps + padding).fill_(pad_value)
    k = 0
    for y in irange(ymaps):
        for x in irange(xmaps):
            if k >= nmaps:
                break
            grid.narrow(1, y * height + padding, height - padding)\
                .narrow(2, x * width + padding, width - padding)\
                .copy_(tensor[k])
            k = k + 1
    return grid
```
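Two things to note if you compare it with `torchvision.utils.make_grid`: this version defaults to `normalize=True` (torchvision defaults to `False`), and when the batch size is 1 it returns the squeezed image rather than a padded grid. A minimal usage sketch (the clip shape and the cv2 save step are illustrative assumptions, not part of the repo):

```python
import torch
import cv2

clip = torch.rand(4, 3, 256, 256)          # hypothetical clip of 4 RGB frames
grid = make_grid(clip, nrow=4, padding=2)  # normalize=True by default here

# (C, H, W) float in [0, 1] -> uint8 (H, W, C); RGB -> BGR for OpenCV
img = grid.mul(255).byte().permute(1, 2, 0).contiguous().numpy()
cv2.imwrite('grid.png', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
```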
clelouch commented 3 years ago

Thanks for your reply! I have another problem to bother you with. train.py imports ImageFolder from dataset.py. However, the `__getitem__` function of the ImageFolder class returns five items:

```python
def __getitem__(self, index):
    clip = self.imgs[index]
    img_clip = []
    img_name = []

    i = 0
    for frame in clip:
        img_path, name = frame
        imgname = img_path.split('\\')
        img = Image.open(img_path).convert('RGB')
        width = img.size[0]
        height = img.size[1]
        i = i + 1
        if self.transform is not None:
            img = self.transform(img)

        img = img.view(1, img.size(0), img.size(1), img.size(2))
        img_clip.append(img)
        img_name.append(imgname[-1])
    img = torch.cat(img_clip, 0)

    return img, name, img_name, width, height
```

Besides, no label is returned, while the data-loading line in train.py is `inputs, labels = data`. It seems that the dataset class is not compatible with train.py.
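To make the mismatch concrete, here is a runnable sketch with a stand-in dataset (`FakeClipSet` and all shapes are hypothetical, only mimicking the five-item return above):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class FakeClipSet(Dataset):
    """Stand-in that mimics the five-item return of dataset.py's ImageFolder."""
    def __len__(self):
        return 2

    def __getitem__(self, index):
        img = torch.rand(3, 3, 256, 256)  # a clip of 3 RGB frames, (T, C, H, W)
        return img, 'frame.jpg', ['f1.jpg', 'f2.jpg', 'f3.jpg'], 256, 256

loader = DataLoader(FakeClipSet(), batch_size=2)
for data in loader:
    # inputs, labels = data  # what train.py does -> ValueError (5 items, not 2)
    img, name, img_name, width, height = data  # what the dataset actually yields
    print(img.shape)  # torch.Size([2, 3, 3, 256, 256]); no label for the loss
```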

guotaowang commented 3 years ago

I'm very sorry; it has been a relatively long time since the code was uploaded, so we re-implemented STVS based on BBSNet and uploaded it to Baidu Cloud Disk. If you have any questions, please contact qduwgt@163.com. Link: https://pan.baidu.com/s/1tneKPmyvmMBPyv_meZmeiQ Code: 3dqb

clelouch commented 3 years ago

Thank you so much. I will try to train it from scratch later.

clelouch commented 3 years ago

@guotaowang Sorry for troubling you. After reading the code, I still have some questions.

  1. I notice that you provide the saliency results on DAVIS with 50 clips. However, since DAVIS has been used for training, it may not be appropriate to test on all of it. According to the paper, DAVIS is split into a training set and a test set. Could you please tell me how you divided it?
  2. Did you use the DAVSOD val set for validation?
  3. In the new STVS implementation based on BBSNet, how long did it take to train the model? I notice that in config.py the default number of training epochs is set to 200, but I am not sure whether you used the default setting. According to the paper, the pretraining stage requires 33000 epochs and the fine-tuning stage takes 8500 epochs, which is astonishing.
  4. According to the paper, STVS takes 256x256 images/frames as inputs. However, the newly implemented code uses a default setting of 473x473. How much difference does it make? Looking forward to your reply. Many thanks.