vikrant7 / mobile-vod-bottleneck-lstm

Implementation of Mobile Video Object Detection with Temporally-Aware Feature Maps using PyTorch

How does the net learn temporal information since shuffle is True in Dataloader? #10

Open jszgz opened 5 years ago

Mindbooom commented 4 years ago

Hi @jszgz, I noticed this problem too, so I wrote a script (below) that creates a new sequence txt file whose entries belong to different videos, shuffled at the video level. When you train the model, set 'shuffle' to False. Unfortunately, even with the new txt file I still could not train a model with a higher mAP than the basenet. If you are interested, please try to train an efficient model.

#!/usr/bin/python3
"""Script for creating a text file containing sequences of 10 frames per video. As in the official
TensorFlow implementation, frames without any object are discarded.

Global Variables
----------------
dirs : list of all the training dataset folders
dirs_val : path to the val folder of the dataset
dirs_test : path to the test folder of the dataset

Creates a sequence list whose entries belong to different videos, shuffled at the video level.
"""
import os
import xml.etree.ElementTree as ET

import numpy as np

dirs = ['ILSVRC2015_VID_train_0000/',
        'ILSVRC2015_VID_train_0001/',
        'ILSVRC2015_VID_train_0002/',
        'ILSVRC2015_VID_train_0003/']
# Your path
dirs_val = ['../../../ILSVRC2015/Data/VID/val/']
dirs_test = ['../../../ILSVRC2015/Data/VID/test/']
dataset_path = '../../../ILSVRC2015/'

file_write_obj = open('train_VID_seqs_list_shuffle.txt','w')
seqs = []
for d in dirs:  # avoid shadowing the builtin `dir`
    for item in os.listdir(os.path.join(dataset_path, 'Data/VID/train/', d)):
        seqs.append(os.path.join(d, item))

#index_del = np.random.choice(len(seqs),size=int(len(seqs)*0.9),replace=False)
#seqs = np.delete(seqs,index_del)
np.random.shuffle(seqs)
#print(seqs[0],seqs[1])
for seq in seqs:
    seq_path = os.path.join(dataset_path, 'Data/VID/train/', seq)
    relative_path = seq
    image_list = np.sort(os.listdir(seq_path))
    count = 0
    filtered_image_list = []
    for image in image_list:
        image_id = image.split('.')[0]
        anno_file = image_id + '.xml'
        anno_path = os.path.join(dataset_path, 'Annotations/VID/train/', seq, anno_file)
        objects = ET.parse(anno_path).findall("object")
        if len(objects) == 0:  # discard frames without any object
            continue
        count += 1
        filtered_image_list.append(relative_path + '/' + image_id)
    # group the remaining frames into non-overlapping sequences of 10;
    # leftover frames (count % 10) are dropped
    for i in range(count // 10):
        line = ','.join(filtered_image_list[10 * i:10 * (i + 1)])
        file_write_obj.write(line)
        file_write_obj.write('\n')
file_write_obj.close()
'''
file_write_obj = open('val_VID_seqs_list_small.txt', 'w')
seq_list = []
with open('val_VID_list.txt') as f:
    for line in f:
        seq_list.append(line.rstrip())
for i in range(len(seq_list) // 10):
    line = ','.join(seq_list[10 * i:10 * (i + 1)])
    file_write_obj.write(line)
    file_write_obj.write('\n')
file_write_obj.close()

file_write_obj = open('test_VID_seqs_list_small.txt', 'w')
for d in dirs_test:
    test_seqs = np.sort(os.listdir(d))
    for seq in test_seqs:
        seq_path = os.path.join(d, seq)
        image_list = np.sort(os.listdir(seq_path))
        for image in image_list:
            file_write_obj.write(seq + '/' + image)
            file_write_obj.write('\n')
file_write_obj.close()
'''
petinhoss7 commented 4 years ago

This is an improvement, but we need to take new random sequences at every epoch
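One way to get fresh sequences each epoch is to rebuild the sequence list with a random per-video offset before every epoch, instead of writing one fixed txt file. A minimal sketch of that idea, where `epoch_sequences` and its arguments are illustrative names, not part of this repo:

```python
import random

def epoch_sequences(video_frames, seq_len=10, rng=None):
    """Build a fresh list of fixed-length frame sequences for one epoch.

    A random per-video offset shifts the window boundaries, so each epoch
    sees different 10-frame chunks while frames inside a chunk stay in
    temporal order. `video_frames` maps video name -> ordered frame ids.
    """
    rng = rng or random.Random()
    sequences = []
    for video, frames in video_frames.items():
        if len(frames) < seq_len:
            continue
        offset = rng.randrange(seq_len)  # new window boundaries every epoch
        for start in range(offset, len(frames) - seq_len + 1, seq_len):
            sequences.append([video + '/' + f for f in frames[start:start + seq_len]])
    rng.shuffle(sequences)  # shuffle across videos, never within a sequence
    return sequences
```

Calling this at the top of every epoch (and keeping the DataLoader's own shuffle off) gives the epoch-level randomness suggested here without breaking temporal order inside a sequence.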

jszgz commented 4 years ago

I think we should use a new sampling strategy that samples temporally adjacent frames in order, starting at a random timestamp. If the sequence is shuffled, how can the net learn the order of the motion? Or, on the contrary, does this lead to a more robust net?
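The sampling strategy described above (random timestamp, ordered motion) can be sketched in a few lines; `sample_clip` is a hypothetical helper, not something in the repo:

```python
import random

def sample_clip(frames, seq_len=10, rng=random):
    """Pick a random start index, then return seq_len consecutive frames
    in their original temporal order: the timestamp is random, but the
    motion order within the clip is preserved for the LSTM."""
    start = rng.randrange(len(frames) - seq_len + 1)
    return frames[start:start + seq_len]
```

This keeps randomness at the clip level only, so the Bottleneck-LSTM still sees frames in the order they were captured.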

petinhoss7 commented 4 years ago

He did the right thing, but instead of using a batch size of 10 in the dataloader, I used 1, because we need 1 sequence of 10 frames. After that you need to modify the train function to convert the sequence's list of images into tensors, and that should work. I am training now and it seems to work fine so far.
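Converting the list of per-frame images into a tensor could look roughly like the NumPy sketch below (the function name and shapes are illustrative, not the repo's actual train function); in the real train loop you would wrap the result with `torch.from_numpy`:

```python
import numpy as np

def collate_sequence(image_arrays):
    """Stack a length-10 list of HWC uint8 frames into one (T, C, H, W)
    float32 array in [0, 1], ready to wrap with torch.from_numpy()."""
    clip = np.stack(image_arrays)           # (T, H, W, C)
    clip = clip.transpose(0, 3, 1, 2)       # (T, C, H, W)
    return clip.astype(np.float32) / 255.0  # normalize to [0, 1]
```

With batch size 1 in the DataLoader, each batch is then a single clip of shape (10, C, H, W) that can be fed to the LSTM frame by frame.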

petinhoss7 commented 4 years ago

We also have to keep shuffle = True so that the dataloader picks random sequences.