yjxiong / tsn-pytorch

Temporal Segment Networks (TSN) in PyTorch
BSD 2-Clause "Simplified" License

[solved] RGBDiff - no progress #65

Closed Scitator closed 6 years ago

Scitator commented 6 years ago

Hi,

I am trying to reproduce your results; however, I found that the current official RGBDiff implementation does not learn: [training curve screenshot]. Python 3.6, PyTorch 0.3.1, cuda90.

The interesting part is that the training loop looks correct, so... some model architecture bug?

yjxiong commented 6 years ago

Hi, I think your environment is correct and the provided implementation should work properly.

I noticed that you have training accuracies like 12.5. Are you training with a single GPU, or have you reduced the batch size?

Scitator commented 6 years ago

@yjxiong yup, I use just a single Titan V with a batch size of ~20 (memory restriction)

Scitator commented 6 years ago

@yjxiong what are the hardware requirements for correct RGBDiff model training? 4x1080ti?

yjxiong commented 6 years ago

I don't think a batch size of 20 will work. A batch size of 128 is necessary for convergence. Any GPU setting that gives you that will be OK.
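When GPU memory caps the per-step batch well below 128, gradient accumulation is one generic way to recover the effective batch size (this is a general sketch, not part of the TSN codebase): for a mean-reduced loss, summing micro-batch gradients weighted by each micro-batch's share of the full batch reproduces the full-batch gradient exactly. A minimal NumPy check of that arithmetic:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((128, 10))   # one "full" batch of 128 samples
y = rng.standard_normal(128)
w = rng.standard_normal(10)

def mse_grad(Xb, yb, w):
    # gradient of the mean-reduced squared error 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

full_grad = mse_grad(X, y, w)

# accumulate over 4 micro-batches of 32, each weighted by its share of 128
acc_grad = np.zeros_like(w)
for i in range(0, 128, 32):
    acc_grad += mse_grad(X[i:i + 32], y[i:i + 32], w) * (32 / 128)

assert np.allclose(full_grad, acc_grad)
print("accumulated gradient matches full-batch gradient")
```

Note this only matches the gradient math: batch-norm layers still see micro-batch statistics, which may matter for a BN-heavy model like TSN, so it is not guaranteed to reproduce the 128-batch training behavior here.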

Scitator commented 6 years ago

Second try with 2 Titan Vs (batch size 120, with a pretrained resnet18):

python main.py ucf101 RGBDiff \
    ./data/ucf101_rgb_train_split_1.txt ./data/ucf101_rgb_val_split_1.txt \
    --arch resnet18 --num_segments 5  --gd 40 --lr 0.001 --lr_steps 80 160 \
    --epochs 180    -b 128 -j 6 --dropout 0.8

[training curve screenshot]

yjxiong commented 6 years ago

Interesting. Can you first try training RGB or flow to see whether you can get the model to converge? We haven't met this problem before.

Scitator commented 6 years ago

@yjxiong from my experiments, RGB converges to ~75-80% accuracy on split 1.

The first few epochs of training with

python main.py ucf101 RGB \
    ./data/ucf101_rgb_train_split_1.txt ./data/ucf101_rgb_val_split_1.txt \
    --arch resnet18 --num_segments 3 --gd 20 --lr 0.001 --lr_steps 30 60 \
    --epochs 80 -b 128 -j 6 --dropout 0.8

give: [training curve screenshot]

So, it looks like the RGB one works correctly.

yjxiong commented 6 years ago

I didn't train a resnet18 model, so I don't know whether your results are correct. If you run the BNInception model, the accuracy for RGB should quickly go to >80%.

Scitator commented 6 years ago

So, the results:

python main.py ucf101 RGB <data> \
    --arch BNInception --num_segments 3 \
    --gd 20 --lr 0.001 --lr_steps 30 60 --epochs 80 \
    -b 100 -j 8 --dropout 0.8 \
    --snapshot_pref ucf101_bninception_

[training curve screenshot]

Max ~78% accuracy, not 80% yet.

yjxiong commented 6 years ago

Did you use the tool we provided for extracting frames? If so, you should get >80% results for RGB.

Scitator commented 6 years ago

Yep, I use https://github.com/yjxiong/temporal-segment-networks for the video -> frames pipeline.

Still, what is the correct way to reproduce RGBDiff results?

yjxiong commented 6 years ago

The correct way, as I have mentioned, is to follow the instructions in the README. Testing the RGB pipeline verifies whether your data pipeline is correct. Please first try to reproduce the RGB result by strictly following the provided command. Then we can help you with RGBDiff.

Scitator commented 6 years ago

So, just to be sure, let me describe my full process for reproducing the TSN results.

Data preparation:

  1. go to https://github.com/yjxiong/temporal-segment-networks, clone recursively, and download the UCF101 data
  2. prepare frames for the RGB/RGBDiff models with tools/build_of.py
  3. prepare the train/valid lists with tools/build_file_list.py

Model preparation:

  1. Python 3.6 (anaconda), PyTorch 0.3.1, cuda90
  2. go to https://github.com/yjxiong/tsn-pytorch, clone recursively
  3. run one of the previous scripts

Are there any problems with data preparation?

PS: I will try to reproduce this pipeline on another machine.

Scitator commented 6 years ago

Finally, I found an error during data preprocessing: somehow the frame count per video in the list file is wrong. My hotfix (for dataset.py) recounts the frames on disk instead of trusting the list file (requires `import os` at the top of dataset.py):

    def _parse_list(self):
        self.video_list = []
        for x in open(self.list_file):
            filepath, _, label = x.strip().split(' ')
            # recount frames on disk instead of using the count from the list file
            n_frames = len(os.listdir(filepath)) - 1
            self.video_list.append(VideoRecord([filepath, n_frames, label]))

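For anyone hitting the same symptom, a quick way to audit the list file before training is to compare each listed count against the frames actually on disk. This is a hypothetical helper (not part of the repo), using the same `len(os.listdir(path)) - 1` convention as the hotfix above:

```python
import os

def find_count_mismatches(list_file):
    """Return (path, listed, on_disk) tuples for every video whose frame
    count in the list file disagrees with the files actually on disk."""
    bad = []
    with open(list_file) as f:
        for line in f:
            path, listed, _label = line.strip().split(' ')
            # same convention as the hotfix: number of files minus one
            on_disk = len(os.listdir(path)) - 1
            if int(listed) != on_disk:
                bad.append((path, int(listed), on_disk))
    return bad
```

An empty result means the list file is consistent and the stock `_parse_list` should be safe to use.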
yjxiong commented 6 years ago

I see. When you are generating the file list, please make sure the number of frames for each video is correct.

yjxiong commented 6 years ago

@Scitator Were you able to get the training to converge?

Scitator commented 6 years ago

@yjxiong Finally, yes. ~87% accuracy with the official code (both RGB and RGBDiff) and ~97% accuracy with my reimplementation, which adds image augmentations and training tricks. Still working on improvements. :)

So, it converges.

yjxiong commented 6 years ago

Good to know. Just a heads up: on UCF101 it is not normal to get 97% accuracy with RGB or RGBDiff without any pretraining. Maybe you should check whether you are reporting the average of per-class accuracies.
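The distinction matters because overall (clip-level) accuracy and mean per-class accuracy can diverge whenever predictions are skewed toward a few classes. A small generic illustration (not TSN code):

```python
import numpy as np

def overall_and_mean_class_acc(y_true, y_pred):
    """Return (overall accuracy, mean of per-class accuracies)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    overall = float((y_true == y_pred).mean())
    # accuracy within each ground-truth class, then averaged with equal weight
    per_class = [float((y_pred[y_true == c] == c).mean())
                 for c in np.unique(y_true)]
    return overall, float(np.mean(per_class))

# a classifier that always predicts class 0 on a skewed label set
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
print(overall_and_mean_class_acc(y_true, y_pred))  # (0.75, 0.5)
```

Reporting whichever of the two is used in the paper being reproduced is essential for an apples-to-apples comparison.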