zhoubolei / TRN-pytorch

Temporal Relation Networks
http://relation.csail.mit.edu/

Hi @highway007, #75

Open ranjitswami opened 4 years ago

ranjitswami commented 4 years ago

Hi @highway007,

I have some questions below. Can you please help me with answers?

I tried with 5 classes (shoplifting, normal, stealing, robbery, burglary). For training I used 30 videos for shopping, 30 videos for shoplifting, 15 videos for stealing, 15 videos for robbery, and 10 videos for burglary. So far, my process is: I'm using Google Colab for training, with 12.72 GB RAM. I created CSV files for training, test, validation, and labels. My CSV files look like this:

This is my label.csv: [image: Screenshot (215)]

This is my train.csv: [image: Screenshot (216)]

This is my test.csv: [image: Screenshot (217)]

This is my validation.csv: [image: Screenshot (218)]

My train_videofolder.txt file looks like this

00 132 0
01 180 0
02 135 0
03 168 0
04 197 0
05 399 0
06 111 0
07 248 0
08 213 0
09 153 0
10 248 0
11 399 1
12 231 1
13 491 1
14 333 1
15 390 1
16 326 1
..... etc

val_videofolder.txt

40 460 1
41 343 1
42 378 1
43 350 1
44 618 1
45 238 0
46 114 0
47 153 0
48 093 0
49 546 0
69 834 2
78 048 3
79 036 3
80 384 3
81 078 3
87 288 4

category.txt

shoplifting
normal
stealing
robbery
burglary
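A quick sanity check on the list files can help here. The following is a minimal sketch, assuming each line of a videofolder list holds a folder name, a frame count, and a class index, and that the class index is the zero-based line position in category.txt. The function names `parse_video_list` and `class_counts` are made up for illustration and are not part of TRN-pytorch. The sample lines are copied from the val list above.

```python
from collections import Counter

# A few lines copied from val_videofolder.txt above.
# Assumed column meaning: folder name, number of frames, class index.
SAMPLE_LIST = """\
40 460 1
45 238 0
69 834 2
78 048 3
87 288 4
"""

# Order taken from category.txt above (assumed: index 0 is the first line).
CATEGORIES = ["shoplifting", "normal", "stealing", "robbery", "burglary"]


def parse_video_list(text):
    """Return (folder, num_frames, label) tuples from a videofolder-style list."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        folder, num_frames, label = parts
        records.append((folder, int(num_frames), int(label)))
    return records


def class_counts(records, categories):
    """Count videos per class name, including classes with zero videos."""
    counts = Counter(label for _, _, label in records)
    return {name: counts.get(idx, 0) for idx, name in enumerate(categories)}


records = parse_video_list(SAMPLE_LIST)
print(class_counts(records, CATEGORIES))
```

Running the same count over the full train and val lists makes it easy to see how many videos each class really has, and whether the index used in the lists matches the class name intended in category.txt.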

This is my training code

!python3 main.py something RGB \
                     --arch BNInception --num_segments 8 \
                     --consensus_type TRNmultiscale --batch-size 16

My training looks like this

storing name: TRN_something_RGB_BNInception_TRNmultiscale_segment8

    Initializing TSN with base model: BNInception.
    TSN Configurations:
        input_modality:     RGB
        num_segments:       8
        new_length:         1
        consensus_module:   TRNmultiscale
        dropout_ratio:      0.8
        img_feature_dim:    256

/content/drive/My Drive/TRN-pytorch/models.py:87: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
  normal(self.new_fc.weight, 0, std)
/content/drive/My Drive/TRN-pytorch/models.py:88: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  constant(self.new_fc.bias, 0)
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
video number:59
/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py:208: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")
video number:16
group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 1 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 83 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 83 params, lr_mult: 2, decay_mult: 0
group: BN scale/shift has 2 params, lr_mult: 1, decay_mult: 0
Freezing BatchNorm2D except the first one.
main.py:175: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  losses.update(loss.data[0], input.size(0))
main.py:176: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  top1.update(prec1[0], input.size(0))
main.py:177: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  top5.update(prec5[0], input.size(0))
main.py:186: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  total_norm = clip_grad_norm(model.parameters(), args.clip_gradient)
Epoch: [0][0/4], lr: 0.00100    Time 16.655 (16.655)    Data 4.173 (4.173)  Loss 1.6147 (1.6147)    Prec@1 50.000 (50.000)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Epoch: [1][0/4], lr: 0.00100    Time 5.917 (5.917)  Data 4.407 (4.407)  Loss 1.6116 (1.6116)    Prec@1 18.750 (18.750)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Epoch: [2][0/4], lr: 0.00100    Time 5.543 (5.543)  Data 4.102 (4.102)  Loss 1.4904 (1.4904)    Prec@1 25.000 (25.000)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Epoch: [3][0/4], lr: 0.00100    Time 6.334 (6.334)  Data 4.920 (4.920)  Loss 1.4325 (1.4325)    Prec@1 25.000 (25.000)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Epoch: [4][0/4], lr: 0.00100    Time 7.225 (7.225)  Data 5.824 (5.824)  Loss 1.4092 (1.4092)    Prec@1 31.250 (31.250)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
main.py:223: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_var = torch.autograd.Variable(input, volatile=True)
main.py:224: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  target_var = torch.autograd.Variable(target, volatile=True)
main.py:233: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  losses.update(loss.data[0], input.size(0))
main.py:234: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  top1.update(prec1[0], input.size(0))
main.py:235: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  top5.update(prec5[0], input.size(0))
Test: [0/1] Time 1.686 (1.686)  Loss 1.5454 (1.5454)    Prec@1 31.250 (31.250)  Prec@5 100.000 (100.000)
Testing Results: Prec@1 31.250 Prec@5 100.000 Loss 1.54541

Best Prec@1: 0.000
Freezing BatchNorm2D except the first one.
Epoch: [5][0/4], lr: 0.00100    Time 5.674 (5.674)  Data 4.189 (4.189)  Loss 1.4755 (1.4755)    Prec@1 25.000 (25.000)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Epoch: [6][0/4], lr: 0.00100    Time 4.719 (4.719)  Data 3.302 (3.302)  Loss 1.5275 (1.5275)    Prec@1 31.250 (31.250)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Epoch: [7][0/4], lr: 0.00100    Time 4.682 (4.682)  Data 3.281 (3.281)  Loss 1.3586 (1.3586)    Prec@1 31.250 (31.250)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Epoch: [8][0/4], lr: 0.00100    Time 6.715 (6.715)  Data 5.315 (5.315)  Loss 1.2957 (1.2957)    Prec@1 43.750 (43.750)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Epoch: [9][0/4], lr: 0.00100    Time 3.891 (3.891)  Data 2.501 (2.501)  Loss 1.2222 (1.2222)    Prec@1 43.750 (43.750)  Prec@5 100.000 (100.000)
Freezing BatchNorm2D except the first one.
Test: [0/1] Time 1.638 (1.638)  Loss 1.5057 (1.5057)    Prec@1 31.250 (31.250)  Prec@5 100.000 (100.000)
Testing Results: Prec@1 31.250 Prec@5 100.000 Loss 1.50569

Best Prec@1: 31.250
Freezing BatchNorm2D except the first one.

After completing my training, I get the same result for every input video (the accuracy and labels are always the same). This is the result I got for every input video:

0.328 -> normal
0.324 -> shoplifting
0.161 -> stealing
0.148 -> robbery
0.039 -> burglary
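For context on the losses in the training log above, it may help to compare them with the loss of a model that guesses uniformly. This is a small sketch, assuming the reported loss is a standard cross-entropy using the natural logarithm (an assumption, not confirmed in this thread). The three loss values are copied from the log.

```python
import math

NUM_CLASSES = 5

# Cross-entropy of a uniform prediction over 5 classes (natural log assumed).
chance_loss = math.log(NUM_CLASSES)
print(round(chance_loss, 4))  # 1.6094

# Loss values copied from the log above.
logged = {
    "train, epoch 0": 1.6147,
    "val, after epoch 4": 1.5454,
    "val, after epoch 9": 1.5057,
}

for name, loss in logged.items():
    # A difference near zero means the model is barely better than guessing.
    print(f"{name}: {loss:.4f} ({loss - chance_loss:+.4f} vs chance)")
```

Under that assumption, the validation loss stays within about 0.1 of the chance level, which is consistent with a model that outputs nearly the same scores for every video.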

I have some questions:

1) Am I following the right process?
2) In the training log, what does Epoch: [5][0/4] mean? Also, in my training, [0/4] does not increase until the end. But in your training, I see the following:

Epoch: [993][0/64], lr: 0.00100 Time 3.399 (3.399)  Data 3.113 (3.113)  Loss 1.8708 (1.8708)    Prec@1 25.000 (25.000)  Prec@5 100.000 (100.000)
Epoch: [993][20/64], lr: 0.00100    Time 0.179 (0.336)  Data 0.000 (0.148)  Loss 2.1559 (1.9719)    Prec@1 12.500 (12.500)  Prec@5 37.500 (72.619)
Epoch: [993][40/64], lr: 0.00100    Time 0.179 (0.260)  Data 0.000 (0.076)  Loss 2.0889 (1.9944)    Prec@1 0.000 (13.110)   Prec@5 50.000 (68.902)
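One plausible reading of these log lines is Epoch: [epoch index][batch index/batches per epoch]. The sketch below works through the arithmetic under two assumptions: the last partial batch is kept, and the log line is printed every 20 batches. The interval of 20 is inferred from the 0, 20, 40 steps in the reference log, not read from the code. The helper names are hypothetical.

```python
import math


def batches_per_epoch(num_videos, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_videos / batch_size)


def logged_steps(num_batches, print_freq):
    """Batch indices at which a log line would be printed."""
    return [i for i in range(num_batches) if i % print_freq == 0]


PRINT_FREQ = 20  # assumption, inferred from the 0/20/40 steps in the reference log

# 59 comes from "video number:59" in the log, 16 from --batch-size 16.
n = batches_per_epoch(59, 16)
print(n)                             # 4
print(logged_steps(n, PRINT_FREQ))   # [0]
print(logged_steps(64, PRINT_FREQ))  # [0, 20, 40, 60]
```

With only 4 batches per epoch, batch 0 is the only step that reaches the print interval, so [0/4] showing on every line would not by itself mean that training is stuck.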

Also, my Prec@5 is always 100.000 (100.000).

Is this because I'm using Colab for training? The reason I ask is that the Colab training stops after 119 steps (only close to an hour of training). I suspect this is the issue, since I couldn't continue training for more than an hour. Is there any place in the code to configure the training time?

Do I need to use a 1080 Ti or AWS for continuous training of at least 12 hours?

Originally posted by @Malathi15 in https://github.com/metalbubble/TRN-pytorch/issues/46#issuecomment-477472629

ranjitswami commented 4 years ago

Hello brother, I am facing the same issue: I get the same results for every test sample. Can you please help me?

withinnoitatpmet commented 4 years ago

You only have 5 classes, so it is no surprise at all that Prec@5 is always 100%.
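To illustrate the point, here is a minimal pure-Python sketch of a top-k check. The function `topk_correct` is written for this example and is not the repository's accuracy code. The scores are the ones reported in the question, arranged in category.txt order.

```python
def topk_correct(scores, target, k):
    """Return True if the target class is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return target in ranked[:k]


# Scores from the question, in category.txt order:
# shoplifting, normal, stealing, robbery, burglary
scores = [0.324, 0.328, 0.161, 0.148, 0.039]

for target in range(len(scores)):
    top1 = topk_correct(scores, target, 1)
    top5 = topk_correct(scores, target, 5)
    print(target, top1, top5)
```

With k equal to the number of classes, the top-k set contains every class, so the check passes for any target. Only the top-1 result says anything about the model here.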