yuanyao366 / PRP

Apache License 2.0

Dataloading Error #3

Closed AKASH2907 closed 3 years ago

AKASH2907 commented 3 years ago

While training, batch loading fails on the 3rd batch with the error below. I tried changing the random seed, but it still gets stuck at the 3rd batch specifically.

  File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 77, in __getitem__
    videodata, sample_step_label = self.loadcvvideo_Finsert(index, recon_rate=recon_rate, sample_step=None)
  File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 179, in loadcvvideo_Finsert
    buffer, sample_step_label = self.loadcvvideo_Finsert(index, recon_rate, sample_step)
  File "/lustre/fs0/home/akumar/self/prp/datasets/predict_dataset.py", line 132, in loadcvvideo_Finsert
    sample_step_proposal = self.sample_retrieval[recon_rate]
KeyError: 1

recon_rate is 1. I tried wrapping the call in try/except to skip the video that causes the error, but then a new error pops up. The video path is correct, but the buffer comes back shorter than the sample length:

retaining:False buffer_len:106 sample_len:128

If I skip this video, then this line raises a NoneType object error: https://github.com/yuanyao366/PRP/blob/58a301d92a540c915296de0d60a1cbaa304f0819/datasets/predict_dataset.py#L71

yuanyao366 commented 3 years ago

The default value of recon_rate is 2, which is set by "self.recon_rate_list = [2]" on line 50. Reading a video with cv2 can occasionally fail, for example when "retaining" is False at line 150. When we hit this error, we print the relevant information and reload a video at lines 162 and 163.
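The reload behaviour described here can be sketched roughly as follows. This is not the repository's code: `load_clip`, `read_frames`, `num_videos`, `max_tries`, and `rng` are hypothetical names, the frame reader is a stand-in for the cv2 capture loop, and picking the replacement index at random is an assumption. The sketch uses a bounded loop rather than a recursive call so that a run of unreadable videos ends in a clear error.

```python
import random


def load_clip(index, read_frames, num_videos, sample_len, max_tries=10, rng=random):
    """Return (index, frames), switching to another video when a read falls short.

    read_frames(index) is a hypothetical callable that returns the list of
    frames it managed to decode for the video at `index`.
    """
    for _ in range(max_tries):
        frames = read_frames(index)
        if len(frames) >= sample_len:
            # Enough frames were decoded: keep exactly sample_len of them.
            return index, frames[:sample_len]
        # Short read: report it and try a different video (assumed random pick).
        print(f"buffer_len:{len(frames)} sample_len:{sample_len} reload")
        index = rng.randrange(num_videos)
    raise RuntimeError(f"no readable video found after {max_tries} tries")
```

Passing every argument by name when retrying also keeps the call valid if the loader's signature changes later.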

AKASH2907 commented 3 years ago

After the "retaining:False" buffer message is printed, the same error appears. cv2 is unable to open the video: I raise an exception if the capture is not opened, and execution stops at that exception.

  File "predict_dataset.py", line 79, in __getitem__
    videodata, sample_step_label = self.loadcvvideo_Finsert(index, recon_rate=recon_rate, sample_step=None)
  File "predict_dataset.py", line 201, in loadcvvideo_Finsert
    buffer, sample_step_label = self.loadcvvideo_Finsert(index, recon_rate, sample_step)
  File "predict_dataset.py", line 142, in loadcvvideo_Finsert
    sample_step_proposal = self.sample_retrieval[recon_rate]
KeyError: 8

It throws this KeyError. It looks absurd, but I am unable to figure out the problem. The index changes correctly after the len(buffer)<sample_len check. The error is thrown when the function calls itself again inside that while condition.

yuanyao366 commented 3 years ago

Did you change the value of recon_rate elsewhere in the code? The default value of recon_rate is 2, which is set in "self.recon_rate_list = [2]", line 50.

AKASH2907 commented 3 years ago

Yeah, I have seen it. I haven't changed that. This is the normal flow when I'm printing every parameter.

index: 3158
/datasets/UCF-101/TrainingData/GolfSwing/v_GolfSwing_g24_c02.avi
frame_count: 158
sample_step_proposal [1, 2, 4, 8]
proposal_idx: 3
sample_step 8
sample_step_label 3

index: 9531
/datasets/UCF-101/TrainingData/YoYo/v_YoYo_g24_c04.avi
frame_count: 194
sample_step_proposal [1, 2, 4, 8]
proposal_idx: 0
sample_step 1
sample_step_label 0

To check the error: it is raised when len(buffer)<sample_len and a new index is loaded. The video file is not opened. The fname, the index, and the path are all correct, but the video file cannot be read. This is the check I added:

if capture.isOpened() : raise NameError('Just a Dummy Exception, write your own')

These are the index, recon_rate, and sample_step, respectively, when len(buffer) is less than sample_len, printed just before the video loading function is called: 9264 2 8

yuanyao366 commented 3 years ago

This is the printout of the code running on my device. Reloading the video does not affect the whole pre-training process.

(pt0.4.1) yaoyuan@yaoyuan-XPS-8920:~/Workspace/PRP$ python train_predict.py --gpu 0
{'gpu': '0', 'epochs': 300, 'model_name': 'c3d', 'exp_name': 'default'}
c3d
module.fc8.weight
module.fc8.bias
0%| | 0/300 [00:00<?, ?it/s]-----------------------------------------------
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][100/1092] data_time:0.013,batch time:0.705 loss:0.26339 loss_recon:0.11847 loss_class:1.44925 accuracy:31.750
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][200/1092] data_time:0.011,batch time:0.678 loss:0.24187 loss_recon:0.10125 loss_class:1.40624 accuracy:34.125
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][300/1092] data_time:0.017,batch time:0.690 loss:0.23022 loss_recon:0.09191 loss_class:1.38308 accuracy:33.750
retaining:False buffer_len:124 sample_len:128 reload
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][400/1092] data_time:0.011,batch time:0.685 loss:0.22153 loss_recon:0.08582 loss_class:1.35712 accuracy:35.344
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][500/1092] data_time:0.015,batch time:0.693 loss:0.21508 loss_recon:0.08178 loss_class:1.33305 accuracy:36.775
retaining:False buffer_len:92 sample_len:128 reload
retaining:False buffer_len:92 sample_len:128 reload
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][600/1092] data_time:0.016,batch time:0.698 loss:0.21115 loss_recon:0.07918 loss_class:1.31967 accuracy:37.208
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][700/1092] data_time:0.012,batch time:0.725 loss:0.20754 loss_recon:0.07690 loss_class:1.30644 accuracy:37.964
retaining:False buffer_len:30 sample_len:32 reload
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][800/1092] data_time:0.013,batch time:0.689 loss:0.20472 loss_recon:0.07483 loss_class:1.29887 accuracy:38.156
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][900/1092] data_time:0.013,batch time:0.689 loss:0.20150 loss_recon:0.07250 loss_class:1.29002 accuracy:38.403
conv_lr:0.01 fc8_lr:0.1
Epoch:[1][1000/1092] data_time:0.012,batch time:0.690 loss:0.19822 loss_recon:0.06998 loss_class:1.28238 accuracy:38.625
[TRAIN] loss_cls: 1.280, acc: 0.387
tensor([1233., 615., 491., 1046.])
tensor([2231., 2228., 2100., 2177.])
tensor([0.5527, 0.2760, 0.2338, 0.4805])
-----------------------------validation-------------------
Epoch: [1][100/100] data_time:0.009,batch time:0.241 loss:0.14979 loss_recon:0.03542 loss_class:1.14373 accuracy:44.500
[VAL] loss_cls: 1.144, acc: 0.445
tensor([116., 56., 41., 143.])
tensor([228., 182., 197., 193.])
tensor([0.5088, 0.3077, 0.2081, 0.7409])
0%|▌ | 1/300 [13:35<67:43:57, 815.51s/it]

AKASH2907 commented 3 years ago

Yeah, it ran. I added a parameter to check train vs test mode in the loadcvv function. That was causing the error. Thanks.
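The thread does not show the modified function, so the following is only one plausible reconstruction of how an extra parameter could produce these KeyErrors. It assumes a hypothetical `mode` parameter was inserted before `recon_rate`, while the retry call stayed positional as in the tracebacks (`index, recon_rate, sample_step`). Under that assumption `sample_step` lands in `recon_rate`, which would match `KeyError: 8` when sample_step is 8 and `KeyError: 1` when it is 1. `SAMPLE_RETRIEVAL`, `load_buggy`, `load_fixed`, and `retry` are illustrative names, not the repository's.

```python
# Hypothetical lookup table keyed by recon_rate; the values mirror the
# sample_step_proposal printed earlier in the thread.
SAMPLE_RETRIEVAL = {2: [1, 2, 4, 8]}


def load_buggy(index, mode, recon_rate=2, sample_step=None, retry=True):
    """A new `mode` parameter was inserted, but the retry call was not updated."""
    proposal = SAMPLE_RETRIEVAL[recon_rate]
    sample_step = sample_step or proposal[-1]
    if retry:  # stand-in for the len(buffer) < sample_len case
        # Positional call written for the old signature (index, recon_rate,
        # sample_step): recon_rate lands in `mode`, sample_step in `recon_rate`.
        return load_buggy(index + 1, recon_rate, sample_step, retry=False)
    return index, mode, recon_rate, sample_step


def load_fixed(index, mode, recon_rate=2, sample_step=None, retry=True):
    """Same logic, but the retry call names every argument."""
    proposal = SAMPLE_RETRIEVAL[recon_rate]
    sample_step = sample_step or proposal[-1]
    if retry:
        return load_fixed(index + 1, mode=mode, recon_rate=recon_rate,
                          sample_step=sample_step, retry=False)
    return index, mode, recon_rate, sample_step
```

In this sketch `load_buggy(0, "train")` fails with a KeyError on the retry, while `load_fixed(0, "train")` completes, because keyword arguments do not shift when a parameter is added to the signature.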