Some errors in running PCTDM

NaifahNurya commented 5 years ago

I got the following error after step 0 finished, while running GAR.py:

Please wait for tracking! about 240min for VD
the person imgs are saved at ./dataset/VD\imgs
trainval_videos: [0, 1, 2, 3, 6, 7, 8, 10, 12, 13, 15, 16, 17, 18, 19, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 36, 38, 39, 40, 41, 42, 46, 48, 49, 50, 51, 52, 53, 54]
test_videos: [4, 5, 9, 11, 14, 20, 21, 25, 29, 34, 35, 37, 43, 44, 45, 47]
Traceback (most recent call last):
  File "GAR.py", line 23, in <module>
    Pre.Processing(opt.dataset_root, opt.dataset_name, 'track')
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Pre\Processing.py", line 27, in __init__
    eval(self.datasetname + '' + str.capitalize(operation))(self.dataset_root, dataset_confs, model_confs)
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Pre\VD_Track.py", line 14, in __init__
    self.getTrainTest()
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Pre\VD_Track.py", line 62, in getTrainTest
    for i in xrange(self.num_players*self.num_frames):
NameError: name 'xrange' is not defined

The error seems to come from this line:

VD_Track.py, line 62, in getTrainTest
    for i in xrange(self.num_players*self.num_frames):
NameError: name 'xrange' is not defined

ruiyan1995 commented 5 years ago

It's best to run the code in Python 2.7.
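For readers who cannot switch to Python 2.7, a small compatibility shim is one possible workaround. This is only a sketch and not part of the PCTDM code: it aliases `xrange` to `range` when the name is missing, and `count_iterations` is a hypothetical stand-in for the loop reported at VD_Track.py line 62. Other Python 2-only constructs may still remain elsewhere in the code base.

```python
# Compatibility shim: Python 3 removed xrange, and its range is already lazy.
try:
    xrange  # defined on Python 2
except NameError:
    xrange = range  # Python 3 fallback


def count_iterations(num_players, num_frames):
    """Hypothetical stand-in for the loop at VD_Track.py line 62."""
    count = 0
    for _ in xrange(num_players * num_frames):
        count += 1
    return count


print(count_iterations(12, 10))  # prints 120
```

Placing the `try`/`except` block near the top of a module makes the rest of that module run unchanged under both interpreters.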

NaifahNurya commented 5 years ago

Your suggestion solved that problem.

However, the following problem now occurs. It seems to be a problem with the path, but when I look in the folder, the txt file has been created. How can this be solved?

Tracking VD in 435m 29s
Please wait for ranking! about 180min for VD
Ranking VD in 0m 0s
Please wait for training action! Needs 200min for 20epochs(VD).

data_confs Namespace(batch_size={'trainval': 300, 'test': 10}, data_type='img', dataset_folder='./dataset/VD\imgs_ranked', label_type='action')

[Screenshot: Capture1]

Traceback (most recent call last):
  File "GAR.py", line 37, in <module>
    Action = Runtime.Action_Level(opt.dataset_root, opt.dataset_name, 'trainval_action')
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Action_Level.py", line 9, in __init__
    super(Action_Level, self).__init__(dataset_root, dataset_name, 'action', mode)
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Piplines.py", line 23, in __init__
    self.configuring()
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Piplines.py", line 35, in configuring
    self.data_loaders, self.data_sizes = self.loadData(self.data_confs)
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Piplines.py", line 84, in loadData
    data_confs.dataset_folder, phase, data_confs.label_type, data_transforms[phase] if data_transforms else None) for phase in phases}
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Piplines.py", line 84, in <dictcomp>
    data_confs.dataset_folder, phase, data_confs.label_type, data_transforms[phase] if data_transforms else None) for phase in phases}
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Data\img.py", line 14, in __init__
    lines = open(self.txt_file)
FileNotFoundError: [Errno 2] No such file or directory: './dataset/VD\imgs_ranked\trainval_action.txt'

[Screenshot: Capture2]

I don't know why it creates two backslashes (\\) instead of one (\) after VD in the path, as shown in the attached screenshot, which highlights the bolded text above.

When I look in the folder, four files have been created and stored in ./dataset/VD\imgs_ranked\

The files have the following names:

test_action.txt test_activity.txt trainval_action.txt trainval_activity

ruiyan1995 commented 5 years ago

Welcome! I guess that it may be caused by the different operating systems. Our code was debugged on Linux; did you run it on Windows? An easy way to deal with file paths on Windows, Mac and Linux is shown at https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

However, I highly recommend that you use Ubuntu!
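As a rough illustration of the portable-path idea, the sketch below builds the label-file path from its components instead of concatenating strings with a hard-coded separator. The directory layout is taken from the log output in this thread; the helper name `ranked_txt_path` is hypothetical and is not how the PCTDM code is actually organized.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def ranked_txt_path(dataset_root, dataset_name, phase, label_type):
    """Build '<root>/<name>/imgs_ranked/<phase>_<label_type>.txt' portably."""
    return os.path.join(dataset_root, dataset_name, "imgs_ranked",
                        "{}_{}.txt".format(phase, label_type))


# The Pure* classes show how each OS family renders the same components.
parts = ("dataset", "VD", "imgs_ranked", "trainval_action.txt")
print(PurePosixPath(*parts))    # dataset/VD/imgs_ranked/trainval_action.txt
print(PureWindowsPath(*parts))  # dataset\VD\imgs_ranked\trainval_action.txt

# os.path.join uses the separator of the OS the script is running on.
print(ranked_txt_path("./dataset", "VD", "trainval", "action"))
```

Mixed separators such as './dataset/VD\imgs_ranked' usually arise when some parts of a path are joined with a literal '/' and others with os.path.join; joining every component the same way avoids that.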

NaifahNurya commented 5 years ago

Yes, I am currently running it on Windows.

Thank you for the link you recommended. I will follow it and try to fix the problem.

NaifahNurya commented 5 years ago

Hello, I got an out-of-memory error. Will reducing batch_size in Action_level.py from 10 to 5 solve the problem? It seems to be a problem with my GPU capacity. Otherwise, what should I do?

The error is:

RuntimeError: CUDA out of memory. Tried to allocate 345.50 MiB (GPU 0; 8.00 GiB total capacity; 2.77 GiB already allocated; 3.08 GiB free; 429.08 MiB cached)

With the following traceback:

Traceback (most recent call last):
  File "GAR.py", line 42, in <module>
    Action.trainval()
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Piplines.py", line 58, in trainval
    self.solver.train_model()
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Solver.py", line 83, in train_model
    self.training(inputs, labels, phase)
  File "C:\Users\GROUP\PCTDMGAR\Solver.py", line 50, in training
    loss.backward()
  File "C:\Users\GROUP\Anaconda3\envs\GAR\lib\site-packages\torch\tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\GROUP\Anaconda3\envs\GAR\lib\site-packages\torch\autograd\__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 345.50 MiB (GPU 0; 8.00 GiB total capacity; 2.77 GiB already allocated; 3.08 GiB free; 429.08 MiB cached)

ruiyan1995 commented 5 years ago

For action_level, the test batch_size must be 10, but the trainval batch_size can be reduced. The trainval batch_size should remain a multiple of 10 (N*10), such as 100 or 200.
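The multiple-of-10 rule can be checked before a long training run starts. The helper below is a hypothetical sketch, not part of the PCTDM code: it rounds a requested trainval batch size down to the nearest multiple of 10. The thread does not say why the multiple is required, so the default of 10 is simply taken from the comment above.

```python
def largest_valid_batch_size(requested, multiple=10):
    """Round `requested` down to the nearest positive multiple of `multiple`."""
    if requested < multiple:
        raise ValueError("batch size must be at least {}".format(multiple))
    return (requested // multiple) * multiple


print(largest_valid_batch_size(300))  # 300
print(largest_valid_batch_size(155))  # 150
```

When GPU memory is the limit, one could start from the largest value that fits and pass it through such a check before writing it into the configuration.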

NaifahNurya commented 5 years ago

Thank you. So I have to change the value in Data_Configs.py at line 31, as shown below, from 300 to, e.g., 200:

'batch_size': {'trainval_action': {'trainval': 300, 'test': 10}, 'extract_action_feas': {'trainval': 120, 'test': 120} }

And the feature-extraction setting should remain as it is:

'extract_action_feas': {'trainval': 120, 'test': 120}

NaifahNurya commented 5 years ago

The memory issue is resolved.

However, I got the following error after Epoch 9/9 finished and the best accuracy was saved. The error occurs when feature extraction starts. The code refers to action.pkl in the weights folder, but after epoch 9/9 that folder contains best_wts.pkl instead. Do I need to change line 20 in Action_Level.py so that it refers to best_wts.pkl?

self.net.load_state_dict(torch.load('./weights/VD/action/action.pkl'))

Details are shown below:

Epoch 9/9

Epoch: 9 phase: trainval Loss: 0.0019305260316276466 Acc: 0.8956937329967403
Running this epoch in 13m 35s
Epoch: 9 phase: test Loss: 0.0629564055343197 Acc: 0.8163081283009146
Running this epoch in 6m 48s
Best test Acc: 0.818176
Training action VD in 209m 46s
Please wait for extracting action_feas!
data_confs Namespace(batch_size={'trainval': 120, 'test': 120}, data_type='img', dataset_folder='./dataset/VD\imgs_ranked', label_type='activity')
AlexNet_LSTM(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
  )
  (LSTM): LSTM(4096, 3000, batch_first=True)
  (classifier): Linear(in_features=3000, out_features=9, bias=True)
)
Traceback (most recent call last):
  File "GAR.py", line 50, in <module>
    Action.extractFeas()
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Action_Level.py", line 20, in extractFeas
    self.net.load_state_dict(torch.load('./weights/VD/action/action.pkl'))
  File "C:\Users\GROUP\Anaconda3\envs\GAR\lib\site-packages\torch\serialization.py", line 366, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: './weights/VD/action/action.pkl'

The folder weights/VD/action/ contains only two files, as shown in the picture below:

[Screenshot: Capture]

ruiyan1995 commented 5 years ago

Yes, you're right. And I forgot to fix it.
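Until the file name is fixed in the code, one possible workaround is to look for whichever weights file actually exists. The sketch below is an assumption and not the maintainer's fix: `pick_weights_file` is a hypothetical helper, and the two candidate file names are the ones mentioned in this thread.

```python
import os
import tempfile


def pick_weights_file(weights_dir, candidates=("action.pkl", "best_wts.pkl")):
    """Return the first candidate file that exists in weights_dir."""
    for name in candidates:
        path = os.path.join(weights_dir, name)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "none of {} found in {}".format(list(candidates), weights_dir))


# Demo with a temporary directory that holds only best_wts.pkl.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "best_wts.pkl"), "wb").close()
chosen = pick_weights_file(demo_dir)
print(os.path.basename(chosen))  # best_wts.pkl
```

In Action_Level.py the result could then be passed on, for example as self.net.load_state_dict(torch.load(pick_weights_file('./weights/VD/action'))). Simply renaming best_wts.pkl to action.pkl, or editing line 20 as suggested above, achieves the same thing.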

NaifahNurya commented 5 years ago

What is the recommended amount of RAM? I got a MemoryError at the Action.extractFeas() stage even after reducing extract_action_feas in the Data_config.py file at line 31 from 120 to 40 for both trainval and test.

'batch_size': {'trainval_action': {'trainval': 100, 'test': 10}, 'extract_action_feas': {'trainval': 40, 'test': 40} }

I run my experiment on a PC with 8 GB of RAM.

The error is shown below:

Epoch: 8 phase: trainval Loss: 0.0025453602908048737 Acc: 0.9071297272125683
Running this epoch in 13m 52s
Epoch: 8 phase: test Loss: 0.06562032307655678 Acc: 0.8170166172871313
Running this epoch in 6m 53s
Epoch 9/9

Epoch: 9 phase: trainval Loss: 0.0023482153126937293 Acc: 0.9154260924977329
Running this epoch in 13m 53s
Epoch: 9 phase: test Loss: 0.0678530701597602 Acc: 0.8146335179698571
Running this epoch in 6m 53s
Best test Acc: 0.819335
Training action VD in 212m 9s
Please wait for extracting action_feas!
data_confs Namespace(batch_size={'trainval': 40, 'test': 40}, data_type='img', dataset_folder='./dataset/VD/imgs_ranked', label_type='activity')
AlexNet_LSTM(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
  )
  (LSTM): LSTM(4096, 3000, batch_first=True)
  (classifier): Linear(in_features=3000, out_features=9, bias=True)
)
trainval 34930.0

Traceback (most recent call last):
  File "GAR.py", line 50, in <module>
    Action.extractFeas()
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Action_Level.py", line 34, in extractFeas
    feas = np.zeros([dataset_size/K, feas_size*K+1])
MemoryError

Please let me know if you have any suggestions about this error.

ruiyan1995 commented 5 years ago

I'm sorry. It may be caused by the NumPy array: 'trainval.npy' is larger than 22 GB, so creating this matrix takes a correspondingly large amount of memory. I didn't consider the limitation of RAM. Also, the '.npy' format does not support append mode. There are the following solutions:

1) run 'PCTDM' on a workstation with at least 32 GB of RAM;
2) use 'hdf5' instead of 'npy'; refer to https://gist.github.com/wassname/a0a75f133831eed1113d052c67cf8633

To make follow-up work easier, I recommend increasing the memory capacity.

PS: You can comment out the code before extract_fea to save time; there is no need to train action_level again!
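To judge in advance whether a feature matrix will fit in RAM, its size can be estimated from its shape alone. The sketch below is only arithmetic: `dense_array_gib` is a hypothetical helper, and the example shape is made up, since the real values of dataset_size, K and feas_size are not given in this thread.

```python
def dense_array_gib(rows, cols, bytes_per_item=8):
    """Size in GiB of a dense rows x cols array (float64 uses 8 bytes per item)."""
    return rows * cols * bytes_per_item / float(1024 ** 3)


# Illustrative shape only; substitute the real dataset_size/K and feas_size*K+1.
rows, cols = 3000, 100001
print("float64: {:.2f} GiB".format(dense_array_gib(rows, cols)))
print("float32: {:.2f} GiB".format(dense_array_gib(rows, cols, 4)))
```

np.zeros allocates float64 by default, so passing dtype=np.float32 halves the footprint if that precision is acceptable. As an alternative to hdf5, numpy.lib.format.open_memmap can create a '.npy' file on disk that is filled batch by batch; whether that suits the later PCTDM stages is not covered in this thread.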

NaifahNurya commented 5 years ago

Thank you very much for your quick reply, and recommendations.

NaifahNurya commented 5 years ago

I found another issue.

It seems the file trainval.npy is not created as expected by line 38 (np.save(filename, feas)) in Action_Level.py. The error is below:

FileNotFoundError: [Errno 2] No such file or directory: 'dataset\VD\feas\activity\trainval.npy'

The stage output and traceback are shown below:

Please wait for extracting action_feas!
data_confs Namespace(batch_size={'trainval': 40, 'test': 40}, data_type='img', dataset_folder='.\dataset\VD\imgs_ranked', label_type='activity')
AlexNet_LSTM(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
  )
  (LSTM): LSTM(4096, 3000, batch_first=True)
  (classifier): Linear(in_features=3000, out_features=9, bias=True)
)
trainval 34930.0
Traceback (most recent call last):
  File "GAR.py", line 50, in <module>
    Action.extractFeas()
  File "C:\Users\GROUP\Desktop\PCTDMGAR\Runtime\Action_Level.py", line 36, in extractFeas
    np.save(filename, feas)
  File "C:\Users\GROUP\Anaconda3\envs\GAR\lib\site-packages\numpy\lib\npyio.py", line 517, in save
    fid = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: 'dataset\VD\feas\activity\trainval.npy'

ruiyan1995 commented 5 years ago

1) Make sure that you execute 'python GAR.py' from C:\Users\GROUP\Desktop\PCTDMGAR\.
2) What does line 36 in 'Action_Level.py' print out? Please check carefully whether this file is created.
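A common cause of this kind of FileNotFoundError on save is that the target directory does not exist yet, because opening a file for writing does not create missing parent folders. The thread does not say whether that was the cause here, so the following is only a sketch of a defensive check; `ensure_parent_dir` is a hypothetical helper and the demo writes into a temporary directory.

```python
import os
import tempfile


def ensure_parent_dir(filename):
    """Create the directory that will contain `filename` if it is missing."""
    parent = os.path.dirname(filename)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    return filename


# Demo in a temporary directory; with NumPy, np.save(target, feas) would follow.
root = tempfile.mkdtemp()
target = ensure_parent_dir(
    os.path.join(root, "dataset", "VD", "feas", "activity", "trainval.npy"))
print(os.path.isdir(os.path.dirname(target)))  # True
```

Relative paths such as 'dataset\VD\feas\activity\trainval.npy' are resolved against the current working directory, which is why running the script from the project root matters.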

NaifahNurya commented 5 years ago

Thank you. I made a mistake; it is now extracting features.

ruiyan1995 / Group-Activity-Recognition