neerajbattan opened this issue 6 years ago
Hi, did you use the latest version? It only needs 6GB memory.
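For reference, a quick way to check how much GPU memory PyTorch actually sees on your machine (the device index 0 is just an assumption; adjust it to your setup). If GPU memory is the bottleneck, lowering `batch_size` / `test_batch_size` in the YAML config (the same options shown in the Parameters dump below) is the usual way to fit into a 6 GB card:

```python
import torch

# Report the total memory of the first visible CUDA device.
# Device index 0 is an assumption; pick the index you pass via --device.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB total GPU memory")
else:
    print("No CUDA device visible to PyTorch")
```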
(base) F:\Suresh\st-gcn>python main1.py recognition -c config/st_gcn/ntu-xsub/train.yaml --device 0 --work_dir ./work_dir
C:\Users\cudalab10\Anaconda3\lib\site-packages\torch\cuda\__init__.py:117: UserWarning: Found GPU0 TITAN Xp which is of cuda capability 1.1. PyTorch no longer supports this GPU because it is too old.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
[05.22.19|12:02:41] Parameters: {'base_lr': 0.1, 'ignore_weights': [], 'model': 'net.st_gcn.Model', 'eval_interval': 5, 'weight_decay': 0.0001, 'work_dir': './work_dir', 'save_interval': 10, 'model_args': {'in_channels': 3, 'dropout': 0.5, 'num_class': 60, 'edge_importance_weighting': True, 'graph_args': {'strategy': 'spatial', 'layout': 'ntu-rgb+d'}}, 'debug': False, 'pavi_log': False, 'save_result': False, 'config': 'config/st_gcn/ntu-xsub/train.yaml', 'optimizer': 'SGD', 'weights': None, 'num_epoch': 80, 'batch_size': 64, 'show_topk': [1, 5], 'test_batch_size': 64, 'step': [10, 50], 'use_gpu': True, 'phase': 'train', 'print_log': True, 'log_interval': 100, 'feeder': 'feeder.feeder.Feeder', 'start_epoch': 0, 'nesterov': True, 'device': [0], 'save_log': True, 'test_feeder_args': {'data_path': './data/NTU-RGB-D/xsub/val_data.npy', 'label_path': './data/NTU-RGB-D/xsub/val_label.pkl'}, 'train_feeder_args': {'data_path': './data/NTU-RGB-D/xsub/train_data.npy', 'debug': False, 'label_path': './data/NTU-RGB-D/xsub/train_label.pkl'}, 'num_worker': 4}
[05.22.19|12:02:41] Training epoch: 0
Traceback (most recent call last):
File "main1.py", line 31, in
Here is the task manager screenshot.
I'm getting a memory-limit-exceeded error even after allocating 125 GB for training on ntu_xview.
This is the error I get:
[11.14.18|20:57:54] Parameters: {'print_log': True, 'log_interval': 100, 'step': [10, 50], 'save_result': False, 'test_feeder_args': {'data_path': '/scratch/neeraj.b/data/NTU-RGB-D/xview/val_data.npy', 'label_path': '/scratch/neeraj.b/data/NTU-RGB-D/xview/val_label.pkl'}, 'optimizer': 'SGD', 'start_epoch': 0, 'batch_size': 64, 'phase': 'train', 'base_lr': 0.1, 'num_worker': 4, 'debug': False, 'weights': None, 'save_interval': 10, 'pavi_log': False, 'ignore_weights': [], 'save_log': True, 'num_epoch': 80, 'use_gpu': True, 'weight_decay': 0.0001, 'test_batch_size': 64, 'device': [0, 1, 2, 3], 'nesterov': True, 'feeder': 'feeder.feeder.Feeder', 'train_feeder_args': {'data_path': '/scratch/neeraj.b/data/NTU-RGB-D/xview/train_data.npy', 'label_path': '/scratch/neeraj.b/data/NTU-RGB-D/xview/train_label.pkl', 'debug': False}, 'eval_interval': 5, 'config': 'config/st_gcn/ntu-xview/train.yaml', 'work_dir': '/scratch/neeraj.b/data/work_dir', 'model_args': {'num_class': 60, 'graph_args': {'strategy': 'spatial', 'layout': 'ntu-rgb+d'}, 'edge_importance_weighting': True, 'dropout': 0.5, 'in_channels': 3}, 'show_topk': [1, 5], 'model': 'net.st_gcn.Model'}
[11.14.18|20:57:54] Training epoch: 0
slurmstepd: Job 157430 exceeded memory limit (131272228 > 131072000), being killed
slurmstepd: Exceeded job memory limit
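The job is killed right after "Training epoch: 0", i.e. while the feeder is loading train_data.npy, which suggests the whole skeleton array is being read into host RAM. A common workaround, if you can adapt the data loader, is to open the array with numpy memory-mapping so samples are paged from disk on demand instead of the full array being copied into memory. This is only an illustrative sketch, not the repo's own Feeder class; `mmap_mode='r'` is a standard numpy option, and the assumed label-pickle layout of `(sample_name, label)` follows the paths shown in the Parameters dump:

```python
import pickle
import numpy as np

class MmapSkeletonDataset:
    """Illustrative sketch: memory-mapped access to the NTU .npy skeleton data."""

    def __init__(self, data_path, label_path):
        # mmap_mode='r' maps the file read-only; slices are paged in on demand,
        # so the multi-GB array never has to fit in process memory at once.
        self.data = np.load(data_path, mmap_mode='r')
        with open(label_path, 'rb') as f:
            # Assumed layout: a pickled (sample_name, label) pair.
            self.sample_name, self.label = pickle.load(f)

    def __len__(self):
        return len(self.label)

    def __getitem__(self, index):
        # Copying the slice materializes only one sample in RAM.
        return np.array(self.data[index]), self.label[index]
```

Alternatively, if memory-mapping is not an option, the slurm log shows the job was killed at its ~131 GB cgroup limit, so requesting a larger memory allocation for the job is the other route.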