hongzimao / pensieve

Neural Adaptive Video Streaming with Pensieve (SIGCOMM '17)
http://web.mit.edu/pensieve/
MIT License

Can the program be trained using the GPU #81

Closed: ma3252788 closed this issue 4 years ago.

ma3252788 commented 5 years ago

Can the program be trained on a GPU? I tried, but each agent allocates its own GPU memory, and the run fails because the GPU runs out of memory (log below). Can anyone help? Thank you.

2019-07-30 03:35:51.118858: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *********************************************xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
2019-07-30 03:35:51.119107: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 6.38G (6854984448 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-07-30 03:35:51.119149: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.0KiB.  Current allocation summary follows.
2019-07-30 03:35:51.119173: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119191: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119208: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1024):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119227: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2048):  Total Chunks: 1, Chunks in use: 1. 2.2KiB allocated for chunks. 2.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2019-07-30 03:35:51.119246: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4096):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119262: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119279: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16384):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119296: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (32768):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119314: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (65536):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119330: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (131072):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119347: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (262144):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119364: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (524288):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119381: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1048576):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119398: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2097152):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119415: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4194304):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119432: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8388608):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119449: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16777216):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119466: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (33554432):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119485: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (67108864):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119502: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119519: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-07-30 03:35:51.119537: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 16.0KiB was 16.0KiB, Chunk State: 
2019-07-30 03:35:51.119553: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fcc33000000 of size 2304
2019-07-30 03:35:51.119568: I tensorflow/core/common_runtime/bfc_allocator.cc:638]      Summary of in-use Chunks by size: 
2019-07-30 03:35:51.119584: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 2304 totalling 2.2KiB
2019-07-30 03:35:51.119600: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 2.2KiB
2019-07-30 03:35:51.119618: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
Limit:                  6854986957
InUse:                        2304
MaxInUse:                     2304
NumAllocs:                       1
MaxAllocSize:                 2304
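
The log hints at why this fails. The failed request (6854984448 bytes, printed as 6.38G) is essentially the allocator's whole limit (6854986957 bytes). That suggests each process tries to reserve nearly all of the GPU's memory up front, which is TensorFlow 1.x's default behavior. With several agent processes sharing one GPU, they cannot all succeed. The sketch below checks that arithmetic and computes an equal per-process share. The process count of 17 (16 agents plus one central learner) and the 10% headroom are illustrative assumptions. The TensorFlow 1.x options in the trailing comments are a possible alternative to the CPU-only workaround discussed in this thread, not something the maintainer suggested.

```python
# Numbers copied from the log in this issue.
FAILED_ALLOC_BYTES = 6854984448  # "failed to allocate 6.38G (6854984448 bytes)"
BFC_LIMIT_BYTES = 6854986957     # "Limit: 6854986957" in the allocator stats


def to_gib(n_bytes):
    """Convert bytes to GiB (the "G" in these TensorFlow logs is 2**30 bytes)."""
    return n_bytes / 2**30


def per_process_fraction(num_processes, headroom=0.1):
    """Share of one GPU's memory each process may reserve so that
    num_processes processes fit together, leaving `headroom` unused."""
    if num_processes < 1:
        raise ValueError("num_processes must be >= 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return (1.0 - headroom) / num_processes


print(f"{to_gib(FAILED_ALLOC_BYTES):.2f} GiB")  # 6.38 GiB, as in the log
# The failed request is almost exactly the allocator's whole limit.
print(f"request / limit = {FAILED_ALLOC_BYTES / BFC_LIMIT_BYTES:.4f}")

# Hypothetical count: 16 agents plus 1 central learner, one process each.
frac = per_process_fraction(17)
print(f"{frac:.3f} of the GPU per process")

# In TensorFlow 1.x, each process could then open its session with, e.g.:
#   gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=frac)
#   sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
# or use tf.GPUOptions(allow_growth=True) to grow allocations on demand.
```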

hongzimao commented 5 years ago

To get started, you can run it on the CPU only. Add

import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''

at the top of the Python script to hide NVIDIA GPUs from TensorFlow.
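
Below is a slightly expanded sketch of the same workaround, assuming only Python 3.7+. The `child_visible_devices` helper is hypothetical and only shows that child processes started afterwards inherit the setting. This matters when training launches one worker process per agent. Set the variable before TensorFlow is imported. CUDA reads it when it initializes, so changing it later does not affect a process that has already initialized CUDA.

```python
import os
import subprocess
import sys

# Set this before importing TensorFlow: CUDA reads it when it initializes,
# and an empty value means "no visible NVIDIA GPUs".
os.environ['CUDA_VISIBLE_DEVICES'] = ''

# import tensorflow as tf  # safe to import from this point on


def child_visible_devices():
    """Return CUDA_VISIBLE_DEVICES as seen by a newly started child process.

    Children inherit os.environ, so agent processes launched after the
    assignment above also run without GPUs."""
    result = subprocess.run(
        [sys.executable, '-c',
         "import os; print(repr(os.environ.get('CUDA_VISIBLE_DEVICES')))"],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(child_visible_devices())  # prints ''
```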

ma3252788 commented 5 years ago

Is there a big difference in training time between CPU and GPU? How long does training take on a CPU?

hongzimao commented 5 years ago

See https://github.com/hongzimao/pensieve/issues/82. The speed depends on your hardware. Because the model is fairly small, CPU and GPU training ran at similar speeds in our experiments.
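
Since the answer depends on hardware, one way to check on your own machine is to time the same fixed-length training run twice, once with GPUs hidden and once without. The sketch assumes you have a command that stops after a fixed number of steps. The `train_fixed_steps.py` name in the usage note is hypothetical. If your training script runs until interrupted, you would need to cap it first. The script itself should not set `CUDA_VISIBLE_DEVICES`, or both runs will be CPU-only.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, hide_gpus, timeout=None):
    """Run cmd once and return its wall-clock time in seconds.

    hide_gpus=True sets CUDA_VISIBLE_DEVICES='' for the child only;
    hide_gpus=False passes the current environment through unchanged."""
    env = dict(os.environ)
    if hide_gpus:
        env['CUDA_VISIBLE_DEVICES'] = ''
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True, timeout=timeout)
    return time.perf_counter() - start


def compare_cpu_gpu(cmd, repeats=3):
    """Return (best CPU-only time, best GPU-visible time) over `repeats` runs.

    Taking the minimum reduces noise from other load on the machine."""
    cpu = min(time_command(cmd, hide_gpus=True) for _ in range(repeats))
    gpu = min(time_command(cmd, hide_gpus=False) for _ in range(repeats))
    return cpu, gpu


if __name__ == '__main__':
    # Hypothetical script that trains for a fixed number of steps and exits.
    cpu_s, gpu_s = compare_cpu_gpu([sys.executable, 'train_fixed_steps.py'])
    print(f"CPU only: {cpu_s:.1f} s, GPU visible: {gpu_s:.1f} s")
```

Comparing whole runs also counts process start-up and TensorFlow initialization. For short runs that overhead can hide the real per-step difference, so longer runs give a fairer picture.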