Open · skx6 opened this issue 5 years ago
I use different architectures to train a classification model. Sometimes it shows "out of memory".
Hello @SongKaixiang @Margrate !
Congratulations! You actually found a weakness of DARTS. Many follow-up works, for example ProxylessNAS, pointed out that although DARTS is much faster than its predecessors, its search space is very memory-consuming, since a DARTS search model is effectively k times (k is the number of operations in MixedOp) larger than a normal DNN model.
For your questions, I have two suggestions: i) if the model itself takes all the memory, try a smaller DARTS model (i.e. search with 5 cells rather than 8, or use 3 nodes rather than 4); ii) if there is still some memory left after loading DARTS onto the GPU(s), try a smaller batch size and crop size.
GL,
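For reference, the memory blow-up comes from the `MixedOp` used during search: every edge instantiates all k candidate operations and keeps all of their outputs alive for backprop. Below is a lightly paraphrased sketch of the `MixedOp` in the original repo's model_search.py (the imports assume the repo's operations.py and genotypes.py are on the path):

```python
import torch.nn as nn
from operations import OPS          # op factory dict from the repo's operations.py
from genotypes import PRIMITIVES    # the 8 candidate op names in the original search space

class MixedOp(nn.Module):
    """One edge of the search cell: holds ALL k candidate ops at once."""
    def __init__(self, C, stride):
        super(MixedOp, self).__init__()
        self._ops = nn.ModuleList()
        for primitive in PRIMITIVES:
            op = OPS[primitive](C, stride, False)
            if 'pool' in primitive:
                op = nn.Sequential(op, nn.BatchNorm2d(C, affine=False))
            self._ops.append(op)

    def forward(self, x, weights):
        # Weighted sum over ALL candidates: the forward pass computes (and
        # autograd retains) k feature maps per edge instead of one, which is
        # why the search network needs roughly k times the activation memory.
        return sum(w * op(x) for w, op in zip(weights, self._ops))
```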
I improved the code to make it compatible with PyTorch 1.1 while allowing multi-GPU training on both the RNN and CNN experiments. You can refer to: https://github.com/alphadl/darts.pytorch1.1
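(I have not checked exactly how the fork above does it; purely as a generic illustration of the usual PyTorch multi-GPU pattern, something like the following splits each batch across GPUs and so lowers the per-GPU activation memory. `Network` and `criterion` are as in the original model_search.py / train_search.py; note that the architecture parameters and the bilevel update need extra care, which is presumably what the fork handles.)

```python
import torch
import torch.nn as nn
from model_search import Network   # search network from the original repo

criterion = nn.CrossEntropyLoss().cuda()
model = Network(16, 10, 8, criterion)   # init_channels, num_classes, layers

if torch.cuda.device_count() > 1:
    # DataParallel replicates the module and splits each batch across GPUs.
    # After wrapping, the architecture parameters are reached through
    # model.module.arch_parameters() rather than model.arch_parameters().
    model = nn.DataParallel(model)
model = model.cuda()
```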
Hello, thanks for your suggestions @Catosine. Could you explain more about why it takes up a lot of memory? For example, in my case, some configs are as follows:
- image size: 224*224
- nodes in one cell: 4
- layers: 6

Only when I set the batch size to 2 can the code run; otherwise it throws an Out of Memory error. I think the model is not large, even small.
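For a rough sense of why a seemingly small model still runs out of memory at this resolution during search, here is a back-of-the-envelope estimate. Assumptions (not from this thread): the CIFAR-style stride-1 stem, init channels C=16, 4 nodes per cell, 8 candidate ops per edge, fp32; it only counts the MixedOp outputs and ignores both the reduction cells (which shrink later maps) and the intermediate activations inside each op (which add a lot more).

```python
# Rough memory estimate for one full-resolution search cell at 224x224.
H = W = 224
C = 16                         # channels of each node's output
edges = 2 + 3 + 4 + 5          # 4 nodes -> 14 edges per cell
ops = 8                        # candidate operations per edge (len(PRIMITIVES))
feature_map = H * W * C * 4    # bytes of one fp32 feature map
per_cell = feature_map * ops * edges
print("MixedOp outputs per cell: %.0f MiB per image" % (per_cell / 2**20))
# ~340 MiB per image per cell, all kept for backprop. With 6 cells and a
# batch of 2 this alone is already several GiB, before the activations
# inside each op (sep_conv stacks several convolutions) are counted --
# even though the derived (final) network would be small.
```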
Similar condition. The image size is set to (224, 224) in train_search.py, but it still returns an 'out of memory' message immediately, even when I set the layers to 4 and the batch size to 1.
Running environment: Python 2.7, PyTorch 0.3.1.post2, CUDA 9.0.
PS. I am using a single 2080Ti with a little over 11 GB of memory.
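(A quick way to check Catosine's point (i), i.e. whether the model alone already fills the card, is to print the CUDA memory counters around one forward/backward pass. This needs a newer PyTorch than the 0.3.1 mentioned above; `torch.cuda.memory_allocated` / `max_memory_allocated` exist from roughly 0.4 onwards.)

```python
import torch
import torch.nn as nn
from model_search import Network   # from the original repo

criterion = nn.CrossEntropyLoss().cuda()
model = Network(16, 10, 8, criterion).cuda()

# Memory held by parameters/buffers alone, before any forward pass.
print("after model.cuda(): %.0f MiB" % (torch.cuda.memory_allocated() / 2**20))

x = torch.randn(1, 3, 224, 224).cuda()
loss = criterion(model(x), torch.zeros(1, dtype=torch.long).cuda())
loss.backward()
# Peak memory, including the k-per-edge activations retained for backprop.
print("after one fwd/bwd:  %.0f MiB peak" % (torch.cuda.max_memory_allocated() / 2**20))
```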
Hello @rrryan2016,
Thank you for your question. In my case, I was using one V100 with 32 GB of memory. Unfortunately, DARTS is very space-consuming when searching for architectures, so you may want to try a smaller batch size and block structure, or even reduce some of the layer options, because in the searching phase the model is K times larger than the final model (where K is the number of candidate operations for each layer).
Good luck and have fun:)
PF
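In case it helps to make that concrete, the knobs mentioned above map onto the constructor of the search network and the flags in train_search.py (argument names below are taken from the original repo; adjust to your fork, and treat this as a sketch rather than a recommended setting):

```python
import torch.nn as nn
from model_search import Network   # from the original repo

criterion = nn.CrossEntropyLoss().cuda()

# Smaller search model: fewer cells (layers), fewer nodes per cell (steps,
# with multiplier kept equal to steps as in the paper), and fewer initial
# channels. Combine with a smaller batch size (e.g. --batch_size 16 instead
# of the default 64 in train_search.py) and/or a smaller crop size.
model = Network(C=8, num_classes=10, layers=5, criterion=criterion,
                steps=3, multiplier=3).cuda()
```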
Thx for your kind reply. Really helpful.
I got different architectures from running train_search.py many times. Sometimes it showed "out of memory".