quark0 / darts

Differentiable architecture search for convolutional and recurrent networks
https://arxiv.org/abs/1806.09055
Apache License 2.0

out of memory for different architecture(pytorch0.3) #98

Open skx6 opened 5 years ago

skx6 commented 5 years ago

I got different architectures by running train_search.py many times. Sometimes it showed "out of memory".

Margrate commented 5 years ago

I use different architectures to train a classification model. Sometimes it shows "out of memory".

Catosine commented 5 years ago

Hello @SongKaixiang @Margrate !

Congratulations! You have actually found a weakness of DARTS. Many follow-up works, for example ProxylessNAS, pointed out that although DARTS is much faster than its predecessors, its search space is very memory-consuming, because a DARTS model during search is effectively k times larger than a normal DNN model (where k is the number of operations in MixedOp).

For your questions, I have two suggestions: i) if the model itself takes up all the memory, try a smaller DARTS model (e.g. search over 5 cells rather than 8, or use 3 nodes rather than 4); ii) if there is still some memory left after loading DARTS onto the GPU(s), try a smaller batch size and crop size.
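
To make the k-times point concrete, here is a minimal sketch of a DARTS-style mixed operation (simplified, not the repo's exact MixedOp): every candidate op is evaluated on every forward pass, and the activations of all k branches have to be kept around for backprop, which is where the extra memory goes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOpSketch(nn.Module):
    """Weighted sum over k candidate ops (simplified stand-in for DARTS' MixedOp)."""

    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        # one architecture parameter (alpha) per candidate op
        self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # All k branches run and their activations are retained for backward,
        # so memory grows roughly linearly with k.
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# toy example with k = 3 candidates on a single edge
edge = MixedOpSketch([
    nn.Conv2d(16, 16, 3, padding=1),
    nn.Conv2d(16, 16, 5, padding=2),
    nn.Identity(),
])
out = edge(torch.randn(2, 16, 32, 32))  # roughly k times the memory of a single op
```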

GL,

alphadl commented 5 years ago

I improved the code to make it compatible with PyTorch 1.1 while allowing multi-GPU training for both the RNN and CNN experiments. You can refer to: https://github.com/alphadl/darts.pytorch1.1

marsggbo commented 5 years ago

Hello, thanks for your suggestions @Catosine. Could you explain more about why it takes up so much memory? For example, in my case, some of the configs are as follows:

  • image size: 224*224
  • nodes in one cell: 4
  • layers: 6

Only when I set the batch size to 2 can the code run; otherwise it throws an out-of-memory error. I think the model is not large, even small.

rrryan2016 commented 3 years ago

Similar condition.

The image size is set to (224, 224) in train_search.py, but it still returns an 'out of memory' message immediately, even when I set the layers to 4 and the batch size to 1.

Running env: Python 2.7, PyTorch 0.3.1.post2, CUDA 9.0.

PS: I am using a single 2080 Ti with a little over 11 GB of memory.
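
For reference, a run with those reduced settings would look something like the line below; the flag names are taken from my reading of the stock train_search.py in this repo, so adjust them if your copy differs.

```
# layers and batch size reduced as described above; other flags left at their defaults
python train_search.py --layers 4 --batch_size 1
```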

Catosine commented 3 years ago

Hello @rrryan2016,

Thank you for your question. In my case, I was using one V100 with 32 GB of RAM. Unfortunately, DARTS is very memory-consuming when searching for architectures, so you may want to try a smaller batch size and block structure, or even reduce some of the candidate operations per layer, because in the search phase the model is K times larger than the final model (where K is the number of operations for each layer).
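
As a rough illustration of the scale involved (a back-of-envelope sketch, assuming the stock CIFAR search space with K = 8 candidate operations per edge): going from the 32x32 CIFAR crops the search script was written for to 224x224 inputs already multiplies the activation memory of every op by about 49x, and keeping all K branches alive multiplies it again by roughly K.

```python
# Back-of-envelope activation-memory scaling during the DARTS search phase.
# Assumption: K = 8 candidate ops per MixedOp (the stock CIFAR search space),
# and activation memory scales with input area and with the number of live branches.

K = 8                       # candidate operations per mixed op
cifar_area = 32 * 32        # spatial size the search script targets
large_area = 224 * 224      # spatial size reported in this thread

spatial_factor = large_area / cifar_area     # = 49.0
search_factor = spatial_factor * K           # ~392x vs. one op on a 32x32 input

print(f"~{spatial_factor:.0f}x more activations per op at 224x224, "
      f"~{search_factor:.0f}x overall during search")
```

So even with batch size 1 and only 4 layers, an 11 GB card can plausibly run out of memory at 224x224, which is consistent with what you are seeing.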

Good luck and have fun:)

PF

rrryan2016 commented 3 years ago

Or even reduce some of the candidate operations per layer, because in the search phase the model is K times larger than the final model (where K is the number of operations for each layer).

Thx for your kind reply. Really helpful.