susan0199 / StacNAS

I can only run it on one GPU. Is there a multi-GPU version? #2

Open Magilss opened 4 years ago

susan0199 commented 4 years ago

To use multiple GPUs, simply add model = nn.DataParallel(model) after building the model and change model. to model.module. wherever the model's own attributes are accessed. We are still cleaning up the ImageNet code. If you need it, please leave your email address and I can send it to you.
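A minimal sketch of the change described above, assuming a toy `TinyNet` in place of the StacNAS network (the class and its `describe` method are hypothetical, for illustration only). It shows that after wrapping, the forward call stays the same, while custom methods and attributes have to be reached through `model.module`.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real network."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)

    def describe(self):
        # Stands in for any custom method defined on the real model.
        n_params = sum(p.numel() for p in self.parameters())
        return "TinyNet with %d parameters" % n_params


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyNet().to(device)

# Wrap after building the model; each batch is split across the visible GPUs.
model = nn.DataParallel(model)

out = model(torch.randn(16, 8, device=device))  # forward call is unchanged
info = model.module.describe()  # custom methods need .module
```

On a machine without GPUs the wrapper simply calls the underlying module, so the same script should still run there.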

Magilss commented 4 years ago

Thank you, the multi-GPU problem is solved. The method is very nice and much easier to use than DARTS.

The new problem is that the pretrained model seems to be broken. I followed your README and my IDE gives me an error in torch.load. LOL

ModuleNotFoundError: No module named 'models.augment_cells'

I looked into the error and it seems the pretrained models don't match the code on GitHub.
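For context, `torch.load` on a whole saved model relies on Python's `pickle`, which stores only the import path of each class and re-imports it at load time. The standard-library sketch below reproduces the same kind of failure without PyTorch; the module name `old_cells` and the class `Cell` are made up for illustration and are not part of the repository.

```python
import pickle
import sys
import types


def load_after_module_removed():
    # Create a throwaway module and define a class inside it.
    mod = types.ModuleType("old_cells")
    exec("class Cell:\n    pass\n", mod.__dict__)
    sys.modules["old_cells"] = mod

    # Pickling the object records the path "old_cells.Cell", not the class source.
    blob = pickle.dumps(mod.Cell())

    # Simulate code that was moved or renamed after the checkpoint was saved.
    del sys.modules["old_cells"]
    try:
        pickle.loads(blob)
    except ModuleNotFoundError as err:
        return str(err)
    return "loaded"


message = load_after_module_removed()
print(message)
```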

My email: 3150104097@zju.edu.cn

susan0199 commented 4 years ago

I've updated the CIFAR-10 pretrained model. You should be able to load it now.

Magilss commented 4 years ago

Thank you for the latest model! Your method also performs well at a smaller size (with small cell_num and init_channel)! It seems StacNAS prefers sep conv LOL

Also waiting for the CIFAR-100 and ImageNet ones!

Additionally, I suggest saving the model like this:

state = {'net': model.module.state_dict()}
torch.save(state, './your/path')

and providing a model_final.py as DARTS does. Just a suggestion.
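A minimal sketch of that save-and-reload pattern, assuming a hypothetical `build_model()` helper in place of a real model definition file such as the suggested model_final.py. Only the weights are stored, and the architecture is rebuilt from code before the weights are loaded.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical builder; a real one would construct the searched network.
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 8, 3, padding=1),
    )


model = nn.DataParallel(build_model())

# Save weights only. Going through .module keeps the keys free of the
# "module." prefix that the DataParallel wrapper would otherwise add.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save({"net": model.module.state_dict()}, path)

# Load: rebuild the architecture from code, then copy the weights in.
restored = build_model()
state = torch.load(path, map_location="cpu")
restored.load_state_dict(state["net"])
restored.eval()
```

Because the checkpoint holds only tensors keyed by name, loading it does not depend on the class source being importable under the same module path.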

Loading the whole model can cause a lot of small problems across different environments, or even on different servers with the same environment.

For example:

SourceChangeWarning: source code of class 'augment_cnn.AugmentCNN' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.

AttributeError: 'Conv2d' object has no attribute 'padding_mode'

My environment is PyTorch 1.1, Python 3.7, torchvision 0.2.

Requirements: Python >= 3.5.5, PyTorch >= 1.0.0, torchvision >= 0.2.0

For the multi-GPU problem, the key point I asked about before is that when I use "DataParallel" and "model.module", the MixOp causes a "data on different devices" problem, and some ".cuda()" calls have to be added. Maybe it is also caused by my environment or my server LOL
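One common cause of this kind of error under `nn.DataParallel` is a tensor that is created inside `forward` or kept as a plain attribute, so it stays on one device while a replica runs on another. The sketch below is not the StacNAS MixOp; `ToyMixedOp` and its `alpha` parameter are hypothetical. It registers the mixing weights as an `nn.Parameter` so they are copied to every replica, and it allocates the accumulator on the input's device rather than hard-coding `.cuda()`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMixedOp(nn.Module):
    """Hypothetical weighted sum of candidate ops (not the StacNAS MixOp)."""

    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 1),
            nn.Identity(),
        ])
        # Registered parameter: DataParallel copies it to every replica.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # Allocate on the input's device instead of calling .cuda().
        out = torch.zeros_like(x)
        for w, op in zip(weights, self.ops):
            out = out + w * op(x)
        return out


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.DataParallel(ToyMixedOp(4).to(device))
y = net(torch.randn(8, 4, 16, 16, device=device))
```

Whether this matches the actual cause in the repository is an assumption; it only illustrates the device-agnostic pattern.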

kl2005ad commented 4 years ago

Can you also send me the code for training on ImageNet with multiple GPUs? Email is kepler113@outlook.com. Thank you.