pipilurj / BONAS


Experiment cannot be reproduced! #1

Closed pengyao96 closed 2 years ago

pengyao96 commented 2 years ago

Hello, I am very interested in your work, and I tried to reproduce it from your code. In the default configuration file I found that ## iterations in search_config ## is 0, which may be wrong, so I set it to 50. An experiment on a V100 then took about 4.5 h in total, but the results are still very bad: the top-1 accuracy is only around 10%. Is there something I have missed? Could you share your previous experiment logs? Thanks!

pengyao96 commented 2 years ago

Maybe settings.py has some errors; ## training_config.train_supernet_epochs == 1 ## is not correct.

pipilurj commented 2 years ago

Hi Pengyao, thank you very much for your interest in our work. The settings.py file may indeed be incorrect, since the uploaded version was probably left over from ablation studies. Please modify the following:

In search_config:
- change generate_num from 100 to 10000; this is the domain from which BO samples candidates.
- change iterations from 0 to 1000; this is how many rounds of searching you want to conduct, so the total number of evaluated networks is iterations x bo_sample_num.

In training_config:
- change train_supernet_epochs from 1 to 5 or 10.
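For concreteness, here is a rough sketch of what the corrected entries could look like, assuming the configs in settings.py are plain Python dicts; the actual file layout in the repo may differ, and the bo_sample_num value here is just the one discussed later in this thread:

```python
# Hypothetical sketch of the corrected settings.py values; the field names
# come from this thread, but the real file structure may differ.
search_config = {
    "generate_num": 10000,  # candidate pool from which BO samples (was 100)
    "iterations": 1000,     # number of search rounds (was 0)
    "bo_sample_num": 100,   # networks evaluated per round (assumed value)
}

training_config = {
    "train_supernet_epochs": 5,  # was 1; 5 or 10 both work (see below)
}

# Total number of evaluated networks = iterations * bo_sample_num.
```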

The experiments were run on a company cloud server, which I can no longer access since I have left the company. I did manage to retrieve the log file and csv file from another experiment run on an offline server; I attach them here for your reference. log.txt csv.csv

In our experiments, the promising subnets should reach around 92-93% top-1 accuracy during search, and their final performances are reported in the paper. Thanks a lot for pointing these problems out!

pipilurj commented 2 years ago

I am closing this issue for now; if you have further questions, feel free to ask.

pengyao96 commented 2 years ago

Thanks for your reply; it solved most of my problems. But I still have some doubts about the code that I hope we can discuss further!

  1. What you explained above (train_supernet_epochs = 5 or 10?) is slightly inconsistent with the updated ## setting.py ##, but I will experiment with the updated setting.py now.
  2. According to the results in the paper, BONAS-A, B, C, and D cover 1200, 2400, 3600, and 4800 evaluated samples. If we set bo_sample_num == 100, then we only need to run 12, 24, 36, or 48 iterations, don't we? But the iterations value in ##setting.py## is 500, which may be wrong?
  3. Some acceleration questions: if we have 8 V100 32 GB GPUs, can we accelerate supernet training by increasing the batch size or by other methods? I think this stage costs more time than all the others combined.
  4. I found the best network accuracy is 0.913 among the subnets sampled in the first iteration. When all iterations are finished, do I need a full training run to reach the accuracy reported in your paper, around 97.5%?
pipilurj commented 2 years ago

Hi Pengyao,

  1. train_supernet_epochs can be set to either 5 or 10; I have not compared their performance, but it should be similar.
  2. You do not need to wait until the program terminates; you can watch the generated csv. At a certain point the performance of the evaluated samples starts to converge (enters a flat zone), and you can pick the ones with the highest scores to fully train (see the sketch after this list).
  3. Yes, you can increase the batch size, reduce the number of epochs per iteration, or increase the number of BO samples at each iteration. If you have more than one GPU, you can run the search on each of them while sharing the same pool of evaluated samples to train the GCN; this also speeds up the search. In that case, however, I suggest reducing bo_sample_num to increase the sampling frequency while maintaining search efficiency.
  4. Indeed, you need to pick the best networks according to their evaluation scores and fully train them, since more than one network is evaluated at each iteration.
  5. One issue with the CIFAR-10 dataset is that many networks have similar performance, so you may want to fully train more than one network and compare their results.
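As a minimal sketch of how picking candidates from the generated csv might look, assuming hypothetical column names iteration, architecture, and score (the actual header written by the code may differ):

```python
import pandas as pd

# Column names ("iteration", "architecture", "score") are assumptions;
# adjust them to match the csv the search actually writes.
df = pd.read_csv("csv.csv")

# Watch for convergence: the mean score per iteration entering a flat zone
# suggests the search has saturated and can be stopped early.
print(df.groupby("iteration")["score"].mean().tail(10))

# Pick the top-k architectures by evaluation score for full training.
top_k = df.nlargest(5, "score")
print(top_k[["architecture", "score"]])
```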
pengyao96 commented 2 years ago

If I want to validate BONAS on NAS-Bench-101/201 or on ImageNet, does this code support those datasets? Do I need to add new code to support them? Thanks!

pipilurj commented 2 years ago

Yes, with some minor modifications you can fit BONAS to those datasets. The APIs of those datasets at the time were incompatible with their current versions (NAS-Bench-201 was still under review at ICLR), so we did not upload that code. You only need to encode the architectures as graphs; the core part of the algorithm does not need to be modified. Also, there are now some datasets aimed at benchmarking weight-sharing NAS methods, which I also recommend you try.
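To illustrate the kind of encoding meant here, below is a hedged sketch using the standard NAS-Bench-101 cell representation (an upper-triangular adjacency matrix plus one-hot node operations); the exact feature format BONAS's GCN expects may differ:

```python
import numpy as np

# NAS-Bench-101 operation vocabulary (input/output plus the three cell ops).
OPS = ["input", "conv1x1-bn-relu", "conv3x3-bn-relu", "maxpool3x3", "output"]

def encode_cell(adjacency, node_ops):
    """Encode a cell as (adjacency matrix, one-hot node-feature matrix),
    the (A, X) pair a GCN surrogate typically consumes."""
    adj = np.asarray(adjacency, dtype=np.float32)  # shape (n, n), 0/1 entries
    feats = np.zeros((len(node_ops), len(OPS)), dtype=np.float32)
    for i, op in enumerate(node_ops):
        feats[i, OPS.index(op)] = 1.0              # one-hot operation label
    return adj, feats

# Example: input -> conv3x3 -> conv1x1 -> output, plus a skip input -> output.
adj = [[0, 1, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
ops = ["input", "conv3x3-bn-relu", "conv1x1-bn-relu", "output"]
A, X = encode_cell(adj, ops)  # A: (4, 4), X: (4, 5)
```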