megvii-model / SinglePathOneShot


What is the purpose of setting flops bins when training SuperNet? #11

Open jianghaojun opened 3 years ago

jianghaojun commented 3 years ago

https://github.com/megvii-model/SinglePathOneShot/blob/36eed6cf083497ffa9cfe7b8da25bb0b6ba5a452/src/Supernet/train.py#L213

It seems that the author wants the candidates' FLOPs to be distributed uniformly over [290, 360] MFLOPs.
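
For concreteness, here is a minimal sketch of what the linked bin-constrained sampling appears to do (not the repo's exact code; `flops_of` is a hypothetical stand-in for the repo's FLOPs lookup table, returning MFLOPs):

```python
import numpy as np

NUM_LAYERS, NUM_CHOICES = 20, 4   # ShuffleNet supernet: 20 choice blocks, 4 ops each
BINS = [(290, 300), (300, 310), (310, 320), (320, 330),
        (330, 340), (340, 350), (350, 360)]  # 10-MFLOPs-wide bins

def get_random_cand():
    """Unconstrained candidate: one uniform choice index per layer."""
    return tuple(np.random.randint(NUM_CHOICES) for _ in range(NUM_LAYERS))

def get_uniform_flops_cand(flops_of, timeout=500):
    """Pick a target bin uniformly, then rejection-sample candidates
    until one lands in that bin (fall back after `timeout` tries)."""
    lo, hi = BINS[np.random.randint(len(BINS))]
    for _ in range(timeout):
        cand = get_random_cand()
        if lo <= flops_of(cand) <= hi:
            return cand
    return get_random_cand()
```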

However, I randomly sampled 12500 candidates and calculated their FLOPs. The natural distribution for the ShuffleNet supernet is roughly [132, 828, 2366, 3941, 3482, 1406, 312], where each number counts the sampled candidates whose FLOPs fall into the bins [290, 300], [300, 310], ..., [350, 360].
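
This experiment can be reproduced in a few lines on top of the sketch above (again assuming the hypothetical `flops_of`):

```python
# Sample 12500 unconstrained candidates and histogram their FLOPs
# into the seven 10-MFLOPs-wide bins spanning [290, 360].
flops = [flops_of(get_random_cand()) for _ in range(12500)]
counts, _ = np.histogram(flops, bins=np.arange(290, 361, 10))
print(counts)  # should resemble the skewed, bell-shaped counts above
```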

If you don't put any constraint on the candidates' FLOPs, the occurrence probability of each choice block in each layer is very close to a uniform distribution (because of numpy.random.randint), which is consistent with the paper.

However, once the FLOPs constraint is added, the occurrence probability of each choice block no longer strictly follows a uniform distribution.
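
Continuing the sketch, the shift away from uniform per-layer choice probabilities can be measured directly:

```python
def choice_frequencies(sampler, n=10000):
    """Estimate P(choice c at layer l) from n sampled candidates."""
    cands = np.array([sampler() for _ in range(n)])  # shape (n, NUM_LAYERS)
    # Row c, column l = fraction of samples picking op c at layer l
    return np.stack([(cands == c).mean(axis=0) for c in range(NUM_CHOICES)])

print(choice_frequencies(get_random_cand))                # ~0.25 everywhere
print(choice_frequencies(lambda: get_uniform_flops_cand(flops_of)))  # deviates from 0.25
```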

So I am curious about the motivation. Or is it just empirical experience?

chenbohua3 commented 3 years ago

Same question.

Most results in the paper come from models with about 320M FLOPs, which makes me think the bin setting is used so that models with smaller FLOPs get trained more.

haiduo commented 6 months ago

My feeling is that it biases the whole supernet toward learning within [290, 360], so that at search time the subnets around 320M generally have higher accuracy. Covering subnets of every FLOPs range at once seems very hard to learn, so could we regard this as an experimental trick by the author?