sovrasov / flops-counter.pytorch

Flops counter for convolutional networks in pytorch framework
MIT License

FP16 (half precision) support? #11

Open zhengsx opened 5 years ago

zhengsx commented 5 years ago

I really appreciate this flops counter. Do you plan to support FLOPs calculation with FP16?

sovrasov commented 5 years ago

What exactly do you mean by FP16 support? Currently you can convert your FP16 model to FP32 and run ptflops to estimate the number of MACs:

from ptflops import get_model_complexity_info

net = net.float()
flops, params = get_model_complexity_info(net, (3, 224, 224),
                                          as_strings=True,
                                          print_per_layer_stat=True)

get_model_complexity_info will run inference on an FP32 blob. If it ran on an FP16 one instead, we could save some memory and get a small speedup. Is that critical for you?
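For reference, here is a minimal, untested sketch of how an FP16 blob could be fed to the counter through the input_constructor argument. It assumes a ptflops version where input_constructor returns the model's keyword arguments, uses torchvision's resnet18 purely as an example, and requires a CUDA device because FP16 convolutions are generally not implemented on CPU:

import torch
from torchvision.models import resnet18
from ptflops import get_model_complexity_info

net = resnet18().half().cuda()

def fp16_input(input_res):
    # Build a single FP16 image on the GPU; ptflops forwards it through the
    # model instead of its default FP32 blob. The keyword 'x' matches
    # torchvision's ResNet.forward signature (an assumption for other models).
    return {'x': torch.ones((1, *input_res), dtype=torch.float16, device='cuda')}

flops, params = get_model_complexity_info(net, (3, 224, 224),
                                          input_constructor=fp16_input,
                                          as_strings=True,
                                          print_per_layer_stat=False)

The reported MACs should match the FP32 run; only the memory footprint and speed of the dummy forward pass change.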

zhengsx commented 5 years ago

So for a given model, flops_fp16 == flops_fp32/2 and params_fp16 == params_fp32/2, hence there is no need to calculate flops_fp16 separately? Is that correct?

sovrasov commented 5 years ago

Not exactly.

  1. The number of parameters remains the same for all floating-point types. Only the amount of memory required to store the parameters differs: mem(fp16) = mem(fp32)/2 = mem(fp64)/4 (see the sketch after this list).
  2. When we speak about the number of FLOPs or MACs, we need to specify which floating-point type we mean. ptflops counts multiply-accumulate operations (MACs, with FLOPs ≈ 2 * MACs) for an abstract floating-point type. Abstract means that the obtained number is type-agnostic: for any FP type the model consumes the same number of these abstract operations; the difference is in real running time: time(fp16) = time(fp32)/2 = time(fp64)/4.
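To illustrate point 1, here is a small self-contained check (param_stats is just a hypothetical helper, not part of ptflops) that counts parameters and their storage size for each floating-point type:

import torch
from torchvision.models import resnet18

def param_stats(model):
    # Count parameters and the bytes needed to store them for the
    # model's current dtype.
    n_params = sum(p.numel() for p in model.parameters())
    n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return n_params, n_bytes

net = resnet18()
for dtype in (torch.float64, torch.float32, torch.float16):
    n_params, n_bytes = param_stats(net.to(dtype))
    print(dtype, n_params, 'params,', n_bytes / 2**20, 'MiB')

The printed parameter count is identical every time, while the memory follows mem(fp16) = mem(fp32)/2 = mem(fp64)/4.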