microsoft / DeepSpeedExamples

Example models using DeepSpeed

The inaccurate flop results after several rounds #855

Open BitCalSaul opened 10 months ago

BitCalSaul commented 10 months ago

Hi, I tried to use the method "get_model_profile" to get the latency and FLOPs for my model. To avoid the influence of randomness, I call this method in a for loop several times and then average the results. However, I found that the results for every round after the first are not correct; they are far from the theoretical value. As shown in the figure below, the FLOPs increase with each round, which cannot be right since I feed the same input size into the model every time.

(screenshot: profiled FLOPs increasing with each round)

And this is the code:

import math
import sys

import pandas as pd

import profiler  # local copy of the flops profiler (KimmiShi fork)


def test_model(model, input_shape, warmup=20, num_tests=1000):
    results = []

    for _ in range(num_tests):
        flops, macs, params, latency = profiler.get_model_profile(
            model=model,
            input_shape=input_shape,
            print_profile=False,
            detailed=True,
            module_depth=-1,
            top_modules=1,
            warm_up=warmup,
            as_string=False
        )
        # Evict the cached module from sys.modules; note the `profiler` name
        # bound above is still the same object on the next iteration.
        del sys.modules['profiler']
        # Convert to GFLOPs, GMACs, K params, and ms.
        results.append((flops / 10**9, macs / 10**9, params / 10**3, latency * 10**3))

    df = pd.DataFrame(results, columns=['FLOPs', 'MACs', 'Params', 'Latency'])
    return df

df_swin = test_model(Swin, (batch_size, math.prod(input_resolution), dim), warmup=warmup, num_tests=num_tests)

I tried modifying the code and found that if I re-create the model and re-import the profiler in every iteration, the results are correct, as shown in the figure below.

(screenshot: profiled results now consistent across rounds)

And the following is the modified code.

def test_model(input_shape, warmup=20, num_tests=1000):
    results = []
    for _ in range(num_tests):
        # Re-import the profiler and re-create the model in every iteration,
        # so no profiling state is carried over from the previous round.
        import profiler
        model = MySwinTransformerModel(dim, input_resolution, num_heads, window_size, mlp_ratio, depth).to(device)
        # model = MyTensorizedTransformerModel(dim, input_resolution, num_heads, n_proj, mlp_ratio, depth).to(device)
        flops, macs, params, latency = profiler.get_model_profile(
            model=model,
            input_shape=input_shape,
            print_profile=False,
            detailed=True,
            module_depth=-1,
            top_modules=1,
            warm_up=warmup,
            as_string=False
        )
        # Evict the cached module so the `import profiler` above re-executes next time.
        del sys.modules['profiler']
        # Convert to GFLOPs, GMACs, K params, and ms.
        results.append((flops / 10**9, macs / 10**9, params / 10**3, latency * 10**3))

    df = pd.DataFrame(results, columns=['FLOPs', 'MACs', 'Params', 'Latency'])
    return df

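Side note: the fact that re-importing profiler and rebuilding the model resets the numbers suggests that some profiling state (module-level and/or attached to the model's submodules) survives across calls to get_model_profile. A possible lighter-weight workaround, as a minimal sketch only and assuming the local profiler module also exposes the upstream FlopsProfiler class the way deepspeed.profiling.flops_profiler does, would be to drive the profiler manually and call end_profile() after every measurement instead of rebuilding the model:

import time

import torch

from profiler import FlopsProfiler  # assumption: the local module exposes this class


def profile_once(model, input_shape, device='cuda'):
    # Attach the profiler, run one forward pass, read the counters, then detach
    # everything with end_profile() so nothing leaks into the next measurement.
    prof = FlopsProfiler(model)
    dummy = torch.randn(*input_shape, device=device)

    prof.start_profile()
    start = time.perf_counter()
    with torch.no_grad():
        model(dummy)
    latency = time.perf_counter() - start

    flops = prof.get_total_flops()
    macs = prof.get_total_macs()
    params = prof.get_total_params()
    prof.end_profile()  # removes hooks and per-module counters

    return flops, macs, params, latency

Whether this avoids the accumulation in the fork I can't say, but upstream's end_profile() is specifically meant to remove the attributes and hooks the profiler adds to the model.
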
BitCalSaul commented 10 months ago

I'm using the commit from https://github.com/KimmiShi/DeepSpeed/tree/flops_profiler_attn because I want the FLOPs of the @ (matmul) operations in transformer-based models, which the released profiler does not count.
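
For reference, the usual way to count an @ operation: for A of shape (m, k) and B of shape (k, n), MACs = m·k·n and FLOPs ≈ 2·m·k·n (one multiply plus one add per MAC). A quick hand check against the profiler output, illustrative only and not code from the fork:

import torch


def matmul_macs(a: torch.Tensor, b: torch.Tensor) -> int:
    # MACs of a (possibly batched) matmul: number of output elements times
    # the contracted dimension.
    out_elems = torch.matmul(a, b).numel()
    return out_elems * a.shape[-1]


a = torch.randn(4, 64, 32)   # e.g. (windows, tokens, head_dim)
b = torch.randn(4, 32, 64)

macs = matmul_macs(a, b)     # 4 * 64 * 64 * 32 = 524288
flops = 2 * macs             # ~1.05 MFLOPs for this single @ call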