microsoft / DeepSpeedExamples

Example models using DeepSpeed

The inaccurate flop results after several rounds #855

Open BitCalSaul opened 10 months ago

BitCalSaul commented 10 months ago

Hi, I tried to use the method "get_model_profile" to get the latency and FLOPs for my model. To avoid the influence of randomness, I call this method in a for loop several times and then average the results. However, I found that the results for every round after the first are not correct; they are far from the theoretical value. As shown in the figure below, the FLOPs increase with each round, which cannot be right since I feed the same input size into the model every time.

(screenshot: profiled FLOPs increasing with each round)

And this is the code:

import math
import sys

import pandas as pd

import profiler  # local copy of the flops profiler (KimmiShi fork)


def test_model(model, input_shape, warmup=20, num_tests=1000):
    results = []

    for _ in range(num_tests):
        flops, macs, params, latency = profiler.get_model_profile(
            model=model,
            input_shape=input_shape,
            print_profile=False,
            detailed=True,
            module_depth=-1,
            top_modules=1,
            warm_up=warmup,
            as_string=False
        )
        # Evict the cached module from sys.modules; note the `profiler` name
        # bound above is still the same object on the next iteration.
        del sys.modules['profiler']
        # Convert to GFLOPs, GMACs, K params, and ms.
        results.append((flops / 10**9, macs / 10**9, params / 10**3, latency * 10**3))

    df = pd.DataFrame(results, columns=['FLOPs', 'MACs', 'Params', 'Latency'])
    return df

df_swin = test_model(Swin, (batch_size, math.prod(input_resolution), dim), warmup=warmup, num_tests=num_tests)

I tried modifying the code and found that if I re-create the model and re-import the profiler in every iteration, the results are correct, as shown in the figure below.

(screenshot: profiled results now consistent across rounds)

And the following is the modified code.

def test_model(input_shape, warmup=20, num_tests=1000):
    results = []
    for _ in range(num_tests):
        # Re-import the profiler and re-create the model in every iteration,
        # so no profiling state is carried over from the previous round.
        import profiler
        model = MySwinTransformerModel(dim, input_resolution, num_heads, window_size, mlp_ratio, depth).to(device)
        # model = MyTensorizedTransformerModel(dim, input_resolution, num_heads, n_proj, mlp_ratio, depth).to(device)
        flops, macs, params, latency = profiler.get_model_profile(
            model=model,
            input_shape=input_shape,
            print_profile=False,
            detailed=True,
            module_depth=-1,
            top_modules=1,
            warm_up=warmup,
            as_string=False
        )
        # Evict the cached module so the `import profiler` above re-executes next time.
        del sys.modules['profiler']
        # Convert to GFLOPs, GMACs, K params, and ms.
        results.append((flops / 10**9, macs / 10**9, params / 10**3, latency * 10**3))

    df = pd.DataFrame(results, columns=['FLOPs', 'MACs', 'Params', 'Latency'])
    return df

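Side note: the fact that re-importing profiler and rebuilding the model resets the numbers suggests that some profiling state (module-level and/or attached to the model's submodules) survives across calls to get_model_profile. A possible lighter-weight workaround, as a minimal sketch only and assuming the local profiler module also exposes the upstream FlopsProfiler class the way deepspeed.profiling.flops_profiler does, would be to drive the profiler manually and call end_profile() after every measurement instead of rebuilding the model:

import time

import torch

from profiler import FlopsProfiler  # assumption: the local module exposes this class


def profile_once(model, input_shape, device='cuda'):
    # Attach the profiler, run one forward pass, read the counters, then detach
    # everything with end_profile() so nothing leaks into the next measurement.
    prof = FlopsProfiler(model)
    dummy = torch.randn(*input_shape, device=device)

    prof.start_profile()
    start = time.perf_counter()
    with torch.no_grad():
        model(dummy)
    latency = time.perf_counter() - start

    flops = prof.get_total_flops()
    macs = prof.get_total_macs()
    params = prof.get_total_params()
    prof.end_profile()  # removes hooks and per-module counters

    return flops, macs, params, latency

Whether this avoids the accumulation in the fork I can't say, but upstream's end_profile() is specifically meant to remove the attributes and hooks the profiler adds to the model.
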
BitCalSaul commented 10 months ago

I'm using the commit from https://github.com/KimmiShi/DeepSpeed/tree/flops_profiler_attn because I want the FLOPs of the @ (matmul) operations in transformer-based models, which the released profiler does not count.
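
For reference, the usual way to count an @ operation: for A of shape (m, k) and B of shape (k, n), MACs = m·k·n and FLOPs ≈ 2·m·k·n (one multiply plus one add per MAC). A quick hand check against the profiler output, illustrative only and not code from the fork:

import torch


def matmul_macs(a: torch.Tensor, b: torch.Tensor) -> int:
    # MACs of a (possibly batched) matmul: number of output elements times
    # the contracted dimension.
    out_elems = torch.matmul(a, b).numel()
    return out_elems * a.shape[-1]


a = torch.randn(4, 64, 32)   # e.g. (windows, tokens, head_dim)
b = torch.randn(4, 32, 64)

macs = matmul_macs(a, b)     # 4 * 64 * 64 * 32 = 524288
flops = 2 * macs             # ~1.05 MFLOPs for this single @ call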