Renpf2022 closed this issue 6 months ago
It seems that activations which have already been collected are still occupying memory and are never released, i.e. there is a memory leak. Below are lines 65-69 of the original code:
```python
for prompt in tqdm(prompts):
    layer_wise_activations, head_wise_activations, _ = get_llama_activations_bau(model, prompt, device)
    all_layer_wise_activations.append(layer_wise_activations[:,-1,:])
    all_head_wise_activations.append(head_wise_activations[:,-1,:])
```
which I changed to:
```python
for prompt in tqdm(prompts):
    layer_wise_activations, head_wise_activations, _ = get_llama_activations_bau(model, prompt, device)
    all_layer_wise_activations.append(layer_wise_activations[:,-1,:].copy())
    all_head_wise_activations.append(head_wise_activations[:,-1,:].copy())
```
And it works.
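For anyone wondering why the `copy()` helps: NumPy basic slicing returns a view, and a view holds a reference to its full parent buffer, so appending views keeps every per-prompt activation array alive. A minimal sketch of the mechanism, assuming the activations are NumPy arrays (the shape below is made up for illustration):

```python
import numpy as np

# Hypothetical shape, just for illustration (layers x seq_len x hidden).
big = np.zeros((33, 128, 4096), dtype=np.float32)

view = big[:, -1, :]         # basic slicing returns a view, not a copy
print(view.base is big)      # True: the view pins the whole parent buffer
print(view.nbytes)           # ~0.5 MB visible through the view ...
print(view.base.nbytes)      # ... but ~69 MB stay allocated underneath

snap = big[:, -1, :].copy()  # copy() materializes only the slice
print(snap.base is None)     # True: no back-reference to the parent
del big, view                # now the large buffer can actually be freed
```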
@Wooorry Thank you! I just added the two `copy()` calls on the master branch.
It would also be better to load and save the activations in batches, to avoid running out of memory; see the sketch below.
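A rough sketch of that batched approach, reusing the same `prompts`, `model`, `device`, and `get_llama_activations_bau` as above; the chunk size and file names here are assumptions, not the repo's actual scheme:

```python
import numpy as np
from tqdm import tqdm

CHUNK = 256  # hypothetical batch size

for start in range(0, len(prompts), CHUNK):
    chunk_layer, chunk_head = [], []
    for prompt in tqdm(prompts[start:start + CHUNK]):
        layer_wise_activations, head_wise_activations, _ = \
            get_llama_activations_bau(model, prompt, device)
        # copy() so we don't pin the full per-prompt activation buffers
        chunk_layer.append(layer_wise_activations[:, -1, :].copy())
        chunk_head.append(head_wise_activations[:, -1, :].copy())
    # Save and discard this chunk so peak memory stays bounded by CHUNK.
    np.save(f"layer_wise_{start}.npy", np.stack(chunk_layer))
    np.save(f"head_wise_{start}.npy", np.stack(chunk_head))
```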