johnmarktaylor91 / torchlens

Package for extracting and mapping the results of every single tensor operation in a PyTorch model in one line of code.

Exception occurred: RuntimeError: The function 'native_batch_norm' is not differentiable with respect to argument 'running_mean'. This input cannot have requires_grad True. Raised from File "D:\ProgramCode\torchlens\dubug_torchlens.py", line 46, in <module>: y = bn.func_applied(x, buffer1, buffer2, *bn.parent_params, *bn.func_all_args_non_tensor) #29

Closed. whisperLiang closed this issue 2 months ago.

whisperLiang commented 2 months ago

Something goes wrong when I run a forward pass through a BatchNorm layer. Here is the code:

```
import torch
import torchvision
import torchlens as tl

model = torchvision.models.resnet18(pretrained=True)
x = torch.rand(1, 3, 224, 224)
model_history = tl.log_forward_pass(model, x, vis_opt='unrolled')

# Helper from my script; not defined in this snippet.
res = model_log_forward(model_history.layer_list, model_history, x)

bn = model_history[4]
x = model_history[1].tensor_contents
buffer1 = model_history[2].tensor_contents
buffer2 = model_history[3].tensor_contents

buffer1.requires_grad = False
buffer2.requires_grad = False

y = bn.func_applied(x, buffer1, buffer2, *bn.parent_params, *bn.func_all_args_non_tensor)
print(model_history)
```
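A quick diagnostic before the failing call is to print the requires_grad flag of every tensor being passed in. A short sketch reusing the names above; it shows the buffers are indeed False, but the tensors in `parent_params` (the BatchNorm weight and bias) are True, which matters because the call passes everything positionally:

```
# Check which tensors going into the call actually require grad.
for name, t in [("x", x), ("buffer1", buffer1), ("buffer2", buffer2)]:
    print(name, t.requires_grad)           # buffers were set to False above
for i, p in enumerate(bn.parent_params):
    print(f"parent_params[{i}]", p.requires_grad)  # expected: True for weight/bias
```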

johnmarktaylor91 commented 2 months ago

This is very odd. I'm getting the same error. The error message doesn't make sense, since buffer1 and buffer2 clearly have requires_grad set to False. Unfortunately I don't know what's going on; this seems like a PyTorch problem rather than a TorchLens problem. As a workaround, you can do the following:


```
model_history = tl.log_forward_pass(model, x, vis_opt='none', save_function_args=True)

bn = model_history[4]
bn.func_applied(*bn.creation_args)
```
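One quick sanity check for the workaround is to compare the replayed result with the tensor TorchLens recorded for that layer. A sketch, assuming the replayed call returns the same `(output, save_mean, save_invstd)` tuple as `native_batch_norm` and that `creation_args` holds the arguments exactly as they were passed during logging:

```
import torch

replayed = bn.func_applied(*bn.creation_args)

# Compare the replayed output against the recorded layer output.
print(torch.allclose(replayed[0], bn.tensor_contents))
```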
whisperLiang commented 2 months ago

Thanks for your answer. Inspired by creation_args, I changed the position of the buffers: `y = bn.func_applied(x, *bn.parent_params, buffer1, buffer2, *bn.func_all_args_non_tensor)`. I need to keep this form (rather than creation_args) because "x" is random, and I run the forward pass with "x".
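The reordering works because `torch.native_batch_norm` takes its arguments as `(input, weight, bias, running_mean, running_var, training, momentum, eps)`: the learnable parameters come before the running-statistics buffers. The original call put the requires_grad=True weight into the `running_mean` slot, which is exactly what the RuntimeError complains about. A minimal sketch of the corrected replay with a fresh random input, reusing the names from the snippet above (and assuming `parent_params` holds `(weight, bias)` in that order):

```
import torch

# A fresh random input with the same shape as the recorded BatchNorm input.
x = torch.rand_like(model_history[1].tensor_contents)

# Correct slot order: input, weight, bias, running_mean, running_var,
# then the non-tensor args (training, momentum, eps).
# native_batch_norm returns a (output, save_mean, save_invstd) tuple.
y = bn.func_applied(x, *bn.parent_params, buffer1, buffer2,
                    *bn.func_all_args_non_tensor)
```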