msr-fiddle / pipedream

Error occurred in profiling #44

Closed: gudiandian closed this issue 4 years ago

gudiandian commented 4 years ago

Similar to issue #16, an error occurred when I tried to profile ResNet:

Traceback (most recent call last):
  File "main.py", line 597, in <module>
    main()
  File "main.py", line 311, in main
    os.path.join(args.profile_directory, args.arch))
  File "main.py", line 122, in create_graph
    output = model(input)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 574, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gudiandian/pipedream/profiler/image_classification/models/resnetNew.py", line 93, in forward
    x = torch.flatten(x, 1)
TypeError: flatten(): argument 'input' (position 1) must be Tensor, not TensorWrapper

I have changed my torchvision version to 0.2.1, as suggested in issue #16, but this doesn't remove the error. My PyTorch version is 1.6.0. I'm not sure whether this torch version works with PipeDream. Or is there some other problem? Thank you.

deepakn94 commented 4 years ago

Seems like you are trying to profile your custom implementation of the ResNet model? If this is the case, you want to patch the flatten method, similar to this: https://github.com/msr-fiddle/pipedream/blob/master/profiler/torchmodules/torchgraph/graph_creator.py#L157. This is to ensure that we can construct the computation DAG for the given model. At a high level, this method needs to unwrap the TensorWrapper object, call torch.flatten, then rewrap the result as a TensorWrapper object.
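
A minimal sketch of that unwrap / call / rewrap pattern, not PipeDream's actual code. The `TensorWrapper` class, its `tensor`, `node_desc` and `parents` fields, the helper `patch_to_accept_wrappers`, and the list-based `fake_torch` namespace are all hypothetical stand-ins, used so the example runs without PyTorch installed. The real wrapper in `graph_creator.py` also records graph nodes and edges.

```python
import functools
import types


class TensorWrapper:
    """Hypothetical stand-in for the profiler's wrapper type."""

    def __init__(self, tensor, node_desc, parents=()):
        self.tensor = tensor            # the underlying (unwrapped) value
        self.node_desc = node_desc      # label for the node in the DAG
        self.parents = tuple(parents)   # wrappers this value was computed from


def patch_to_accept_wrappers(namespace, name):
    """Replace namespace.<name> with a version that accepts TensorWrapper input.

    Returns the original function so the patch can be undone later.
    """
    original = getattr(namespace, name)

    @functools.wraps(original)
    def patched(input, *args, **kwargs):
        if not isinstance(input, TensorWrapper):
            # Plain inputs go straight to the original function.
            return original(input, *args, **kwargs)
        # 1. unwrap, 2. call the original, 3. rewrap and record the parent.
        result = original(input.tensor, *args, **kwargs)
        return TensorWrapper(result, node_desc=name, parents=[input])

    setattr(namespace, name, patched)
    return original


# Stand-in for the torch module: flattens a list of rows into one list.
fake_torch = types.SimpleNamespace(
    flatten=lambda rows, start_dim=0: [v for row in rows for v in row]
)

patch_to_accept_wrappers(fake_torch, "flatten")

wrapped = TensorWrapper([[1, 2], [3, 4]], node_desc="Input")
out = fake_torch.flatten(wrapped)
print(type(out).__name__, out.tensor, out.parents[0].node_desc)
```

With PyTorch available, the same helper could presumably be applied as `patch_to_accept_wrappers(torch, "flatten")` before running the profiler, so that the `torch.flatten(x, 1)` call in the custom model receives a wrapper without raising the `TypeError`.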

gudiandian commented 4 years ago

I used my custom implementation of ResNet because the torchvision one fails with another error:

Traceback (most recent call last):
  File "main.py", line 596, in <module>
    main()
  File "main.py", line 287, in main
    verbose=args.verbose, device="cuda")
  File "../torchmodules/torchsummary/torchsummary.py", line 77, in summary
    model(*model_input)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 574, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.7/site-packages/torchvision/models/resnet.py", line 149, in forward
    x = self.avgpool(x)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 574, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 554, in forward
    self.padding, self.ceil_mode, self.count_include_pad, self.divisor_override)
RuntimeError: Given input size: (2048x1x1). Calculated output size: (2048x-5x-5). Output size is too small

It is said that this might be caused by a wrong parameter for the avgpool layer in torchvision.
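
The `-5` in that message is consistent with a 7x7 average pool with stride 1 and no padding applied to a 1x1 feature map. The sketch below uses the standard pooling output-size formula; the kernel size, stride and padding are assumptions inferred from the numbers in the error, not values read from the installed torchvision, and `pool_output_size` is a hypothetical helper name.

```python
import math


def pool_output_size(input_size, kernel_size, stride, padding=0):
    """Output length along one spatial dimension for a pooling layer
    (floor mode): floor((input + 2*padding - kernel) / stride) + 1."""
    return math.floor((input_size + 2 * padding - kernel_size) / stride) + 1


# Assumed parameters: kernel 7, stride 1, no padding.
print(pool_output_size(1, 7, 1))  # 1x1 feature map -> -5, as in the error
print(pool_output_size(7, 7, 1))  # 7x7 feature map -> 1, the intended case
```

Under these assumptions, the pool only works when the feature map reaching it is at least 7x7, which is the case for 224x224 inputs but not for smaller images.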

Thanks for your suggestion! This worked for my situation. I will close this issue.

deepakn94 commented 4 years ago

Yup, that's correct -- you need to use a version of the ResNet model that wraps tensors with TensorWrappers.