mxnet to pytorch - Githubissues

johnjwatson commented 3 years ago

Platform (like ubuntu 16.04/win10): debian buster

Python version: 3.6

Source framework with version (like Tensorflow 1.4.1 with GPU): mxnet

Destination framework with version (like CNTK 2.3 with GPU): pytorch

Pre-trained model path (webpath or webdisk path): https://www.dropbox.com/s/akxeqp99jvsd6z7/model-MobileFaceNet-arcface-ms1m-refine-v1.zip?dl=0

I am trying to convert a pretrained model from MXNet to PyTorch, but the conversion keeps failing. First, I download and unzip the model files, then run:

mmconvert -sf mxnet -in model-symbol.json -iw model-0000.params -df pytorch -om pytorch.pth --inputShape 3,112,112 and I get:

weight = self.weight_data.get(source_node.name + "_weight").asnumpy().transpose((1, 0))
AttributeError: 'NoneType' object has no attribute 'asnumpy'

which is the issue described here: https://github.com/microsoft/MMdnn/issues/231

so, I changed the line 408 in mxnet_parser.py to: weight = self.weight_data.get("fc1_weight").asnumpy().transpose((1, 0))
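The AttributeError above arises because dict.get returns None when the key is missing, and None has no asnumpy method. Below is a minimal sketch of a lookup that fails with a more descriptive message instead. It assumes weight_data behaves like a plain dict; the helper lookup_weight, the node name "pre_fc1" and the placeholder values are hypothetical and are not part of MMdnn.

```python
def lookup_weight(weight_data, node_name, suffix="_weight"):
    """Return weight_data[node_name + suffix], or raise a KeyError listing near misses."""
    key = node_name + suffix
    if key in weight_data:
        return weight_data[key]
    # List the keys that do end with the suffix so a naming mismatch is easy to spot.
    candidates = sorted(k for k in weight_data if k.endswith(suffix))
    raise KeyError(f"{key} not found; keys ending in {suffix}: {candidates}")


# Placeholder data (hypothetical): the node name and the stored weight name differ.
weight_data = {"fc1_weight": "W", "fc1_bias": "b"}

# dict.get silently returns None, which is why the later .asnumpy() call fails.
print(weight_data.get("pre_fc1_weight"))

try:
    lookup_weight(weight_data, "pre_fc1")
except KeyError as exc:
    error_message = str(exc)
    print(error_message)
```

Hard-coding "fc1_weight", as done above, works for this model, but a lookup like this makes the mismatch between node name and parameter name visible for other models too.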

Now, I run again:

mmconvert -sf mxnet -in model-symbol.json -iw model-0000.params -df pytorch -om pytorch.pth --inputShape 3,112,112 and I get:

  File "pytorch.py", line 30, in __init__
    self.conv_2_dw_conv2d = self.__conv(2, name='conv_2_dw_conv2d', in_channels=64, out_channels=4096, kernel_size=(3, 3), stride=(1, 1), groups=64, bias=False)
  File "pytorch.py", line 335, in __conv
    layer.state_dict()['weight'].copy_(torch.from_numpy(__weights_dict[name]['weights']))
RuntimeError: The size of tensor a (4096) must match the size of tensor b (64) at non-singleton dimension 0

I am not sure what it is telling me, other than that there seems to be a size mismatch. Has anyone encountered this and found a solution?

Also, I get a warning during the conversion:

 UserWarning: You created Module with Module(..., label_names=['softmax_label']) but input with name 'softmax_label' is not found in symbol.list_arguments(). Did you mean one of:
    data
  warnings.warn(msg)

Is this warning something I can safely ignore? Sorry, I am VERY new to MXNet.

XiaoXYe commented 3 years ago

Hi @johnjwatson, you can change the code in pytorch_parser.py. Replace this part of the emit_Conv function:

if IR_node.type == 'DepthwiseConv':
    group = in_channels
    filter *= group

to

if IR_node.type == 'DepthwiseConv':
    group = in_channels
    filter = group

And you can ignore that warning.
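The size mismatch and the fix above come down to how a grouped convolution's weight is shaped in PyTorch: (out_channels, in_channels // groups, kH, kW). The sketch below only does the shape arithmetic, without importing PyTorch. The helper conv2d_weight_shape is hypothetical, and the value 64 for the filter count is an assumption inferred from the error message, not read from the parser.

```python
def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Shape of a PyTorch Conv2d weight: (out_channels, in_channels // groups, kH, kW)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must both be divisible by groups")
    return (out_channels, in_channels // groups) + tuple(kernel_size)


in_channels = 64
filters = 64  # assumed: the filter count reported for conv_2_dw_conv2d

# Original emitter logic: filter *= group
buggy = conv2d_weight_shape(in_channels, filters * in_channels, (3, 3), groups=in_channels)
# Suggested fix: filter = group
fixed = conv2d_weight_shape(in_channels, in_channels, (3, 3), groups=in_channels)

print(buggy)  # (4096, 1, 3, 3)
print(fixed)  # (64, 1, 3, 3)
```

The first dimension of the buggy shape (4096) against the 64 in the stored weights matches the RuntimeError in the original report. Note that setting filter = group amounts to assuming a channel multiplier of 1, which appears to hold for this model.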

johnjwatson commented 3 years ago

OMG @XiaoXYe that solved it - well, I have the model pth and the .py file - Thanks a tonne!!!!

johnjwatson commented 3 years ago

@XiaoXYe I have a follow-up question. I now have the pytorch.py file and the pytorch.pth file, and I am trying to test them, but I get:

  File "pytorch.py", line 130, in forward
    self.minusscalar0_second = torch.autograd.Variable(torch.from_numpy(__weights_dict['minusscalar0_second']['value']), requires_grad=False)
NameError: name '_KitModel__weights_dict' is not defined

Would you know how to solve this? I can't seem to find any answers online. :(

XiaoXYe commented 3 years ago

@johnjwatson this __weights_dict should be defined automatically in KitModel.__init__() in pytorch.py, like this:

class KitModel(nn.Module):

    def __init__(self, weight_file):
        super(KitModel, self).__init__()
        global __weights_dict
        __weights_dict = load_weights(weight_file)
        ...

johnjwatson commented 3 years ago

@XiaoXYe I thought the same as you and I see that it is defined this way:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import math

__weights_dict = dict()

def load_weights(weight_file):
    if weight_file == None:
        return

    try:
        weights_dict = np.load(weight_file, allow_pickle=True).item()
    except:
        weights_dict = np.load(weight_file, allow_pickle=True, encoding='bytes').item()

    return weights_dict

class KitModel(nn.Module):

    def __init__(self, weight_file):
        super(KitModel, self).__init__()
        global __weights_dict
        __weights_dict = load_weights(weight_file)

        self.conv_1_conv2d = self.__conv(2, name='conv_1_conv2d', in_channels=3, out_channels=64, kernel_size=(3, 3), stride=(2, 2), groups=1, bias=False)
        self.conv_1_batchnorm = self.__batch_normalization(2, 'conv_1_batchnorm', num_features=64, eps=0.0010000000474974513, momentum=0.8999999761581421)
...

But when I run this on the files (as per the docs):

import torch
import imp
import numpy as np
MainModel = imp.load_source('MainModel', "pytorch.py")

the_model = torch.load("pytorch.pth")
the_model.eval()
print(the_model)

x = np.random.random([112,112,3])
x = np.transpose(x, (2, 0, 1))
print(x.shape)
x = np.expand_dims(x, 0).copy()
print(x.shape)
data = torch.from_numpy(x)
data = torch.autograd.Variable(data, requires_grad = False).float()

predict = the_model(data)
print(predict)

I get:

  File "/home/foo/ve_name/env_name/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "pytorch.py", line 130, in forward
    self.minusscalar0_second = torch.autograd.Variable(torch.from_numpy(__weights_dict['minusscalar0_second']['value']), requires_grad=False)
NameError: name '_KitModel__weights_dict' is not defined

Just to confirm (in case you want to replicate it), I simply download the file from the link, unzip it and then run (as per the docs):

mmconvert -sf mxnet -in model-symbol.json -iw model-0000.params -df pytorch -om pytorch.pth --inputShape 3,112,112

... to generate the PyTorch-related files. The resulting full pytorch.py file is here: https://zerobin.net/?a8436f2ae6791499#dhZsFWXc91YpvlHajIqLY74MdeP8pE98E3IELiAD3bw=

johnjwatson commented 3 years ago

@XiaoXYe I think there is a bug with the double underscores in __weights_dict. Because of Python name mangling (please see: https://stackoverflow.com/questions/62810436/global-variable-although-defined-errors-out-as-not-defined-in-python), it should have a single underscore. When I change it to a single underscore, the above error disappears, but now I get:

  File "pytorch.py", line 130, in forward
    self.minusscalar0_second = torch.autograd.Variable(torch.from_numpy(_weights_dict['minusscalar0_second']['value']), requires_grad=False)
KeyError: 'minusscalar0_second'

:(
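The name-mangling behaviour described in this comment can be reproduced without PyTorch. In the minimal sketch below, the class and variable names are hypothetical stand-ins for KitModel and __weights_dict. Any identifier with two leading underscores (and no two trailing ones) that appears inside a class body is rewritten to _ClassName__identifier, even when it is meant to refer to a module-level global.

```python
__weights = {"w": 1.0}  # module-level name with two leading underscores
_weights = {"w": 1.0}   # same data under a single leading underscore


class Mangled:
    def forward(self):
        # Compiled as a lookup of `_Mangled__weights`, which does not exist.
        return __weights["w"]


class Plain:
    def forward(self):
        # Single-underscore names are left untouched by the compiler.
        return _weights["w"]


try:
    Mangled().forward()
except NameError as exc:
    mangled_error = str(exc)

print(mangled_error)
print(Plain().forward())
```

In the generated file the `global __weights_dict` statement inside __init__ is mangled in the same way, so the name only exists once __init__ has run in the current process.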

XiaoXYe commented 3 years ago

@johnjwatson thank you, #863 will fix this bug. I think the KeyError may be related to the way the model is loaded in PyTorch: _weights_dict is empty because it is set in __init__() but read in forward(). When I save and load the model and the weight_file manually, there is no error:

import torch
import imp
import numpy as np
from pytorch import KitModel

model = KitModel("b411e5ef479b4e45b556c879a61f6704.npy")
model.eval()
print(model)

torch.save(model, "pytorch.pth")
the_model = torch.load("pytorch.pth")
x = np.random.random([112,112,3])
x = np.transpose(x, (2, 0, 1))
print(x.shape)
x = np.expand_dims(x, 0).copy()
print(x.shape)
data = torch.from_numpy(x)
data = torch.autograd.Variable(data, requires_grad = False).float()

predict = the_model(data)
print(predict)

The weight_file is deleted by MMdnn in _script/convert.py, line 113: remove_temp_files(temp_filename). Delete that line, and you can keep the weight_file and load it manually as above. We will look into this later.
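One reading of the empty-dictionary explanation above is that unpickling rebuilds an object without calling __init__, so state kept in a module-level global is never restored. Here is a minimal sketch of that effect without PyTorch. The class Model, the value 127.5 and the use of __new__ as a stand-in for unpickling in a fresh process are assumptions made for illustration.

```python
_weights = {}  # module-level default, as in the generated file after the rename


class Model:
    def __init__(self, weights):
        global _weights
        _weights = weights  # state lives in the module, not on the instance

    def forward(self, key):
        return _weights[key]


# Stand-in for unpickling in a fresh process: the object is rebuilt without __init__.
restored = Model.__new__(Model)
try:
    restored.forward("minusscalar0_second")
except KeyError as exc:
    missing = exc.args[0]
print("missing key:", missing)

# Constructing the model in the same process fills the module-level dict first,
# which would explain why the save-then-load script above runs without error.
built = Model({"minusscalar0_second": 127.5})  # placeholder value
print(restored.forward("minusscalar0_second"))
```

Keeping such values on the instance (for example as attributes set in __init__) would let them travel with the saved object, though whether the fix in #863 takes that route is not stated in this thread.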

johnjwatson commented 3 years ago

@XiaoXYe yea, the guy on stack (maxfischer) was kind enough to make the PR. I tried your steps and yes, it works!!! many thanks for being so responsive. REALLY appreciate it. ps: great tool btw :)

microsoft / MMdnn

mxnet to pytorch #862