microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Exception when loading legacy model (C++, windows) #3381

Open dmagee opened 6 years ago

dmagee commented 6 years ago

I've previously (for other models) been able to load legacy models generated by cntk.exe into C++ programs using:

DeviceDescriptor& device = CNTK::DeviceDescriptor::GPUDevice(0);
const wchar_t* modelname = L"MODEL"; // ImageImageDSRegression2L+1
FunctionPtr rootFunc = Function::Load(modelname, device);

I have one particular (fairly large, U-Net-like) model that causes the CNTK-based programs to throw an exception.

Call stack:

    0000000000000000()  Unknown
>   Cntk.Core-2.5.1d.dll!Microsoft::MSR::CNTK::MBLayout::MBLayout(unsigned __int64 numParallelSequences, unsigned __int64 numTimeSteps, const std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > & name) Line 112    C++
    [External Code] 
    Cntk.Core-2.5.1d.dll!Microsoft::MSR::CNTK::ComputationNetwork::ComputationNetwork() Line 62 C++
    Cntk.Core-2.5.1d.dll!Microsoft::MSR::CNTK::ComputationNetwork::ComputationNetwork(int deviceId) Line 69 C++
    [External Code] 
    Cntk.Core-2.5.1d.dll!CNTK::Internal::LoadLegacyModel(const std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > & modelFile, const CNTK::DeviceDescriptor & computeDevice) Line 589    C++
    Cntk.Core-2.5.1d.dll!CNTK::Function::Load(const std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > & filepath, const CNTK::DeviceDescriptor & computeDevice, CNTK::ModelFormat format) Line 507  C++
    QuantifyByAppearanceFast.exe!QuantifyByAppearance::quantify_sub_image(std::basic_string<char,std::char_traits<char>,std::allocator<char> > image_token, std::basic_string<char,std::char_traits<char>,std::allocator<char> > alignment_id, unsigned int x, unsigned int y, unsigned int w, unsigned int h, unsigned int zoom_level, unsigned int level, double mppx, double mppy, unsigned int overlap_pad_left, unsigned int overlap_pad_right, unsigned int overlap_pad_top, unsigned int overlap_pad_bottom) Line 1089   C++
    QuantifyByAppearanceFast.exe!QuantifyByAppearance::run_code() Line 764  C++
    QuantifyByAppearanceFast.exe!main(int argc, char * * argv) Line 457 C++
    [External Code] 

The exception appears to be at line 112 of Sequences.h:

MBLayout(size_t numParallelSequences, size_t numTimeSteps, const std::wstring &name)
        : m_distanceToStart(CPUDEVICE), m_distanceToEnd(CPUDEVICE), m_columnsValidityMask(CPUDEVICE)

Unhandled exception at 0x0000000000000000 in QuantifyByAppearanceFast.exe: 0xC0000005: Access violation executing location 0x0000000000000000.

I'm using CNTK 2.5 on Windows 10 with VS2017 (CNTK compiled from source). The model is at:

https://drive.google.com/open?id=1qMNatDi8G_JAzaoITCMF4gKEFYNrflhj

The model works fine with cntk.exe on the same machine (quad i7, 16gb, nvidia 1080).

Thanks for any help!

D.

dmagee commented 6 years ago

Update: the error reported above changes when more DLLs are copied (I'd only copied the core DLL for the debug build). Once that's done, the real error is:

Assertion failed: convolutionMapVar.IsConstant() || convolutionMapVar.IsParameter(), file c:\repos\cntk\source\cntkv2librarydll\backcompat.cpp, line 364

I think this is probably down to the way the 3x3 convolution kernel is constructed in the BrainScript of this network (the networks that do load don't build a kernel this way):

# Convolve with '1 1 0
#                1 1 0
#                0 0 0'
W1 = ConstantTensor (1, (2:2))
WP1 = ConstantTensor (0, (2:1))
WP2 = ConstantTensor (0, (1:3))
WW=Splice(W1:WP1, axis=2)
W=Splice(WW:WP2, axis=1)

c1 = Convolution (W, up_fake, (3:3:1), mapDims = 1, stride = 1, sharing = true, autoPadding = true, lowerPad = 0, upperPad = 0,  imageLayout = "cudnn")

W is indeed not explicitly a constant or a parameter (it is the output of a Splice), but it is a perfectly valid kernel that works fine in cntk.exe. I've confirmed this is the issue with a few print statements in the CNTK code.
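For anyone checking which values the spliced kernel actually holds, here is a minimal standalone C++ sketch that emulates the two Splice calls above. It does not use CNTK at all: `constant_tensor`, `splice`, and `build_kernel` are hypothetical helpers written for illustration, and treating axis 1 as the row index is an assumption (the resulting kernel is symmetric, so the opposite convention gives the same values).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dense 2-D tensor stored as a list of rows; a stand-in for a BrainScript tensor.
using Tensor2D = std::vector<std::vector<float>>;

// Hypothetical stand-in for ConstantTensor(value, (dim1:dim2)).
Tensor2D constant_tensor(float value, std::size_t dim1, std::size_t dim2) {
    return Tensor2D(dim1, std::vector<float>(dim2, value));
}

// Hypothetical stand-in for Splice(a:b, axis=...), with 1-based axes:
// axis 1 appends along the first dimension, axis 2 along the second.
Tensor2D splice(const Tensor2D& a, const Tensor2D& b, int axis) {
    if (axis == 1) {
        if (!a.empty() && !b.empty() && a[0].size() != b[0].size())
            throw std::invalid_argument("second dimensions must match");
        Tensor2D out = a;
        out.insert(out.end(), b.begin(), b.end());
        return out;
    }
    if (axis == 2) {
        if (a.size() != b.size())
            throw std::invalid_argument("first dimensions must match");
        Tensor2D out = a;
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i].insert(out[i].end(), b[i].begin(), b[i].end());
        return out;
    }
    throw std::invalid_argument("axis must be 1 or 2");
}

// Mirrors the BrainScript construction step by step.
Tensor2D build_kernel() {
    const Tensor2D W1  = constant_tensor(1.0f, 2, 2);  // 2x2 block of ones
    const Tensor2D WP1 = constant_tensor(0.0f, 2, 1);  // 2x1 zero padding
    const Tensor2D WP2 = constant_tensor(0.0f, 1, 3);  // 1x3 zero padding
    const Tensor2D WW  = splice(W1, WP1, 2);           // 2x3
    return splice(WW, WP2, 1);                         // 3x3
}
```

Calling `build_kernel()` yields the 3x3 kernel described in the BrainScript comment, which is the set of values a single constant would have to hold to replace W.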

ke1337 commented 6 years ago

Unfortunately, there are some limitations in loading legacy models, and this is one of them. I think the workaround is to edit the model and replace W with a constant.

dmagee commented 6 years ago

The reason it was built that way is that I could not find a way of creating a 2D constant with varying values, hence the concatenation of several constant-valued constants.
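If a single constant with varying values is the goal, the values first have to be laid out as one flat buffer. The sketch below is standalone C++ with no CNTK dependency, and it assumes the column-major element order that CNTK uses for tensors; `flatten_column_major` is a hypothetical helper, and which CNTK or BrainScript call would accept such a buffer is not established in this thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Flattens a matrix given as a list of rows into column-major order,
// i.e. all of column 0 first, then column 1, and so on.
// Hypothetical helper for illustration; not part of any CNTK API.
std::vector<float> flatten_column_major(const std::vector<std::vector<float>>& rows) {
    const std::size_t numRows = rows.size();
    const std::size_t numCols = numRows == 0 ? 0 : rows[0].size();
    for (const auto& row : rows) {
        if (row.size() != numCols)
            throw std::invalid_argument("all rows must have the same length");
    }
    std::vector<float> flat;
    flat.reserve(numRows * numCols);
    for (std::size_t c = 0; c < numCols; ++c) {
        for (std::size_t r = 0; r < numRows; ++r) {
            flat.push_back(rows[r][c]);
        }
    }
    return flat;
}
```

For the kernel in this issue, `flatten_column_major({{1, 1, 0}, {1, 1, 0}, {0, 0, 0}})` gives the nine values in the order a column-major 3x3 constant would expect; because that kernel is symmetric, row-major order would give the same sequence.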