microsoft / ELL

Embedded Learning Library
https://microsoft.github.io/ELL

wrap.py: WrapException #169

Closed julian0001 closed 6 years ago

julian0001 commented 6 years ago

After successfully building model.ell as described in the tutorial (https://microsoft.github.io/ELL/tutorials/Repurposing-a-pretrained-image-classifier/), I get the following output when trying to compile the model:

(py36) C:\Users\Julian\Documents\ELL>python ELL.git\trunk\tools\wrap\wrap.py model.ell --language python --target host
compiling model...
command C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile failed with error code 1
### WrapException: <class 'buildtools.EllBuildToolsRunException'>: C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true

--> with --verbose (linux bash):

(py36) julian@JS:/mnt/c/Users/Julian/Documents/ELL/transfer_learning$ python ../ELL.git/v2.3.5/tools/wrap/wrap.py model.ell --language python --target host --verbose
copy "/mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/CMake/OpenBLASSetup.cmake" "host/OpenBLASSetup.cmake"
copy "/mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/interfaces/common/include/CallbackInterface.h" "host/include/CallbackInterface.h"
copy "/mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/interfaces/common/tcc/CallbackInterface.tcc" "host/tcc/CallbackInterface.tcc"
compiling model...
/mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/build/bin/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
exception: Input and output active area sizes don't match
command /mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/build/bin/compile failed with error code 1

WrapException: <class 'buildtools.EllBuildToolsRunException'>: /mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/build/bin/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true

--> with --verbose (win cmd):

(py36) C:\Users\Julian\Documents\ELL\transfer_learning>python ..\ELL.git\trunk\tools\wrap\wrap.py model.ell --language python --target host --verbose
copy "C:\Users\Julian\Documents\ELL\ELL.git\trunk\CMake/OpenBLASSetup.cmake" "host\OpenBLASSetup.cmake"
copy "C:\Users\Julian\Documents\ELL\ELL.git\trunk\interfaces/common/include/CallbackInterface.h" "host\include\CallbackInterface.h"
copy "C:\Users\Julian\Documents\ELL\ELL.git\trunk\interfaces/common/tcc/CallbackInterface.tcc" "host\tcc\CallbackInterface.tcc"
compiling model...
C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
exception: Error: couldn't read file: Failed to match field size, instead found token 'layout'
command C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile failed with error code 1

WrapException: <class 'buildtools.EllBuildToolsRunException'>: C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true

I can compile the other models normally, though. So far, only the transfer-learned model fails.
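
The WrapException line only repeats the failing compile command; the actual cause shows up on the `exception:` line that `--verbose` prints. A minimal Python sketch for pulling that line out of captured output is shown here. The helper name `find_compile_errors` is hypothetical and not part of ELL, and the sample text is copied from the Linux log above.

```python
def find_compile_errors(log_text):
    """Return the messages that follow 'exception:' in captured wrap.py output."""
    errors = []
    for line in log_text.splitlines():
        stripped = line.strip()
        # The compile tool reports its failure reason on a line starting with
        # a lowercase 'exception:'; the WrapException line does not match.
        if stripped.startswith("exception:"):
            errors.append(stripped[len("exception:"):].strip())
    return errors


# Sample copied from the verbose Linux run in this thread.
sample_log = """compiling model...
exception: Input and output active area sizes don't match
command /mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/build/bin/compile failed with error code 1
WrapException: <class 'buildtools.EllBuildToolsRunException'>: /mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/build/bin/compile -imap model.ell
"""

errors = find_compile_errors(sample_log)
print(errors)
```

Quoting the extracted message in a bug report is usually more helpful than quoting the WrapException alone, since the two runs above fail for different reasons despite raising the same exception.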

Here is the output of the training:

(py36) julian@JS:/mnt/c/Users/Julian/Documents/ELL/transfer_learning$ ../ELL.git/v2.3.5/build/bin/retargetTrainer --maxEpochs 100 --multiClass true --refineIterations 1 --verbose --inputModelFilename pretrained.ell --targetPortElements 1442.output --inputDataFilename fruit_train.gsdf --outputModelFilename model.ell

Current parameters for retargetTrainer
--inputModelFilename: pretrained.ell
--outputModelFilename: model.ell
--refineIterations: 1 (default)
--targetPortElements: 1442.output
--removeLastLayers: 0 (default)
--inputDataFilename: fruit_train.gsdf
--multiClass: true
--normalize: false (default)
--regularization: 0.005 (default)
--desiredPrecision: 1e-05 (default)
--maxEpochs: 100
--permute: true (default)
--randomSeedString: ABCDEFG (default)
--verbose: true
--lossFunction: log (default)
--blas: true (default)
--help: false (default)

Loading model from pretrained.ell(3420 ms)
Redirected output for port elements 1442.output from model
Loading data ...(67 ms)

Transforming dataset with compiled model...(7658 ms)

Creating datasets for One vs Rest...(0 ms)

=== Training binary classifier for class 0 vs Rest ===
Created linear trainer ...
Training ...

Primal Objective  Dual Objective  Duality gap
0.992236          0.000172        0.992064
0.003135          0.000224        0.002911
0.000607          0.000250        0.000356
0.000366          0.000259        0.000107
0.000269          0.000262        0.000006
Final duality Gap: 0.000006

ErrorRate  Precision  Recall    F1-Score  AUC       MeanLoss
1.000000   0.000000   0.000000  0.000000  0.000000  0.693147
0.000000   1.000000   1.000000  1.000000  1.000000  0.000054

Training completed successfully.

=== Training binary classifier for class 1 vs Rest ===
Created linear trainer ...
Training ...

Primal Objective  Dual Objective  Duality gap
0.000585          0.000177        0.000408
0.001075          0.000186        0.000889
0.000223          0.000196        0.000027
0.000215          0.000199        0.000016
0.000214          0.000200        0.000013
0.000202          0.000201        0.000002
Final duality Gap: 0.000002

ErrorRate  Precision  Recall    F1-Score  AUC       MeanLoss
1.000000   0.000000   0.000000  0.000000  0.000000  0.693147
0.000000   1.000000   1.000000  1.000000  1.000000  0.000044

Training completed successfully.

=== Training binary classifier for class 2 vs Rest ===
Created linear trainer ...
Training ...

Primal Objective  Dual Objective  Duality gap
0.450262          0.000177        0.450085
0.002117          0.000228        0.001889
0.000326          0.000237        0.000089
0.000302          0.000240        0.000062
0.000258          0.000242        0.000016
0.000246          0.000243        0.000003
Final duality Gap: 0.000003

ErrorRate  Precision  Recall    F1-Score  AUC       MeanLoss
1.000000   0.000000   0.000000  0.000000  0.000000  0.693147
0.000000   1.000000   1.000000  1.000000  1.000000  0.000062

Training completed successfully.
Training completed ...(173 ms)

RetargetTrainer completed... (13485 ms)

New model saved as model.ell
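
For readers checking the training output above: each duality gap is the primal objective minus the dual objective, and every final gap is below the `--desiredPrecision` default of 1e-05. The short Python sketch below verifies this for the class 0 numbers. Treating the precision as the stopping threshold is an assumption about retargetTrainer, not something the log states, and the helper name `gap_matches` is hypothetical.

```python
# (primal objective, dual objective, reported duality gap) for class 0 vs Rest,
# copied from the training output in this thread.
rows = [
    (0.992236, 0.000172, 0.992064),
    (0.003135, 0.000224, 0.002911),
    (0.000607, 0.000250, 0.000356),
    (0.000366, 0.000259, 0.000107),
    (0.000269, 0.000262, 0.000006),
]

# --desiredPrecision default from the parameter listing (assumed stopping threshold).
DESIRED_PRECISION = 1e-05


def gap_matches(primal, dual, reported, tol=1.5e-6):
    """True if the reported gap equals primal - dual up to six-decimal rounding."""
    return abs((primal - dual) - reported) <= tol


all_consistent = all(gap_matches(*row) for row in rows)
converged = rows[-1][2] < DESIRED_PRECISION
print(all_consistent, converged)
```

The training run itself therefore looks healthy, which points to the compile step rather than the trainer as the source of the failure.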

lovettchris commented 6 years ago

Hi, this indicates that the .ell file on the gallery is out of date, and we need to "reimport" the .cntk model using the latest ELL bits. Sometimes we break our ELL file format, sorry about that. I have filed an internal request to get this done, but in the meantime you can run the cntk importer yourself, like this:

curl --location -o pretrained.cntk.zip https://github.com/Microsoft/ELL-models/raw/master/models/ILSVRC2012/dsf_I64x64x3CCMCCMCCMCMCMC1AS/dsf_I64x64x3CCMCCMCCMCMCMC1AS.cntk.zip
unzip pretrained.cntk.zip
python %ELL_ROOT%\tools\importers\CNTK\cntk_import.py dsf_I64x64x3CCMCCMCCMCMCMC1AS.cntk
copy dsf_I64x64x3CCMCCMCCMCMCMC1AS.ell pretrained.ell
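
The four steps above can also be sketched as a single Python script, which avoids depending on `curl`, `unzip`, and `copy` all being available in one shell. This is a minimal sketch, not an ELL tool: the function names `importer_command` and `reimport` are hypothetical, and it assumes the importer writes the `.ell` file into the current directory, as the `copy` step above implies.

```python
import os
import shutil
import subprocess
import sys
import urllib.request
import zipfile

MODEL = "dsf_I64x64x3CCMCCMCCMCMCMC1AS"
# Same URL as in the curl command above.
URL = ("https://github.com/Microsoft/ELL-models/raw/master/models/"
       "ILSVRC2012/" + MODEL + "/" + MODEL + ".cntk.zip")


def importer_command(ell_root, cntk_file):
    """Build the cntk_import.py invocation, run with the current interpreter."""
    script = os.path.join(ell_root, "tools", "importers", "CNTK", "cntk_import.py")
    return [sys.executable, script, cntk_file]


def reimport(ell_root):
    """Download, unzip, import, and copy the result to pretrained.ell."""
    # urlretrieve follows redirects, like curl --location.
    urllib.request.urlretrieve(URL, "pretrained.cntk.zip")
    with zipfile.ZipFile("pretrained.cntk.zip") as archive:
        archive.extractall(".")
    # check=True raises CalledProcessError if the importer fails.
    subprocess.run(importer_command(ell_root, MODEL + ".cntk"), check=True)
    shutil.copyfile(MODEL + ".ell", "pretrained.ell")
```

With the `py36` environment active, calling `reimport(os.environ["ELL_ROOT"])` from the working directory would run all four steps; nothing is downloaded until that call is made.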

julian0001 commented 6 years ago

Hello @lovettchris, thank you for your advice, but even with the latest ELL bits, wrap.py does not work for the newly generated model.ell. I have tried it with one, two, and three classes, and I always get the same WrapException.

Maybe there is a bug in the module "ActivationLayerNode"?

(py36) C:\Users\Julian\Documents\ELLext\transfer_learning>python %ELL_ROOT%/tools/wrap/wrap.py model.ell --language python --target host --verbose
copy "C:\Users\Julian\Documents\ELLext\ELL\CMake/OpenBLASSetup.cmake" "host\OpenBLASSetup.cmake"
copy "C:\Users\Julian\Documents\ELLext\ELL\interfaces/common/include/CallbackInterface.h" "host\include\CallbackInterface.h"
copy "C:\Users\Julian\Documents\ELLext\ELL\interfaces/common/tcc/CallbackInterface.tcc" "host\tcc\CallbackInterface.tcc"
compiling model...
C:/Users/Julian/Documents/ELLext/ELL/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
exception: Input and output active area sizes don't match
command C:/Users/Julian/Documents/ELLext/ELL/build/bin/release/compile failed with error code 1

WrapException: <class 'buildtools.EllBuildToolsRunException'>: C:/Users/Julian/Documents/ELLext/ELL/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true

lovettchris commented 6 years ago

Can you zip up and attach the sample training data you are using so I can reproduce the problem?

julian0001 commented 6 years ago

Yes, of course. Attached are the sample dataset and the generated model.ell with its .gsdf files.

transfer_learning.zip data.zip

julian0001 commented 6 years ago

Hello @lovettchris, have you tried out my sample dataset already?

lovettchris commented 6 years ago

Yes, thanks for the data, I can reproduce the bug. Here’s the scoop: the team has been working on improving how Port MemoryLayout is managed throughout the ELL stack, and this is where the bug was introduced.

If you sync your git repo back to this commit:

 git checkout c9e2a268c51e2aef0715eb270f7a38b3741b3a54

then rebuild ELL, you will get a version that works properly with the retargeting tutorial.

We are working on a fix, but it will take a couple of days to get it fully tested and pushed to GitHub.

julian0001 commented 6 years ago

Great, thank you, it finally works 👍