julian0001 closed this issue 6 years ago
Hi, this indicates that the .ell file in the gallery is out of date, and we need to "reimport" the .cntk model using the latest ELL bits. Sometimes we break our ELL file format; sorry about that. I have filed an internal request to get this done, but in the meantime you can run the CNTK importer yourself, like this:
curl --location -o pretrained.cntk.zip https://github.com/Microsoft/ELL-models/raw/master/models/ILSVRC2012/dsf_I64x64x3CCMCCMCCMCMCMC1AS/dsf_I64x64x3CCMCCMCCMCMCMC1AS.cntk.zip
unzip pretrained.cntk.zip
python %ELL_ROOT%\tools\importers\CNTK\cntk_import.py dsf_I64x64x3CCMCCMCCMCMCMC1AS.cntk
copy dsf_I64x64x3CCMCCMCCMCMCMC1AS.ell pretrained.ell
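The same four steps can also be scripted so they run unchanged on Windows and Linux. This is only a minimal sketch: it assumes `ELL_ROOT` is set as in the commands above, that the active Python interpreter has the importer's dependencies installed, and that the importer writes the `.ell` file next to the `.cntk` file (as the `copy` step above implies). The helper names `importer_command` and `reimport` are made up for illustration.

```python
import os
import shutil
import subprocess
import sys
import urllib.request
import zipfile
from pathlib import Path

MODEL = "dsf_I64x64x3CCMCCMCCMCMCMC1AS"
# Same URL as in the curl command above.
URL = ("https://github.com/Microsoft/ELL-models/raw/master/models/ILSVRC2012/"
       + MODEL + "/" + MODEL + ".cntk.zip")


def importer_command(ell_root, cntk_file):
    """Build the argument list for the cntk_import.py call (hypothetical helper)."""
    importer = Path(ell_root).resolve() / "tools" / "importers" / "CNTK" / "cntk_import.py"
    return [sys.executable, str(importer), str(cntk_file)]


def reimport(ell_root, workdir="."):
    """Download, unzip, import and copy, mirroring the four commands above."""
    workdir = Path(workdir)
    archive = workdir / "pretrained.cntk.zip"
    urllib.request.urlretrieve(URL, str(archive))      # curl --location -o ...
    with zipfile.ZipFile(str(archive)) as zf:          # unzip
        zf.extractall(str(workdir))
    subprocess.run(importer_command(ell_root, MODEL + ".cntk"),
                   cwd=str(workdir), check=True)       # python cntk_import.py ...
    shutil.copyfile(str(workdir / (MODEL + ".ell")),   # copy ... pretrained.ell
                    str(workdir / "pretrained.ell"))


if __name__ == "__main__":
    reimport(os.environ["ELL_ROOT"])
```

Running the script from the tutorial directory should leave a fresh `pretrained.ell` there, provided the import step succeeds.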
Hello @lovettchris, thank you for your advice, but even with the latest ELL bits, wrap.py does not work for the newly generated model.ell. I have tried it with one, two, and three classes, and I always get the same WrapException.
Maybe there is a bug in the module "ActivationLayerNode"?
(py36) C:\Users\Julian\Documents\ELLext\transfer_learning>python %ELL_ROOT%/tools/wrap/wrap.py model.ell --language python --target host --verbose
copy "C:\Users\Julian\Documents\ELLext\ELL\CMake/OpenBLASSetup.cmake" "host\OpenBLASSetup.cmake"
copy "C:\Users\Julian\Documents\ELLext\ELL\interfaces/common/include/CallbackInterface.h" "host\include\CallbackInterface.h"
copy "C:\Users\Julian\Documents\ELLext\ELL\interfaces/common/tcc/CallbackInterface.tcc" "host\tcc\CallbackInterface.tcc"
compiling model...
C:/Users/Julian/Documents/ELLext/ELL/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
exception: Input and output active area sizes don't match
command C:/Users/Julian/Documents/ELLext/ELL/build/bin/release/compile failed with error code 1
Can you zip up and attach the sample training data you are using so I can reproduce the problem?
Yes, of course. Attached are the sample dataset and the generated model.ell with its .gsdf files.
Hello @lovettchris, have you tried out my sample dataset already?
Yes, thanks for the data, I can reproduce the bug. Here's the scoop: the team has been working on improving how Port MemoryLayout is managed throughout the ELL stack, and this is where the bug was introduced.
If you sync your git repo back to this commit:
git checkout c9e2a268c51e2aef0715eb270f7a38b3741b3a54
then rebuild ELL, you will get a version that works properly with the retargeting tutorial.
We are working on a fix, but it will take a couple days to get it fully tested and pushed to github.
Great, thank you, it finally works 👍
After successfully building model.ell as described in the tutorial (https://microsoft.github.io/ELL/tutorials/Repurposing-a-pretrained-image-classifier/), I get the following output when trying to compile the model:
(py36) C:\Users\Julian\Documents\ELL>python ELL.git\trunk\tools\wrap\wrap.py model.ell --language python --target host
compiling model...
command C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile failed with error code 1
### WrapException: <class 'buildtools.EllBuildToolsRunException'>: C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
--> with --verbose (linux bash):
(py36) julian@JS:/mnt/c/Users/Julian/Documents/ELL/transfer_learning$ python ../ELL.git/v2.3.5/tools/wrap/wrap.py model.ell --language python --target host --verbose
copy "/mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/CMake/OpenBLASSetup.cmake" "host/OpenBLASSetup.cmake"
copy "/mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/interfaces/common/include/CallbackInterface.h" "host/include/CallbackInterface.h"
copy "/mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/interfaces/common/tcc/CallbackInterface.tcc" "host/tcc/CallbackInterface.tcc"
compiling model...
/mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/build/bin/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
exception: Input and output active area sizes don't match
command /mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/build/bin/compile failed with error code 1
WrapException: <class 'buildtools.EllBuildToolsRunException'>: /mnt/c/Users/Julian/Documents/ELL/ELL.git/v2.3.5/build/bin/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
--> with --verbose (win cmd):
(py36) C:\Users\Julian\Documents\ELL\transfer_learning>python ..\ELL.git\trunk\tools\wrap\wrap.py model.ell --language python --target host --verbose
copy "C:\Users\Julian\Documents\ELL\ELL.git\trunk\CMake/OpenBLASSetup.cmake" "host\OpenBLASSetup.cmake"
copy "C:\Users\Julian\Documents\ELL\ELL.git\trunk\interfaces/common/include/CallbackInterface.h" "host\include\CallbackInterface.h"
copy "C:\Users\Julian\Documents\ELL\ELL.git\trunk\interfaces/common/tcc/CallbackInterface.tcc" "host\tcc\CallbackInterface.tcc"
compiling model...
C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
exception: Error: couldn't read file: Failed to match field size, instead found token 'layout'
command C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile failed with error code 1
WrapException: <class 'buildtools.EllBuildToolsRunException'>: C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile -imap model.ell -cfn Predict -cmn model --bitcode --target host -od host --fuseLinearOps True --swig --blas true --optimize true
The other models compile normally; so far, only the transfer-learned model fails.
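Because wrap.py without --verbose only reports the failed command, the underlying `exception:` line can be easy to miss. The following is a rough sketch for calling the compile tool directly and pulling out just those lines. The flags are copied from the logs above; the helper names `exception_lines` and `run_compile` are hypothetical, and since it is not confirmed here whether the tool writes the message to stdout or stderr, both streams are captured.

```python
import subprocess


def exception_lines(output):
    """Return the lines of tool output that start with 'exception:'."""
    stripped = (line.strip() for line in output.splitlines())
    return [line for line in stripped if line.startswith("exception:")]


def run_compile(compile_path, model="model.ell"):
    """Run the ELL compile tool with the flags seen in the logs above."""
    args = [compile_path, "-imap", model, "-cfn", "Predict", "-cmn", "model",
            "--bitcode", "--target", "host", "-od", "host",
            "--fuseLinearOps", "True", "--swig", "--blas", "true",
            "--optimize", "true"]
    # Merge stderr into stdout (assumption: the message may go to either).
    result = subprocess.run(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.returncode, exception_lines(result.stdout)
```

For example, `run_compile("C:/Users/Julian/Documents/ELL/ELL.git/trunk/build/bin/release/compile")` would return the exit code together with any exception messages, which makes it easier to tell the two different failures above apart.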
Here is the output of the training:
(py36) julian@JS:/mnt/c/Users/Julian/Documents/ELL/transfer_learning$ ../ELL.git/v2.3.5/build/bin/retargetTrainer --maxEpochs 100 --multiClass true --refineIterations 1 --verbose --inputModelFilename pretrained.ell --targetPortElements 1442.output --inputDataFilename fruit_train.gsdf --outputModelFilename model.ell
Current parameters for retargetTrainer
--inputModelFilename: pretrained.ell
--outputModelFilename: model.ell
--refineIterations: 1 (default)
--targetPortElements: 1442.output
--removeLastLayers: 0 (default)
--inputDataFilename: fruit_train.gsdf
--multiClass: true
--normalize: false (default)
--regularization: 0.005 (default)
--desiredPrecision: 1e-05 (default)
--maxEpochs: 100
--permute: true (default)
--randomSeedString: ABCDEFG (default)
--verbose: true
--lossFunction: log (default)
--blas: true (default)
--help: false (default)
Loading model from pretrained.ell(3420 ms)
Redirected output for port elements 1442.output from model
Loading data ...(67 ms)
Transforming dataset with compiled model...(7658 ms)
Creating datasets for One vs Rest...(0 ms)
=== Training binary classifier for class 0 vs Rest ===
Created linear trainer ...
Training ...
Primal Objective  Dual Objective  Duality gap
0.992236          0.000172        0.992064
0.003135          0.000224        0.002911
0.000607          0.000250        0.000356
0.000366          0.000259        0.000107
0.000269          0.000262        0.000006
Final duality Gap: 0.000006
ErrorRate  Precision  Recall    F1-Score  AUC       MeanLoss
1.000000   0.000000   0.000000  0.000000  0.000000  0.693147
0.000000   1.000000   1.000000  1.000000  1.000000  0.000054
Training completed successfully.
=== Training binary classifier for class 1 vs Rest ===
Created linear trainer ...
Training ...
Primal Objective  Dual Objective  Duality gap
0.000585          0.000177        0.000408
0.001075          0.000186        0.000889
0.000223          0.000196        0.000027
0.000215          0.000199        0.000016
0.000214          0.000200        0.000013
0.000202          0.000201        0.000002
Final duality Gap: 0.000002
ErrorRate  Precision  Recall    F1-Score  AUC       MeanLoss
1.000000   0.000000   0.000000  0.000000  0.000000  0.693147
0.000000   1.000000   1.000000  1.000000  1.000000  0.000044
Training completed successfully.
=== Training binary classifier for class 2 vs Rest ===
Created linear trainer ...
Training ...
Primal Objective  Dual Objective  Duality gap
0.450262          0.000177        0.450085
0.002117          0.000228        0.001889
0.000326          0.000237        0.000089
0.000302          0.000240        0.000062
0.000258          0.000242        0.000016
0.000246          0.000243        0.000003
Final duality Gap: 0.000003
ErrorRate  Precision  Recall    F1-Score  AUC       MeanLoss
1.000000   0.000000   0.000000  0.000000  0.000000  0.693147
0.000000   1.000000   1.000000  1.000000  1.000000  0.000062
Training completed successfully.
Training completed ...(173 ms)
RetargetTrainer completed... (13485 ms)
New model saved as model.ell
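As a sanity check on the training log itself, the numbers are internally consistent, which suggests the failure lies in the compile step and not in training. A small sketch, using the class 0 rows copied from the log: the duality gap column matches primal minus dual up to rounding in the last printed digit, and the first MeanLoss value, 0.693147, equals ln 2, which is what the log loss gives at zero margin. Reading that first row as an evaluation before training is an assumption, not something the log states.

```python
import math

# (primal objective, dual objective, duality gap) rows for "class 0 vs Rest",
# copied from the training log above.
ROWS = [
    (0.992236, 0.000172, 0.992064),
    (0.003135, 0.000224, 0.002911),
    (0.000607, 0.000250, 0.000356),
    (0.000366, 0.000259, 0.000107),
    (0.000269, 0.000262, 0.000006),
]


def max_gap_error(rows):
    """Largest difference between (primal - dual) and the printed gap."""
    return max(abs((primal - dual) - gap) for primal, dual, gap in rows)


def log_loss(margin):
    """Log loss log(1 + exp(-margin)) for a single example."""
    return math.log1p(math.exp(-margin))


if __name__ == "__main__":
    # Printed values are rounded to 6 decimals, so allow for that rounding.
    print("max gap error:", max_gap_error(ROWS))
    print("log loss at zero margin:", round(log_loss(0.0), 6))
```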