microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Transfer learning example: format of input evaluate image #2345

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi all,

I found something confusing in the transfer learning example given in: https://github.com/Microsoft/CNTK/blob/master/Examples/Image/TransferLearning/TransferLearning.py

At lines 158-160, an evaluation image is formatted in HWC order before being fed into the trained network.

# compute model output
arguments = {loaded_model.arguments[0]: [hwc_format]}
output = loaded_model.eval(arguments)

However, at line 102, the input image variable is declared in CHW order and then connected to the network to be trained:

image_input = C.input_variable((num_channels, image_height, image_width))

So, in which order should we feed the trained network? For CNTK, the convolutional layer requires the CHW format. Is this example an exception to that, since we are using a trained network that perhaps comes from another framework? Thanks.
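To make the two layouts in the question concrete, here is a minimal NumPy sketch of an HWC array and its CHW counterpart. The sizes are hypothetical and chosen only so that the axes are easy to tell apart. It says nothing about what eval() does internally.

```python
import numpy as np

# Hypothetical sizes for illustration only; the example script defines its own.
num_channels, image_height, image_width = 3, 4, 6

# Image libraries typically yield arrays in HWC order (height, width, channels).
hwc_image = np.arange(
    image_height * image_width * num_channels, dtype=np.float32
).reshape(image_height, image_width, num_channels)

# Move the channel axis to the front: HWC -> CHW.
chw_image = np.ascontiguousarray(np.transpose(hwc_image, (2, 0, 1)))

# np.rollaxis(hwc_image, 2) performs the same permutation.
same_as_rollaxis = np.array_equal(chw_image, np.rollaxis(hwc_image, 2))

print(hwc_image.shape)  # (4, 6, 3)
print(chw_image.shape)  # (3, 4, 6)
```

The pixel at row h, column w, channel c is `hwc_image[h, w, c]` in the first layout and `chw_image[c, h, w]` in the second; only the axis order changes, not the values.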

cha-zhang commented 7 years ago

The image you feed should be in HWC format. CNTK will internally convert it to CHW and run the trained network.

ghost commented 7 years ago

Internally it requires something with the same shape as the input variable, i.e. CHW. I can't find where the conversion is done. Can you please point me to a reference? Thank you.

cha-zhang commented 7 years ago

Here it is: https://github.com/Microsoft/CNTK/blob/master/Source/Readers/ImageReader/ImageTransformers.h#L176

ghost commented 7 years ago

In the example, the evaluation image is fed directly into the network as a dictionary key-value pair, i.e. {input_variable: [the evaluation image]}. This is different from reading the image with an ImageReader, where transformers can be specified ... Can you please point out where the transform is done? The only function that interacts with the dictionary is eval(), and its documentation doesn't mention any automatic conversion such as HWC to CHW. Do you mean this conversion is implicitly called by eval()?

cha-zhang commented 7 years ago

The transpose transform is always called. We hide it so users don't need to care about it.

ghost commented 7 years ago

It needs to be documented somewhere. Otherwise the required format is confusing. It asks for CHW, so users might transpose the image themselves, because they don't know that somewhere their image is automatically transformed again ... resulting in it being transformed twice.
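As an illustration of the double-transform concern, here is a minimal sketch of a shape check that transposes an image at most once. `ensure_chw` is a hypothetical helper written for this example, not a CNTK function, and it makes no claim about what eval() does with its input.

```python
import numpy as np


def ensure_chw(image, chw_shape):
    """Return `image` in CHW order, transposing only if it arrives as HWC.

    Hypothetical helper for illustration; not part of CNTK.
    If channels, height and width are all equal, the two layouts cannot be
    told apart by shape and the image is returned unchanged.
    """
    channels, height, width = chw_shape
    if image.shape == (channels, height, width):
        return image  # already CHW; do not transpose again
    if image.shape == (height, width, channels):
        return np.ascontiguousarray(np.transpose(image, (2, 0, 1)))
    raise ValueError(
        f"shape {image.shape} matches neither CHW nor HWC for {chw_shape}"
    )


# Hypothetical sizes, chosen so that the axes are easy to tell apart.
chw_shape = (3, 4, 6)
hwc = np.zeros((4, 6, 3), dtype=np.float32)

once = ensure_chw(hwc, chw_shape)    # transposed to (3, 4, 6)
twice = ensure_chw(once, chw_shape)  # returned as-is, no second transpose
```

Comparing against the shape declared for the input variable makes a second, accidental transpose a no-op, and an unexpected layout raises an error instead of passing silently.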

cha-zhang commented 7 years ago

Thanks for the suggestion. We have an eval doc here: https://docs.microsoft.com/en-us/cognitive-toolkit/How-do-I-Evaluate-models-in-Python, but it doesn't mention the axis order explicitly.

ghost commented 7 years ago

Thank you for that! Since the eval() function (https://cntk.ai/pythondocs/cntk.ops.functions.html?highlight=eval#cntk.ops.functions.Function.eval) is used in other cases and doesn't do any transformation like this there, it would be quite helpful to explain how images are treated differently from the general case.