Major (possibly!!) look needed at node computation

ghost commented 6 years ago

I request that this issue be read completely. Feedback from the developers on it would be very useful. Are the issues technically correct, and should they be high priority?

There are some issues with node computation. At the very least, if this behavior is correct, I think a clarification is needed. I will demonstrate them with some examples. To begin, consider a graph defined as follows:

import cntk as C
# Create the network.
def create_network(num_channels=3, image_height=221, image_width=221,
                            num_classes=1000):
    # Input variables denoting the features and label data
    feature_var = C.input_variable((num_channels, image_height, image_width), name='raw_images')
    label_var = C.input_variable(num_classes)

    feature_var_norm = C.element_divide(feature_var, [127.5])
    normalized_images = C.minus(feature_var_norm, [1.], name='normalized_image')

    z = C.layers.Sequential([
    # A set of nodes are defined using Layers API
    ])(normalized_images)

    # loss and metric
    ce = C.cross_entropy_with_softmax(z, label_var)
    pe = C.classification_error(z, label_var)
    pe5 = C.classification_error(z, label_var, topN=5)

    C.logging.log_number_of_parameters(z)
    print()

    return {
        'feature': feature_var,
        'label': label_var,
        'ce': ce,
        'pe': pe,
        'pe5': pe5,
        'output': z
    }
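For reference when inspecting the normalized_image node, the two preprocessing ops in create_network (element_divide by 127.5, then minus 1) are a plain rescaling. The sketch below repeats that arithmetic in ordinary Python, with no CNTK involved. It assumes the raw images hold 8-bit pixel values in [0, 255], which the code above does not state; the helper name normalize_pixel is hypothetical.

```python
def normalize_pixel(value):
    """Same arithmetic as C.element_divide(x, [127.5]) followed by C.minus(., [1.])."""
    return value / 127.5 - 1.0


# Assumed 8-bit input range: the endpoints and midpoint of [0, 255].
for pixel in (0, 127.5, 255):
    print(pixel, normalize_pixel(pixel))
```

Under that assumption, values fetched from normalized_image should lie in [-1, 1], which gives a quick sanity check when debugging.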

Issue Number 1 :

While debugging, one might want to extract the input (normalized_image) as it goes into the network. However, if the following is written in the training code, it does not work -

train_input_map = {
    network['feature']: train_source.streams.features,
    network['label']: train_source.streams.labels
}
output = trainer.train_minibatch(data, [network['output'].find_by_name('normalized_image')])
print(output[1])

This produces a long error, which can be summarized as follows - RuntimeError: GetEvalOrder: Called without prior call to FormEvalOrder() for Minus13 Minus operation

This essentially tells me that I am asking for normalized_images without the Minus operation having been computed.

Isn't it supposed to compute all the nodes the output depends on, especially when the base input raw_images is being provided to it?

I think it is important to clarify this, because in advanced use cases it can be extremely troubling.

Issue Number 2 :

The computation of multiple evaluation metrics is not yet exposed in the Python API, and I am working on it. However, as a workaround one may try the following -

In the network definition, network['pe5'] is supposed to compute the Top-5 error. Going by the documentation of Trainer.train_minibatch, one might pass network['pe5'] as one of the requested outputs, fetch its value for each minibatch, and compute its running average manually.

Here is what happens:

ValueError: One of the requested outputs 'Output('ClassificationError3478_Output_0', [], [])' for the Function 'Composite(Combine): Input('raw_images', [#], [3 x 221 x 221]), Input('Input4', [#], [1000]) -> Output('fc8', [#], [1000]), Output('Block3447_Output_0', [#], [1]), Output('aggregateLoss', [], []), Output('Block3468_Output_0', [#], [1]), Output('aggregateEvalMetric', [], [])' forward computation is not part of the graph underlying the Function.

This is extremely counter-intuitive. network['pe5'] depends upon z and label_var, both of which are being fed via train_minibatch(). Hence it just needs to compute network['pe5'] and return the result.

I can accept that, strictly speaking, it is not part of the graph. But the request to compute network['pe5'] is not such an unusual one, since it depends only on the quantities being fed to train_minibatch().
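To make the "manual running average" part of the workaround concrete, here is a minimal sketch in plain Python that does not touch CNTK at all. It shows what a top-N classification error measures for one minibatch and how per-minibatch values can be combined into a sample-weighted running average. The names top_n_error and RunningAverage are hypothetical, and labels are assumed to be class indices rather than the one-hot vectors that label_var holds.

```python
def top_n_error(scores, labels, n=5):
    """Fraction of samples whose true class is not among the n highest scores.

    scores: list of per-class score lists, one per sample.
    labels: list of true class indices (an assumption of this sketch).
    """
    misses = 0
    for row, true_class in zip(scores, labels):
        # Indices of the n largest scores for this sample.
        top_n = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:n]
        if true_class not in top_n:
            misses += 1
    return misses / len(labels)


class RunningAverage:
    """Sample-weighted running average of a per-minibatch metric."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, minibatch_average, num_samples):
        # Weight each minibatch by its size so uneven minibatches are handled.
        self.total += minibatch_average * num_samples
        self.count += num_samples

    @property
    def value(self):
        return self.total / self.count if self.count else 0.0


# Toy usage with three classes and two samples.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [1, 2]
tracker = RunningAverage()
tracker.update(top_n_error(scores, labels, n=1), num_samples=len(labels))
print(tracker.value)
```

In a training loop, the per-minibatch value would come from wherever the metric ends up being evaluated; the helper only covers the averaging step.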

cha-zhang commented 6 years ago

Hi @calledbymountains ,

Sorry for the confusion. The Train API is not meant for this purpose. To examine the output of normalized_images, you can do something like this (untested code):

import cntk as C
# Create the network.
def create_network(num_channels=3, image_height=221, image_width=221,
                            num_classes=1000):
    # Input variables denoting the features and label data
    feature_var = C.input_variable((num_channels, image_height, image_width), name='raw_images')
    label_var = C.input_variable(num_classes)

    feature_var_norm = C.element_divide(feature_var, [127.5])
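    normalized_images = C.minus(feature_var_norm, [1.], name='normalized_image')
    # Return the node to inspect so that it can be evaluated directly
    return normalized_images

model = create_network()
evaluation_reader = C.io.MinibatchSource(...)  # your reader
input_map = ...  # your input map

# Read one sample and run a forward pass only (no training step)
mb_eval = evaluation_reader.next_minibatch(1, input_map=input_map)
# fv maps model.output to the computed values
_, fv = model.forward(mb_eval, keep_for_backward={model.output})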

Some additional debugging tricks can be found here: https://cntk.ai/pythondocs/Manual_How_to_debug.html

cha-zhang commented 6 years ago

@eldakms @tangyuq , please help.

microsoft / CNTK

Major (possibly!!) look needed at node computation #2537