An interactive book on deep learning. Much easy, so MXNet. Wow. [Straight Dope is growing up] → Much of this content has been incorporated into the new Dive into Deep Learning book, available at https://d2l.ai/.
The function `evaluate_accuracy` that appears in various examples in the tutorial is written in a suboptimal way, e.g. in the deep convolutional networks chapter:
```python
def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for d, l in data_iterator:
        data = d.as_in_context(ctx)
        label = l.as_in_context(ctx)  # This copy is unnecessary
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]
```
Usually in the examples, `ctx` refers to `mx.gpu()`. However, within the definition of `mx.metric.Accuracy` we see that both predictions and labels are converted to NumPy arrays (i.e. copied to the `mx.cpu()` context). Therefore, if `ctx = mx.gpu()`, this function definition makes an unnecessary copy of the labels into GPU memory and hurts performance.
Thank you for the awesome tutorial you've created!
Regards