microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Reading true labels in evaluation, getting warning: "converting Value object to CSR format might be slow" #2584

Open DragomirYankov opened 7 years ago

DragomirYankov commented 7 years ago

I have a sequence model that I am trying to evaluate. It uses the standard CTF reader with sparse streams:

def create_reader(path, randomize, is_training):
    return C.io.MinibatchSource(
        C.io.CTFDeserializer(path, C.io.StreamDefs(
            labels=C.io.StreamDef(field='label', shape=num_labels, is_sparse=True),
            features=C.io.StreamDef(field='features', shape=num_features, is_sparse=True)
        )),
        randomize=randomize,
        max_sweeps=C.io.INFINITELY_REPEAT if is_training else 1)

z = C.load_model(model_file)
x = C.sequence.input_variable(input_dim)
y = C.sequence.input_variable(label_dim)
reader = create_reader(test_file, randomize=False, is_training=False)
input_map = {x: reader.streams.features, y: reader.streams.labels}
batch = reader.next_minibatch(batch_size, input_map)
predLbl = z.eval(batch[x])
trueLbl = batch[y].as_sequences(y)

When calling as_sequences(y) I keep getting the CSR format warning. I also tried creating y as:

y_ = C.sequence.input_variable(label_dim)
y = C.onehot(y, 4, sparse_output=False)
...
trueLbl = batch[y].asarray()

But I again get warnings, including the CSR format one. What is the right way to read the true labels when the data is in sparse format? Or how can I avoid the warning?

Thanks!

JashaDroppo commented 7 years ago

As far as I know, there is no way to get sparse CNTK data into a numpy array without getting the warning. If you are able to define the CNTK data as dense, then the code would use more memory, and be less efficient in other ways, but the warning would disappear.
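If the conversion cost is acceptable and only the message is a nuisance, one option may be to filter it on the Python side. The sketch below assumes the message is raised through Python's standard `warnings` module, which this thread does not confirm. `fetch_labels_stub` and `quiet_call` are hypothetical names used for illustration. The stub stands in for the real `as_sequences` call so the snippet runs without CNTK installed. This hides the message only; the sparse-to-CSR conversion still happens.

```python
import warnings

# Message text copied from the issue title. filterwarnings treats it as a
# regex matched against the start of the warning message.
CSR_WARNING = "converting Value object to CSR format might be slow"


def fetch_labels_stub():
    """Hypothetical stand-in for batch[y].as_sequences(y); it only emits the warning."""
    warnings.warn(CSR_WARNING)
    return [[0, 1, 0, 0]]


def quiet_call(fn, *args, **kwargs):
    """Call fn with the CSR conversion warning filtered out.

    The filter is scoped to this call, so unrelated warnings still surface
    and the global warning configuration is restored afterwards.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=CSR_WARNING)
        return fn(*args, **kwargs)


labels = quiet_call(fetch_labels_stub)
```

In the evaluation script from the question, the equivalent call would presumably be `quiet_call(batch[y].as_sequences, y)`. Scoping the filter to a single call, rather than installing it globally, keeps other CNTK warnings visible.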