microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Using cntk readers for in-memory and on-disk data #3150

Open suhaspillai opened 6 years ago

suhaspillai commented 6 years ago

Hi, I want to use a CNTK reader for inference, so that data can be loaded in parallel while the model runs inference on another batch. The setup is as follows:

  1. Input1: This can be created before inference, so it can be stored on disk. I know I can use the CTF reader or others to load this input (see the sketch after this list). Dims: (2000, 1000).
  2. Input2: This is created on the fly during inference. Dims: (10, 2000).
  3. Input3: This is created on the fly during inference. Dims: (10, 1000).
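
For reference, a minimal sketch of reading Input1 through `CTFDeserializer`; the file name `input1.ctf` and stream name `row` are hypothetical, and the file is assumed to hold one sparse `index:value` line per row of the (2000, 1000) matrix:

```python
from cntk.io import MinibatchSource, CTFDeserializer, StreamDef, StreamDefs

# Hypothetical file 'input1.ctf', one line per row of the (2000, 1000)
# matrix, using sparse index:value pairs, e.g.:
#   |row 3:1 17:1 942:1
reader = MinibatchSource(
    CTFDeserializer('input1.ctf', StreamDefs(
        row=StreamDef(field='row', shape=1000, is_sparse=True))),
    randomize=False)

# The reader prefetches on a background thread, so reading overlaps
# with whatever computation is running on the current batch.
mb = reader.next_minibatch(10)
rows = mb[reader.streams.row]
```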

The model takes Input1, Input2, and Input3 (all sparse inputs). Right now I store Input1 as a sparse matrix using scipy.sparse.csr_matrix(), while Input2 and Input3 are created on the fly just before being fed to the network. So I have Input1, Input2, and Input3 (say, 10 instances of each), build a batch (say, of size 3), feed it to the model, and get the results. The problem is that for each batch I have to convert Input1 from its scipy.sparse representation to a dense one, which takes time and slows down my inference. Is there a way to use CNTK readers for Input1, Input2, and Input3, so that the data is loaded in parallel while inference runs on another batch?
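
As far as I know, the dense conversion itself may be avoidable even without a reader: `eval` accepts scipy csr matrices directly for sparse input variables. A minimal sketch, with a hypothetical one-input stand-in network in place of the real three-input model:

```python
import numpy as np
import scipy.sparse as sps
import cntk as C

# Stand-in model with one sparse input; the real network takes all three.
x1 = C.input_variable(1000, is_sparse=True)
z = C.times(x1, C.parameter((1000, 64), init=C.glorot_uniform()))

# One csr_matrix per sample; CNTK consumes these without densifying.
batch = [sps.csr_matrix(
             (np.ones(2, dtype=np.float32), ([0, 0], [3, 17])),
             shape=(1, 1000))
         for _ in range(3)]
out = z.eval({x1: batch})
```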

ke1337 commented 6 years ago

You may write your own deserializer following this manual.
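
A minimal sketch of what such a deserializer might look like, using `cntk.io.UserDeserializer` (available in recent CNTK releases, if I recall correctly). The class name `InMemoryCSRDeserializer`, the stream name `row`, and the chunking of pre-built csr matrices are all assumptions for illustration; I believe sparse streams accept a csr_matrix per chunk, but verify against the manual:

```python
import numpy as np
import scipy.sparse as sps
from cntk.io import UserDeserializer, StreamInformation, MinibatchSource

class InMemoryCSRDeserializer(UserDeserializer):
    """Hypothetical deserializer that serves pre-built csr chunks."""
    def __init__(self, chunks, dim):
        super(InMemoryCSRDeserializer, self).__init__()
        self._chunks = chunks  # list of scipy csr_matrix, one per chunk
        self._streams = [StreamInformation('row', 0, 'sparse',
                                           np.float32, (dim,))]

    def stream_infos(self):
        return self._streams

    def num_chunks(self):
        return len(self._chunks)

    def get_chunk(self, chunk_id):
        # Map each stream name to its data for this chunk;
        # the rows of the csr_matrix become individual samples.
        return {'row': self._chunks[chunk_id]}

chunks = [sps.random(100, 1000, density=0.01,
                     format='csr', dtype=np.float32)
          for _ in range(5)]
source = MinibatchSource(InMemoryCSRDeserializer(chunks, 1000),
                         randomize=False, max_sweeps=1)
mb = source.next_minibatch(16)  # decoded/prefetched off the main thread
```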