microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

ValueError: function expects 2 arguments on trainer.py while trying to test CNN model in CNTK 2.6 #3447

Open · miguel2488 opened this issue 5 years ago

miguel2488 commented 5 years ago

Hi,

this error will not stop showing up, making it impossible to test the model. There are no problems with the evaluation part, but I can't make the test part run. I'm using code very similar to the MNIST tutorial CNN with MNIST Dataset.

Here's my train_test function:

def train_test(train_reader, test_reader, model_func, num_sweeps_to_train_with=10):

    # Instantiate the model function; x is the input (feature) variable 
    # We will scale the input image pixels within 0-1 range by dividing all input value by 255.
    model = model_func(x/255)

    # Instantiate the loss and error function
    loss, label_error = create_criterion_function(model, y)

    # Instantiate the trainer object to drive the model training
    learning_rate = 0.01
    lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)
    learner = C.sgd(z.parameters, lr_schedule)
    trainer = C.Trainer(z, (loss, label_error), [learner])

    # Initialize the parameters for the trainer
    minibatch_size = 512
    num_samples_per_sweep = 100000
    num_minibatches_to_train = (num_samples_per_sweep * num_sweeps_to_train_with) / minibatch_size

    # Map the data streams to the input and labels.
    input_map={
        y  : train_reader.streams.labels,
        x  : train_reader.streams.features
    } 

    # Print training progress every training_progress_output_freq minibatches
    training_progress_output_freq = 100

    # Start a timer
    start = time.time()

    for i in range(0, int(num_minibatches_to_train)):
        # Read a mini batch from the training data file
        data=train_reader.next_minibatch(minibatch_size, input_map=input_map) 
        trainer.train_minibatch(data)
        print_training_progress(trainer, i, training_progress_output_freq, verbose=1)

    # Print training time
    print("Training took {:.1f} sec".format(time.time() - start))

    # Test the model
    test_input_map = {
        y  : test_reader.streams.labels,
        x  : test_reader.streams.features
    }

    # Test data for trained model
    test_minibatch_size = 512
    num_samples = 5000
    num_minibatches_to_test = num_samples // test_minibatch_size

    test_result = 0.0   

    for i in range(num_minibatches_to_test):

        # We are loading test data in batches specified by test_minibatch_size
        # Each data point in the minibatch is a MNIST digit image of 784 dimensions 
        # with one pixel per dimension that we will encode / decode with the 
        # trained model.
        data = test_reader.next_minibatch(test_minibatch_size, input_map=test_input_map)
        eval_error = trainer.test_minibatch(data, device = None)
        test_result = test_result + eval_error

    # Average of evaluation errors of all test minibatches
    print("Average test error: {0:.2f}%".format(test_result*100 / num_minibatches_to_test))

And this is my do_train_test function:

def do_train_test():
    global z
    z = create_model(x)
    reader_train = create_reader(train_file, True, input_dim, num_output_classes)
    reader_test = create_reader(test_file, False, input_dim, num_output_classes)
    train_test(reader_train, reader_test, z)

And now, this is the error it yields after the model is trained successfully:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-0033a8b9f033> in <module>
      7     train_test(reader_train, reader_test, z)
      8 
----> 9 do_train_test()
     10 
     11 

<ipython-input-18-0033a8b9f033> in do_train_test()
      5     reader_train = create_reader(train_file, True, input_dim, num_output_classes)
      6     reader_test = create_reader(test_file, False, input_dim, num_output_classes)
----> 7     train_test(reader_train, reader_test, z)
      8 
      9 do_train_test()

<ipython-input-16-f611522ab1c3> in train_test(train_reader, test_reader, model_func, num_sweeps_to_train_with)
     61         # trained model.
     62         data = test_reader.next_minibatch(test_minibatch_size, input_map=test_input_map)
---> 63         eval_error = trainer.test_minibatch(data, device = None)
     64         test_result = test_result + eval_error
     65 

C:\ProgramData\Anaconda3\lib\site-packages\cntk\train\trainer.py in test_minibatch(self, arguments, device)
    216         if self.evaluation_function:
    217             all_args |= set(self.evaluation_function.arguments)
--> 218         arguments = sanitize_var_map(tuple(all_args), arguments)
    219 
    220         return super(Trainer, self).test_minibatch(arguments, device)

C:\ProgramData\Anaconda3\lib\site-packages\cntk\internal\sanitize.py in sanitize_var_map(op_arguments, arguments, precision, device, extract_values_from_minibatch_data)
    341         if len(op_arguments) > 0:
    342             raise ValueError('function expects %i arguments' %
--> 343                              len(op_arguments))
    344         return {}
    345 

ValueError: function expects 2 arguments

Please, how can I prevent this from happening? What am I doing wrong?

Thanks in advance

delzac commented 5 years ago

Hi,

In the source code of test_minibatch, it goes like this:

if arguments is None or isinstance(arguments, (dict, list)) and len(arguments) == 0:
    if len(op_arguments) > 0:
        raise ValueError('function expects %i arguments' %
                         len(op_arguments))
    return {}

For this error to be raised, len(arguments) == 0 must hold. arguments is basically the variable data that you fed into test_minibatch, so you should double-check that data is correct.
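
For example (a minimal sketch reusing test_reader, test_input_map and trainer from your code above), you can check the batch right before the call:

# Sketch: guard trainer.test_minibatch against an empty minibatch.
data = test_reader.next_minibatch(512, input_map=test_input_map)
print(len(data))    # 0 means the reader returned nothing
if data:            # an empty dict is falsy, so this skips empty batches
    eval_error = trainer.test_minibatch(data)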

miguel2488 commented 5 years ago

Hi @delzac,

thank you again for your response. This is how I have configured my data variable for training and testing:

for i in range(0, int(num_minibatches_to_train)):
    # Read a mini batch from the training data file
    data = train_reader.next_minibatch(minibatch_size, input_map=input_map)
    trainer.train_minibatch(data)
    print_training_progress(trainer, i, training_progress_output_freq, verbose=1)

and this is the testing part:

# Test the model
test_input_map = {
    y  : test_reader.streams.labels,
    x  : test_reader.streams.features
}

test_minibatch_size = 512
num_samples = 10000
num_minibatches_to_test = num_samples // test_minibatch_size

test_result = 0.0

for i in range(num_minibatches_to_test):

    # We are loading test data in batches specified by test_minibatch_size
    # Each data point in the minibatch is a MNIST digit image of 784 dimensions
    # with one pixel per dimension that we will encode / decode with the
    # trained model.
    data = test_reader.next_minibatch(test_minibatch_size, input_map=test_input_map)
    eval_error = trainer.test_minibatch(data)
    test_result = test_result + eval_error

# Average of evaluation errors of all test minibatches
print("Average test error: {0:.2f}%".format(test_result*100 / num_minibatches_to_test))

As you can see, both are very similar. How is it possible that the first works and the second doesn't?

Have a look at my test data:

`|labels 0 0 0 1 0 0 0 0 0 0 0 |features 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0` 
delzac commented 5 years ago

Did you check that the output of test_reader.next_minibatch() is correct?

miguel2488 commented 5 years ago

This is what I'm getting when I try to check that output:

data = create_reader(test_file, False, input_dim, num_output_classes)
out[34]: <cntk.io.MinibatchSource; proxy of <Swig Object of type 'CNTK::MinibatchSourcePtr *' at 0x000000001E55F420> >

Same result when I check it on train_reader.

delzac commented 5 years ago

If you do data.as_sequences() and print it out, does everything still work?

miguel2488 commented 5 years ago

Hi @delzac,

this is what I'm getting when executing data.as_sequences():

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-24-41715e7a32cc> in <module>
----> 1 data.as_sequences()

C:\ProgramData\Anaconda3\lib\site-packages\cntk\cntk_py.py in <lambda>(self, name)
   3155     __setattr__ = lambda self, name, value: _swig_setattr(self, MinibatchSource, name, value)
   3156     __swig_getmethods__ = {}
-> 3157     __getattr__ = lambda self, name: _swig_getattr(self, MinibatchSource, name)
   3158 
   3159     def __init__(self, *args, **kwargs):

C:\ProgramData\Anaconda3\lib\site-packages\cntk\cntk_py.py in _swig_getattr(self, class_type, name)
     81     if method:
     82         return method(self)
---> 83     raise AttributeError("'%s' object has no attribute '%s'" % (class_type.__name__, name))
     84 
     85 

AttributeError: 'MinibatchSource' object has no attribute 'as_sequences'
delzac commented 5 years ago

With data = C.io.MinibatchSource().next_minibatch(), data is currently of type dict; check whether it is correct. d['x'].as_sequences() should give you access to those values.

The point here is this: the original error essentially says len(arguments) == 0, which means whatever you fed into test_minibatch is empty. So you really have to debug thoroughly.
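
A quick way to see what actually came back (a sketch; note that the dict keys are stream/variable objects, not plain strings like 'x'):

# Sketch: inspect the minibatch dict returned by next_minibatch.
data = test_reader.next_minibatch(512, input_map=test_input_map)
print(type(data), len(data))
for key, mb in data.items():
    # each value is a MinibatchData carrying sample/sequence counts
    print(key, mb.num_samples, mb.num_sequences)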

miguel2488 commented 5 years ago

thank you for your answer. As you said, the MinibatchSource.next_minibatch() object is of dict type. Querying it with d['x'].as_sequences() as you suggested didn't work for me; instead I have to use d['x'] to index by the key I want to extract. But the data is all there; it is not an empty object. I don't know what's happening. I tried following the logistic regression and multi-layer perceptron tutorials, and everything goes fine until the testing part. I just can't test my models.

Here's what the test_reader yields using the next_minibatch method:

read_test(test_file, input_dim, num_output_classes, 512)
out[124]: {features([15104]): MinibatchData(data=Value([297 x 1 x 15104], GPU), samples=297, seqs=297),
 labels([11]): MinibatchData(data=Value([297 x 1 x 11], GPU), samples=297, seqs=297)}
delzac commented 5 years ago

Can you try switching your test_reader with your train_reader? Does it make a difference? If you use train_minibatch instead of test_minibatch, is there a difference? As far as I can tell, I can't see any error in your code.

Anyway, immediately after data = test_reader.next_minibatch(), can you run print(len(data))?

miguel2488 commented 5 years ago

Hi @delzac,

I used this modified function to create the test_reader.next_minibatch() object:

# Read a CTF formatted text (as mentioned above) using the CTF deserializer from a file
def test_reader(path, is_training, input_dim, num_label_classes):

    labelStream = C.io.StreamDef(field='labels', shape=num_label_classes, is_sparse=False)
    featureStream = C.io.StreamDef(field='features', shape=input_dim, is_sparse=False)
    deserializer = C.io.CTFDeserializer(path, C.io.StreamDefs(labels = labelStream, features = featureStream))
    data = C.io.MinibatchSource(deserializer, randomize = False, max_sweeps = 1).next_minibatch(minibatch_size_in_samples = 512)
    return data

Then when I do this:

data = test_reader(test_file, False, input_dim, num_output_classes)
print(type(data), len(data))

I get this:

dict, 2

I also tried replacing test_reader with train_reader in my code, as you suggested, like this:

    # Test the model
    test_input_map = {
        y  : train_reader.streams.labels,
        x  : train_reader.streams.features
    }

    # Test data for trained model
    test_minibatch_size = 512
    num_samples = 10000
    num_minibatches_to_test = num_samples // test_minibatch_size

    test_result = 0.0   
    path = 'data/out/test.txt'
    for i in range(int(num_minibatches_to_test)):

        # We are loading test data in batches specified by test_minibatch_size
        # Each data point in the minibatch is a MNIST digit image of 784 dimensions 
        # with one pixel per dimension that we will encode / decode with the 
        # trained model.
        test_data = train_reader.next_minibatch(test_minibatch_size, input_map=test_input_map)
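        # NB: trainer.train_minibatch returns a bool (whether an update was
        # applied), not an error rate, so summing it below cannot yield a
        # meaningful test error.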
        eval_error = trainer.train_minibatch(test_data)
        test_result = test_result + eval_error

    # Average of evaluation errors of all test minibatches
    print("Average test error: {0:.2f}%".format(test_result*100 / num_minibatches_to_test))

And this is what I got:

Minibatch: 4400, Loss: 0.6998, Error: 23.05%
Minibatch: 4500, Loss: 0.6963, Error: 20.70%
Minibatch: 4600, Loss: 0.7128, Error: 20.31%
Training took 877.8 sec
Average test error: 100.00%

I'm totally lost with this. It seems that the test reader is not working properly with trainer.test_minibatch() and test_reader.next_minibatch(), yet the test file seems correctly created by my create_reader function. If I just have a look at the objects created, they are MinibatchSource objects containing all the information about features and labels. I just don't know where the problem is.

create_reader(test_file, input_dim, num_output_classes, num_output_classes)
out[23]: <cntk.io.MinibatchSource; proxy of <Swig Object of type 'CNTK::MinibatchSourcePtr *' at 0x0000027CB82122A0> >
miguel2488 commented 5 years ago

Just realised that if I do this:

create_reader(test_file, input_dim, num_output_classes, num_output_classes).next_minibatch(512)

I'm getting this:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-72-7b834874f2d6> in <module>
----> 1 create_reader(test_file, input_dim, num_output_classes, num_output_classes).next_minibatch(512)

C:\ProgramData\Anaconda3\lib\site-packages\cntk\internal\swig_helper.py in wrapper(*args, **kwds)
     67     @wraps(f)
     68     def wrapper(*args, **kwds):
---> 69         result = f(*args, **kwds)
     70         map_if_possible(result)
     71         return result

C:\ProgramData\Anaconda3\lib\site-packages\cntk\io\__init__.py in next_minibatch(self, minibatch_size_in_samples, input_map, device, num_data_partitions, partition_index)
    330                                             minibatch_size_in_samples,
    331                                             num_data_partitions,
--> 332                                             partition_index, device)
    333 
    334         if not mb:

C:\ProgramData\Anaconda3\lib\site-packages\cntk\cntk_py.py in get_next_minibatch(self, *args)
   3179 
   3180     def get_next_minibatch(self, *args):
-> 3181         return _cntk_py.MinibatchSource_get_next_minibatch(self, *args)
   3182 MinibatchSource_swigregister = _cntk_py.MinibatchSource_swigregister
   3183 MinibatchSource_swigregister(MinibatchSource)

RuntimeError: Reached the maximum number of allowed errors while reading the input file (data/out/test.txt).

[CALL STACK]
    > Microsoft::MSR::CNTK::IDataReader::  InitProposals
    - Microsoft::MSR::CNTK::IDataReader::  InitProposals (x6)
    - CreateCompositeDataReader (x5)
    - Microsoft::MSR::CNTK::TracingGPUMemoryAllocator::  operator=
    - std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>   (x2)
    - Microsoft::MSR::CNTK::TracingGPUMemoryAllocator::  operator=
delzac commented 5 years ago

You cannot keep increasing the minibatch size; it is limited by your graphics card's VRAM. Anyway, I can't help you further here; it's just a matter of debugging, and you know your own code better than I do.

Based on what I see here, your train_reader works with trainer.train_minibatch; does it work with trainer.test_minibatch? Can you also test whether test_reader works with trainer.train_minibatch and trainer.test_minibatch? You can debug from there.
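
Something like this (a sketch; x, y, both readers and the trainer come from your code) should narrow it down:

# Sketch: check each reader against trainer.test_minibatch in isolation.
for name, reader in [('train_reader', train_reader), ('test_reader', test_reader)]:
    imap = {y: reader.streams.labels, x: reader.streams.features}
    batch = reader.next_minibatch(512, input_map=imap)
    print(name, 'returned', len(batch), 'streams')
    if batch:
        print('  test error:', trainer.test_minibatch(batch))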

miguel2488 commented 5 years ago

No it doesn't. It has been set to trainer.test_minibatch since the very beginning; in the testing part it has to be like that to test the model. And what do you mean by keep increasing the minibatch size? The next_minibatch argument needs a fixed minibatch_size value; I can't leave that empty.

delzac commented 5 years ago

If you keep increasing the minibatch size, your GPU will not have enough memory to do the computation.

miguel2488 commented 5 years ago

Yeah, I know that. I'm using the values from CNTK's MNIST tutorial, which I previously ran with no problems. Anyway, you've already done a lot for me here; I don't want to keep bothering you with this. I will continue checking things and see if I can fix it. Thank you very much for all your help :)

Helten commented 5 years ago

I had a similar problem with a similar script. It turned out the last batch was empty, so the test loop had one iteration too many.
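
A guarded test loop (a sketch based on the code above) avoids the extra iteration and averages only over batches that actually contained data:

# Sketch: stop when the reader returns an empty minibatch instead of
# looping a precomputed number of times.
test_result = 0.0
batches_tested = 0
while True:
    data = test_reader.next_minibatch(test_minibatch_size, input_map=test_input_map)
    if not data:    # empty dict: the sweep is exhausted
        break
    test_result += trainer.test_minibatch(data)
    batches_tested += 1

if batches_tested:
    print("Average test error: {0:.2f}%".format(test_result * 100 / batches_tested))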