keras-team / autokeras

AutoML library for deep learning
Apache License 2.0

Binary Classification yields Runtime Error #678

Closed: christian-steinmeyer closed this issue 5 years ago

christian-steinmeyer commented 5 years ago

I am struggling with a relatively simple binary classification problem, similar to #613. I'm trying to use the MlpModule of Autokeras to generate multi-layer perceptron architectures, and I apply data transformations to guarantee the correct format. I've tried two configurations: one output node with binary target labels, and two output nodes with one-hot-encoded target labels. Both yield similar results: an error message saying that the input is not in the valid range.
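
For reference, the two label formats differ only in shape. The sketch below is a plain-Python stand-in for `keras.utils.to_categorical`; the helper name `one_hot` is hypothetical and not part of Keras or Autokeras. It turns the float 0/1 labels that pandas produces into the two-column targets used with two output nodes.

```python
def one_hot(labels, num_classes=2):
    """Plain-Python stand-in for keras.utils.to_categorical (illustrative only)."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[int(label)] = 1.0  # float labels such as 1.0 are cast to an index
        encoded.append(row)
    return encoded

# Option 1: one output node, binary targets (one value per sample)
y = [0.0, 1.0, 1.0, 0.0]

# Option 2: two output nodes, one-hot targets (two values per sample)
y_onehot = one_hot(y)
print(y_onehot)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```

Conventionally, option 1 pairs a sigmoid output with binary cross-entropy, while option 2 pairs a softmax output with categorical cross-entropy.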

This is my code:

    from keras.utils import to_categorical

    from autokeras import MlpModule
    from autokeras.nn.metric import Accuracy
    from autokeras.backend.torch import DataTransformerMlp
    from torch.nn.modules.loss import BCELoss

    def binary_cross_entropy(prediction, target):
        # BCELoss expects predictions in [0, 1] and float targets
        return BCELoss()(prediction, target.float())

    # X, test_X, y and test_y are pandas objects (hence the '.values' calls below)
    y = to_categorical(y)
    test_y = to_categorical(test_y)

    mlpModule = MlpModule(loss=binary_cross_entropy, metric=Accuracy, searcher_args={}, verbose=True)
    data_transformer = DataTransformerMlp(X.values)
    train_data = data_transformer.transform_train(X.values, y)
    test_data = data_transformer.transform_test(test_X.values, test_y)
    fit_args = {
        "n_output_node": 2,
        "input_shape": X.values.shape,
        "train_data": train_data,
        "test_data": test_data
    }
    mlpModule.fit(**fit_args, time_limit=1 * 60 * 60)

    # ...

Upon execution, I get the following error message:

    RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.268241 at ..\aten\src\THNN/generic/BCECriterion.c:62
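
The message shows that `BCELoss` received -0.268241, so the value the network emitted was not a probability; `torch.nn.BCELoss` only accepts inputs in [0, 1]. One plausible cause, assuming the MLP generated by MlpModule ends in a plain linear layer without a sigmoid (this issue does not confirm that), is that the loss is being fed raw scores (logits). PyTorch provides `torch.nn.BCEWithLogitsLoss` for that case, which applies the sigmoid inside the loss. The sketch below reproduces the arithmetic in plain Python so it runs without PyTorch; all function names are hypothetical.

```python
import math

def bce_on_probability(p: float, y: float) -> float:
    """Plain binary cross-entropy; only defined for 0 <= p <= 1 (the range BCELoss checks)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"input value should be between 0~1, but got {p}")
    eps = 1e-12  # avoid log(0)
    return -(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def bce_with_logits(x: float, y: float) -> float:
    """Stable BCE on a raw score x: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))

raw_output = -0.268241  # the value reported in the error message
try:
    bce_on_probability(raw_output, 1.0)
except ValueError as err:
    print("rejected:", err)

# Both lines print the same loss: the logit form equals BCE applied after a sigmoid
print(bce_with_logits(raw_output, 1.0))
print(bce_on_probability(sigmoid(raw_output), 1.0))
```

Under that assumption, the corresponding change to the script would be to construct `BCEWithLogitsLoss()` instead of `BCELoss()` inside `binary_cross_entropy`.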

For completeness, here is the full log:

    C:\ProgramData\Miniconda\envs\nnmp\python.exe C:\Users\{user}\.IntelliJIdea2018.2\config\plugins\python\helpers\pydev\ --multiproc --qt-support=auto --client --port 50878 --file "C:/Users/{user}/Documents/Academics/Uni/Georg-August-University/Computer Science/Masterarbeit/code/src/"
    pydev debugger: process 8364 is connecting

    Connected to pydev debugger (build 182.4505.22)
    Using TensorFlow backend.
    Better speed can be achieved with apex installed from
    Saving Directory: C:\Users\{user}\AppData\Local\Temp\autokeras_5OBEEF
    C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\backend\torch\ RuntimeWarning: invalid value encountered in true_divide
      data = (data - self.mean) / self.std
    C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\backend\torch\ RuntimeWarning: divide by zero encountered in true_divide
      data = (data - self.mean) / self.std

    Initializing search.
    Initialization finished.

    |               Training model 0               |
    Backend Qt5Agg is interactive backend. Turning interactive mode on.
    Epoch-1, Current Metric - 0:   0%|                                       | 0/14 [00:00<?, ? batch/s]
    Traceback (most recent call last):
      File "C:\Users\{user}\.IntelliJIdea2018.2\config\plugins\python\helpers\pydev\", line 1664, in <module>
      File "C:\Users\{user}\.IntelliJIdea2018.2\config\plugins\python\helpers\pydev\", line 1658, in main
        globals =['file'], None, None, is_module)
      File "C:\Users\{user}\.IntelliJIdea2018.2\config\plugins\python\helpers\pydev\", line 1068, in run
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "C:\Users\{user}\.IntelliJIdea2018.2\config\plugins\python\helpers\pydev\_pydev_imps\", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:/Users/{user}/Documents/Academics/Uni/Georg-August-University/Computer Science/Masterarbeit/code/src/", line 33, in <module>
        time_limit=1 * 60 * 60)
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\", line 69, in fit, test_data, int(time_remain))
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\", line 162, in search
        self.sp_search(graph, other_info, model_id, train_data, test_data)
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\", line 196, in sp_search
        self.metric, self.loss, self.verbose, self.path)
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\", line 363, in train
        raise e
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\", line 356, in train
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\backend\torch\", line 109, in train_model
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\autokeras\backend\torch\", line 146, in _train
        loss = self.loss_function(outputs, targets)
      File "C:/Users/{user}/Documents/Academics/Uni/Georg-August-University/Computer Science/Masterarbeit/code/src/", line 11, in binary_cross_entropy
        return BCELoss()(prediction, target.float())
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\torch\nn\modules\", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\torch\nn\modules\", line 512, in forward
        return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
      File "C:\ProgramData\Miniconda\envs\nnmp\lib\site-packages\torch\nn\", line 2113, in binary_cross_entropy
        input, target, weight, reduction_enum)
    RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.268241 at ..\aten\src\THNN/generic/BCECriterion.c:62
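
Separately, the two `RuntimeWarning`s near the top of the log come from the standardization step `data = (data - self.mean) / self.std`. In NumPy, "invalid value encountered in true_divide" is emitted for 0/0 (producing NaN) and "divide by zero" for a nonzero value divided by 0 (producing inf), which suggests that at least one of the 700+ columns has zero standard deviation in the training data. NaN or inf inputs may also affect training. Below is a minimal sketch of a guarded standardizer, assuming column-major lists of floats and the population standard deviation; the function names are hypothetical and are not Autokeras' API.

```python
import math

def fit_standardizer(columns):
    """Return (mean, std) per column; a zero std is replaced by 1.0 so that
    constant columns are only centered instead of producing NaN/inf."""
    stats = []
    for values in columns:
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats.append((mean, std if std > 0 else 1.0))
    return stats

def standardize(columns, stats):
    """Apply (v - mean) / std column by column, using statistics from the training data."""
    return [[(v - mean) / std for v in values]
            for values, (mean, std) in zip(columns, stats)]

# Column-major toy data: the first training column is constant (std == 0)
train_columns = [[0.0, 0.0, 0.0, 0.0], [0.0, 22.0, 0.0, 0.0]]
test_columns = [[0.0, 3.0], [0.0, 0.0]]

stats = fit_standardizer(train_columns)
scaled_test = standardize(test_columns, stats)
print(stats[0], scaled_test[0])  # (0.0, 1.0) [0.0, 3.0]
```

With pandas, an alternative under the same assumption would be to drop zero-variance columns before building the transformer, e.g. `keep = X.std() > 0`, then `X.loc[:, keep]` and `test_X.loc[:, keep]`.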

My labels look as follows:

    count    1692.000000
    mean        0.500000
    std         0.500148
    min         0.000000
    25%         0.000000
    50%         0.500000
    75%         1.000000
    max         1.000000
    Name: y, dtype: float64

My feature matrix X consists of over 700 columns, each structured similarly to the following 5 examples:

    count    1692.000000
    mean        0.037825
    std         0.761111
    min         0.000000
    25%         0.000000
    50%         0.000000
    75%         0.000000
    max        22.000000
    Name: [feature1], dtype: float64

    count    1692.000000
    mean        0.037234
    std         0.746628
    min         0.000000
    25%         0.000000
    50%         0.000000
    75%         0.000000
    max        22.000000
    Name: [feature2], dtype: float64

    count    1692.000000
    mean        0.000000
    std         0.270794
    min        -5.000000
    25%         0.000000
    50%         0.000000
    75%         0.000000
    max         7.000000
    Name: [feature3], dtype: float64

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.