keras-team / autokeras

AutoML library for deep learning
http://autokeras.com/
Apache License 2.0
9.13k stars 1.4k forks source link

TextClassifier with tensorflow.data.Dataset #982

Closed johny-b closed 4 years ago

johny-b commented 4 years ago

Bug Description

I'm trying to use TextClassifier together with tf.data.Dataset. I use the example from https://autokeras.com/tutorial/text_classification/#data-format and it fails both on autokeras=1.0.0 and autokeras=1.0.1 (with different error messages).

Reproducing Steps

Code that I hope exactly matches intentions of the tutorial:

import numpy as np
from tensorflow.keras.datasets import imdb

index_offset = 3  # word index offset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=1000,
                                                      index_from=index_offset)
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
word_to_id = imdb.get_word_index()
word_to_id = {k: (v + index_offset) for k, v in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
id_to_word = {value: key for key, value in word_to_id.items()}
x_train = list(map(lambda sentence: ' '.join(
    id_to_word[i] for i in sentence), x_train))
x_test = list(map(lambda sentence: ' '.join(
    id_to_word[i] for i in sentence), x_test))
x_train = np.array(x_train, dtype=np.str)
x_test = np.array(x_test, dtype=np.str)

import autokeras as ak
import tensorflow as tf
train_set = tf.data.Dataset.from_tensor_slices(((x_train, ), (y_train, )))
test_set = tf.data.Dataset.from_tensor_slices(((x_test, ), (y_test, )))

clf = ak.TextClassifier(max_trials=10)
clf.fit(train_set)
predicted_y = clf.predict(test_set)

On 1.0.1 this ends in:

    clf.fit(train_set)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/tasks/text.py", line 116, in fit
    **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/auto_model.py", line 231, in fit
    validation_split=validation_split)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/auto_model.py", line 305, in _prepare_data
    dataset = self._process_xy(x, y, True)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/auto_model.py", line 292, in _process_xy
    x = self._process_x(x, fit)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/auto_model.py", line 257, in _process_x
    data = adapter.fit_transform(data)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/engine/adapter.py", line 69, in fit_transform
    dataset = self.convert_to_dataset(dataset)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/adapters/input_adapter.py", line 64, in convert_to_dataset
    if len(x.shape) == 1:
AttributeError: 'MapDataset' object has no attribute 'shape'

and on 1.0.0:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/kerastuner/engine/hypermodel.py", line 105, in build
    model = self.hypermodel.build(hp)
  File "/usr/local/lib/python3.6/dist-packages/kerastuner/engine/hypermodel.py", line 65, in _build_wrapper
    return self._build(hp, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/hypermodel/graph.py", line 282, in build
    outputs = block.build(hp, inputs=temp_inputs)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/hypermodel/base.py", line 117, in _build_wrapper
    return self._build(hp, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/hypermodel/head.py", line 137, in build
    output_node = tf.keras.layers.Dense(self.output_shape[-1])(output_node)
TypeError: 'NoneType' object is not subscriptable
[Warning] Invalid model 0/5
(...)
    clf.fit(train_set)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/task.py", line 329, in fit
    **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/auto_model.py", line 208, in fit
    **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/tuner.py", line 213, in search
    super().search(hyper_graph=hyper_graph, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/tuner.py", line 144, in search
    super().search(callbacks=new_callbacks, **fit_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kerastuner/engine/base_tuner.py", line 130, in search
    self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/autokeras/tuner.py", line 52, in run_trial
    super().run_trial(trial, **new_fit_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kerastuner/engine/multi_execution_tuner.py", line 95, in run_trial
    model = self.hypermodel.build(trial.hyperparameters)
  File "/usr/local/lib/python3.6/dist-packages/kerastuner/engine/hypermodel.py", line 65, in _build_wrapper
    return self._build(hp, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kerastuner/engine/hypermodel.py", line 115, in build
    'Too many failed attempts to build model.')
RuntimeError: Too many failed attempts to build model.

Expected Behavior

I'd expect it to end without a traceback : )

Setup Details

Include the details about the versions of:

haifeng-jin commented 4 years ago

Thank you for the report. I am working on this recently. Will have a new release soon.

johny-b commented 4 years ago

@haifeng-jin will the new release also have a fix for the bug described here: https://github.com/keras-team/autokeras/issues/946 ? Because if not then TextClassifier will still work only on 1.0.0.

haifeng-jin commented 4 years ago

Yes, according to my test, it is fixed. If it is not, then I can fix it again and release it again.

haifeng-jin commented 4 years ago

It works with the latest master of autokeras and kerastuner