keras-team / autokeras

AutoML library for deep learning
http://autokeras.com/
Apache License 2.0

Autokeras training in Jupyter Notebook stays at 0% forever (restarting kernel does not fix it) #831

Closed lrsoenksen closed 4 years ago

lrsoenksen commented 4 years ago

Bug Description

Autokeras 0.4 gets stuck at 0% when training in a Jupyter notebook. The same code runs fine in a terminal. I've tried the example code below with various time_limit values. The process does not raise an error but never advances, and restarting the kernel does not help, contrary to what other bug threads imply. When I stop the kernel, I get the traceback shown below:

To Reproduce issue (3-steps)

Steps to reproduce the behavior:

from keras.datasets import mnist
from autokeras.image.image_supervised import ImageClassifier

if __name__ == '__main__':
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(x_train.shape + (1,))
    x_test = x_test.reshape(x_test.shape + (1,))

    clf = ImageClassifier(verbose=True)
    clf.fit(x_train, y_train, time_limit=12 * 60 * 60)
    clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
    y = clf.evaluate(x_test, y_test)
    print(y)

Behavior

Training stays at 0% forever, with the following output:

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 2s 0us/step
Saving Directory: /var/folders/zg/14r0611j26560p1lp_spx6_80000gn/T/autokeras_H7XG2Y
Preprocessing the images.
Preprocessing finished.

Initializing search.
Initialization finished.

+----------------------------------------------+
|               Training model 0               |
+----------------------------------------------+
Epoch-1, Current Metric - 0:   0%|                                      | 0/465 [00:00<?, ? batch/s]
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-2-5dfa6260af29> in <module>
      8 
      9     clf = ImageClassifier(verbose=True)
---> 10     clf.fit(x_train, y_train, time_limit=12)
     11     clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
     12     y = clf.evaluate(x_test, y_test)

~/opt/anaconda3/envs/py36/lib/python3.6/site-packages/autokeras/image/image_supervised.py in fit(self, x, y, time_limit)
    153             print("Preprocessing finished.")
    154 
--> 155         super().fit(x, y, time_limit)
    156 
    157     def init_transformer(self, x):

~/opt/anaconda3/envs/py36/lib/python3.6/site-packages/autokeras/supervised.py in fit(self, x, y, time_limit)
    185             time_limit = 24 * 60 * 60
    186 
--> 187         self.cnn.fit(self.get_n_output_node(), x_train.shape, train_data, valid_data, time_limit)
    188 
    189     def final_fit(self, x_train, y_train, x_test, y_test, trainer_args=None, retrain=False):

~/opt/anaconda3/envs/py36/lib/python3.6/site-packages/autokeras/net_module.py in fit(self, n_output_node, input_shape, train_data, test_data, time_limit)
     67         try:
     68             while time_remain > 0:
---> 69                 self.searcher.search(train_data, test_data, int(time_remain))
     70                 pickle_to_file(self, os.path.join(self.path, 'module'))
     71                 if len(self.searcher.history) >= Constant.MAX_MODEL_NUM:

~/opt/anaconda3/envs/py36/lib/python3.6/site-packages/autokeras/search.py in search(self, train_data, test_data, timeout)
    163         else:
    164             # Use two processes
--> 165             self.mp_search(graph, other_info, model_id, train_data, test_data)
    166 
    167     def mp_search(self, graph, other_info, model_id, train_data, test_data):

~/opt/anaconda3/envs/py36/lib/python3.6/site-packages/autokeras/search.py in mp_search(self, graph, other_info, model_id, train_data, test_data)
    173             p.start()
    174             search_results = self._search_common(q)
--> 175             metric_value, loss, graph = q.get(block=True)
    176             if time.time() >= self._timeout:
    177                 raise TimeoutError

~/opt/anaconda3/envs/py36/lib/python3.6/multiprocessing/queues.py in get(self, block, timeout)
     92         if block and timeout is None:
     93             with self._rlock:
---> 94                 res = self._recv_bytes()
     95             self._sem.release()
     96         else:

~/opt/anaconda3/envs/py36/lib/python3.6/multiprocessing/connection.py in recv_bytes(self, maxlength)
    214         if maxlength is not None and maxlength < 0:
    215             raise ValueError("negative maxlength")
--> 216         buf = self._recv_bytes(maxlength)
    217         if buf is None:
    218             self._bad_message_length()

~/opt/anaconda3/envs/py36/lib/python3.6/multiprocessing/connection.py in _recv_bytes(self, maxsize)
    405 
    406     def _recv_bytes(self, maxsize=None):
--> 407         buf = self._recv(4)
    408         size, = struct.unpack("!i", buf.getvalue())
    409         if maxsize is not None and size > maxsize:

~/opt/anaconda3/envs/py36/lib/python3.6/multiprocessing/connection.py in _recv(self, size, read)
    377         remaining = size
    378         while remaining > 0:
--> 379             chunk = read(handle, remaining)
    380             n = len(chunk)
    381             if n == 0:

KeyboardInterrupt: 
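
The traceback ends inside `q.get(block=True)` in `mp_search`, so the notebook process appears to be waiting for a result that the training subprocess never sends. As an illustration only, here is a minimal sketch of how a parent process can bound such a wait by polling with a timeout and checking whether the worker is still alive. `get_result` and `_worker` are hypothetical names for this sketch and are not part of Autokeras; the demo at the bottom is meant to be run as a script.

```python
import multiprocessing
import queue
import time


def get_result(q, proc, poll_seconds=1.0, deadline_seconds=60.0):
    """Wait for one item from q, but give up if proc dies or time runs out."""
    deadline = time.monotonic() + deadline_seconds
    while True:
        try:
            # A finite timeout hands control back periodically, unlike
            # q.get(block=True), which can wait forever.
            return q.get(timeout=poll_seconds)
        except queue.Empty:
            if not proc.is_alive():
                try:
                    # The result may have arrived just before the worker exited.
                    return q.get_nowait()
                except queue.Empty:
                    raise RuntimeError(
                        "worker exited without returning a result"
                    ) from None
            if time.monotonic() >= deadline:
                raise TimeoutError("no result before the deadline") from None


def _worker(q):
    # Hypothetical stand-in for a training job: sends back (metric, loss).
    q.put((0.9, 0.1))


if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=_worker, args=(q,))
    p.start()
    print(get_result(q, p, poll_seconds=0.5, deadline_seconds=10.0))
    p.join()
```

With a wait like this, a worker that never starts or dies silently surfaces as an exception instead of a progress bar frozen at 0%.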

Setup Details

Details about the versions used:


taylor4712 commented 4 years ago

Maybe you can try to run it on google colab?

lrsoenksen commented 4 years ago

I've found a fix that works within Jupyter. I had to add the following to the imports so that Autokeras's multiprocessing can still report its progress if, for whatever reason, Jupyter is causing issues. It works in both the terminal and Jupyter, so I've added it:

# Enable multiprocessing: choose the 'forkserver' start method (call once, before training starts)
import multiprocessing
multiprocessing.set_start_method('forkserver')
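
One caveat when using this in a notebook: calling `multiprocessing.set_start_method` a second time (for example, by re-running the cell) raises `RuntimeError: context has already been set`. Below is a minimal sketch of a re-runnable variant. `ensure_start_method` is a hypothetical helper name, not part of Autokeras, and the fallback to the platform default when `'forkserver'` is unavailable is an assumption of this sketch.

```python
import multiprocessing


def ensure_start_method(preferred="forkserver"):
    """Set the multiprocessing start method; safe to call more than once.

    Returns the start method that is in effect afterwards.
    """
    # Fall back to the platform default if the preferred method does not
    # exist here (for example, 'forkserver' is not available on Windows).
    if preferred not in multiprocessing.get_all_start_methods():
        return multiprocessing.get_start_method()

    # allow_none=True returns None when no method has been fixed yet,
    # instead of fixing the default as a side effect.
    current = multiprocessing.get_start_method(allow_none=True)
    if current != preferred:
        # force=True avoids the RuntimeError raised when a start method
        # has already been set, e.g. when the cell is executed again.
        multiprocessing.set_start_method(preferred, force=True)
    return multiprocessing.get_start_method()
```

Calling `ensure_start_method()` at the top of the notebook, before importing or running the Autokeras code, has the same effect as the two lines above but can be re-executed without an error.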