keras-team / autokeras

AutoML library for deep learning
http://autokeras.com/
Apache License 2.0

Bug: ValueError: The dataset should at least contain 2 batches to be split. #1871

Open gloglo17 opened 1 year ago

gloglo17 commented 1 year ago

Hi, I'm just starting with AutoKeras and trying out the tutorial example with my own dataset, but AutoKeras doesn't even start. I get this error: ValueError: The dataset should at least contain 2 batches to be split.

Python 3.10.7, Autokeras 1.1.0, Keras 2.12.0, Tensorflow 2.12.0, Pandas 1.5.3, Numpy 1.23.5

Here's the whole code, including a link to download the dataset.

import pandas as pd
import tensorflow as tf
import autokeras as ak
import numpy as np
import os
import requests
import io

fileName = "0183_SPORT5_limit_10_from_2023-03-29-13-33.npy"
url = f"http://34.28.182.138/{fileName}"

response = requests.get(url)
response.raise_for_status()
data = np.load(io.BytesIO(response.content))

DF = pd.DataFrame(data)
DF.to_csv("data.csv")

clf = ak.StructuredDataClassifier(
    overwrite=True, max_trials=3
)  
clf.fit('data.csv', '1', epochs=10)

```
Traceback (most recent call last):
  File "/aux/autokera.py", line 24, in <module>
    clf.fit('data.csv', '1', epochs=10)
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 326, in fit
    history = super().fit(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 139, in fit
    history = super().fit(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/auto_model.py", line 288, in fit
    dataset, validation_data = data_utils.split_dataset(
  File "/home/au/.local/lib/python3.10/site-packages/autokeras/utils/data_utils.py", line 46, in split_dataset
    raise ValueError(
ValueError: The dataset should at least contain 2 batches to be split.
```

ShahzebL commented 11 months ago

Hi,

Not sure if you're still encountering this issue. I tried to check your dataset but couldn't access it. Are there enough samples in the data? If you're working with a small sample size, another option is to pass a much smaller batch_size to the fit method.
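As a rough illustration of the batch_size suggestion, the sketch below picks a batch size small enough that the data forms at least two batches. `pick_batch_size` is a hypothetical helper, not part of AutoKeras, and it assumes the error is raised whenever the batched dataset has fewer than two batches, as the message suggests. The default of 32 is the value reported elsewhere in this thread.

```python
import math


def pick_batch_size(n_rows: int, default: int = 32, min_batches: int = 2) -> int:
    """Hypothetical helper: a batch size <= default that yields
    at least `min_batches` batches for `n_rows` rows."""
    if n_rows < min_batches:
        raise ValueError(f"need at least {min_batches} rows, got {n_rows}")
    # With batch_size <= n_rows // min_batches,
    # ceil(n_rows / batch_size) >= min_batches always holds.
    return min(default, n_rows // min_batches)


batch_size = pick_batch_size(10)
print(batch_size, math.ceil(10 / batch_size))  # 5 2

# Assumed usage with the classifier from the report (not run here):
# clf.fit('data.csv', '1', epochs=10, batch_size=batch_size)
```

For large datasets the helper simply returns the default, so it only changes behaviour when the data is too small to fill two default-sized batches.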

Hope this helps.

rahmatiangit commented 6 months ago

- Problem: I get this error when the dataset has 41 or fewer rows. With 42 or more rows there is no error.

- Fix: Passing a smaller batch_size to fit (the default is 32) fixes the problem: search.fit(x=X_train, y=y_train, verbose=0, epochs=10, batch_size=12)

- Details: AutoKeras 1.1. The code and the passing and failing datasets are attached.
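The 41/42 threshold is consistent with simple batch arithmetic, under two assumptions: that `train_test_split(test_size=0.2)` rounds the test set size up (so 41 rows leave 32 for training and 42 leave 33), and that AutoKeras batches the training data with the default batch size of 32 before checking that it can be split. The sketch below only does that arithmetic; `n_train_rows` and `n_batches` are illustrative helpers, not library functions.

```python
import math


def n_train_rows(n_rows: int, test_size: float = 0.2) -> int:
    # Assumption: the test set size is rounded up for a fractional
    # test_size, and the remaining rows go to the training set.
    return n_rows - math.ceil(n_rows * test_size)


def n_batches(n_rows: int, batch_size: int = 32) -> int:
    # The last batch may be partial, hence the ceiling.
    return math.ceil(n_rows / batch_size)


for total in (41, 42):
    train = n_train_rows(total)
    print(total, train, n_batches(train), n_batches(train, batch_size=12))
# 41 32 1 3  -> one batch at the default size, so the split fails
# 42 33 2 3  -> two batches at the default size, so the split succeeds
```

With batch_size=12 both datasets produce three batches, which would explain why the smaller batch size avoids the error.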

Code:

# imports assumed; they were not shown in the original post
from pandas import read_csv
from sklearn.model_selection import train_test_split
from autokeras import StructuredDataRegressor

url = 'auto-insurance_41a.csv'
dataframe = read_csv(url, header=None)
print(dataframe.shape)

# split into input and output elements
data = dataframe.values
data = data.astype('float32')
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

# separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
# perform the search
search.fit(x=X_train, y=y_train, verbose=0, epochs=10)

Error:

Reloading Tuner from ./structured_data_regressor/tuner0.json

ValueError Traceback (most recent call last)

in <cell line: 11>()
      9 search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
     10 # perform the search
---> 11 search.fit(x=X_train, y=y_train, verbose=0, epochs=10)
     12 # evaluate the model
     13 mae, _ = search.evaluate(X_test, y_test, verbose=0)

2 frames

/usr/local/lib/python3.10/dist-packages/autokeras/tasks/structured_data.py in fit(self, x, y, epochs, callbacks, validation_split, validation_data, **kwargs)
    137 self.check_in_fit(x)
    138
--> 139 history = super().fit(
    140 x=x,
    141 y=y,

/usr/local/lib/python3.10/dist-packages/autokeras/auto_model.py in fit(self, x, y, batch_size, epochs, callbacks, validation_split, validation_data, verbose, **kwargs)
    286 # Split the data with validation_split.
    287 if validation_data is None and validation_split:
--> 288 dataset, validation_data = data_utils.split_dataset(
    289 dataset, validation_split
    290 )

/usr/local/lib/python3.10/dist-packages/autokeras/utils/data_utils.py in split_dataset(dataset, validation_split)
     44 num_instances = dataset.reduce(np.int64(0), lambda x, _: x + 1).numpy()
     45 if num_instances < 2:
---> 46 raise ValueError(
     47 "The dataset should at least contain 2 batches to be split."
     48 )

ValueError: The dataset should at least contain 2 batches to be split.

Attachments: auto-insurance_41a.csv, auto-insurance_42a.csv