google / automl

Google Brain AutoML
Apache License 2.0

buffer_size must be greater than zero error when using custom dataset #1200

Open darouwan opened 10 months ago

darouwan commented 10 months ago

When I use my custom dataset to train a model from scratch, it outputs the following error:

WARNING:tensorflow:tf.keras.callbacks.experimental.BackupAndRestore endpoint is deprecated and will be removed in a future release. Please use tf.keras.callbacks.BackupAndRestore.
W0827 10:55:05.103817 16528 callbacks.py:1888] tf.keras.callbacks.experimental.BackupAndRestore endpoint is deprecated and will be removed in a future release. Please use tf.keras.callbacks.BackupAndRestore.
Traceback (most recent call last):
  File "F:\python_project\automl\efficientnetv2\main_tf2.py", line 312, in <module>
    app.run(main)
  File "F:\python_project\automl\venv\lib\site-packages\absl\app.py", line 308, in run
    _run_main(main, args)
  File "F:\python_project\automl\venv\lib\site-packages\absl\app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "F:\python_project\automl\efficientnetv2\main_tf2.py", line 278, in main
    get_dataset(training=True, image_size=image_size, config=config),
  File "F:\python_project\automl\efficientnetv2\main_tf2.py", line 225, in get_dataset
    return ds_strategy.distribute_datasets_from_function(
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 1189, in distribute_datasets_from_function
    return self._extended._distribute_datasets_from_function(  # pylint: disable=protected-access
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\distribute\mirrored_strategy.py", line 593, in _distribute_datasets_from_function
    return input_util.get_distributed_datasets_from_function(
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\distribute\input_util.py", line 132, in get_distributed_datasets_from_function
    return input_lib.DistributedDatasetsFromFunction(
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1372, in __init__
    self.build()
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1393, in build
    _create_datasets_from_function_with_input_context(
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1875, in _create_datasets_from_function_with_input_context
    dataset = dataset_fn(ctx)
  File "F:\python_project\automl\efficientnetv2\datasets.py", line 444, in dataset_fn
    return self._input_fn(
  File "F:\python_project\automl\efficientnetv2\datasets.py", line 403, in _input_fn
    dataset = self.make_source_dataset(current_host, num_hosts)
  File "F:\python_project\automl\efficientnetv2\datasets.py", line 357, in make_source_dataset
    dataset = dataset.shuffle(num_files_per_shard, seed=self.shuffle_seed)
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 3961, in shuffle
    super(DatasetV1, self).shuffle(
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 1531, in shuffle
    return ShuffleDataset(
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 5016, in __init__
    variant_tensor = gen_dataset_ops.shuffle_dataset_v3(
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 7344, in shuffle_dataset_v3
    _ops.raise_from_not_ok_status(e, name)
  File "F:\python_project\automl\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 7209, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__ShuffleDatasetV3_device_/job:localhost/replica:0/task:0/device:CPU:0}} buffer_size must be greater than zero. [Op:ShuffleDatasetV3]
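The failing frame is dataset.shuffle(num_files_per_shard, seed=self.shuffle_seed) in datasets.py, so it looks like num_files_per_shard is zero, i.e. no input files were found. A minimal standalone sketch (my own reproduction, not the automl code) that triggers the same error:

import tensorflow as tf

# ShuffleDatasetV3 rejects a non-positive buffer size at dataset-construction
# time in eager mode, which matches the traceback above.
dataset = tf.data.Dataset.range(10)
try:
    dataset = dataset.shuffle(buffer_size=0)  # like shuffling over 0 files
except tf.errors.InvalidArgumentError as e:
    print(e)  # buffer_size must be greater than zero [Op:ShuffleDatasetV3]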

The command I am using is:

python main_tf2.py --mode=train --model_name=efficientnetv2-s --dataset_cfg=imagenet --model_dir=models --use_tpu=False --data_dir=F:\python_project\datasets\molding_class\train

But if I use the embedded dataset, it works well:

python main_tf2.py --mode=train --model_name=efficientnetv2-s --dataset_cfg=imagenet --model_dir=models --use_tpu=False

My dataset structure is like:

.
├── GOOD
│   ├── KA12U9314_B9_DOWN.jpg
│   ├── KA12UA115_D7_DOWN.jpg
│   └── KA12UA188_B11_DOWN.jpg
├── MD07
│   ├── S361M5099_A16_DOWN.jpg
│   └── S36QM2288_E3_DOWN.jpg
└── MD20
    ├── S361M4502_E16_DOWN.jpg
    └── S369M190A_E1_DOWN.jpg
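Could it be that the pipeline expects TFRecord shards rather than a raw class-per-folder layout of JPEGs? If so, a rough conversion sketch would look like the following (the feature keys image/encoded and image/class/label, and the shard file name, are my assumptions based on the common ImageNet TFRecord format, not something I verified against datasets.py):

import os
import tensorflow as tf

def folder_to_tfrecord(root_dir, output_path):
    # Map each subfolder (GOOD, MD07, MD20) to an integer label.
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d)))
    with tf.io.TFRecordWriter(output_path) as writer:
        for label, class_name in enumerate(class_names):
            class_dir = os.path.join(root_dir, class_name)
            for fname in sorted(os.listdir(class_dir)):
                with open(os.path.join(class_dir, fname), 'rb') as f:
                    encoded = f.read()  # already-encoded JPEG bytes
                example = tf.train.Example(features=tf.train.Features(feature={
                    'image/encoded': tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[encoded])),
                    'image/class/label': tf.train.Feature(
                        int64_list=tf.train.Int64List(value=[label])),
                }))
                writer.write(example.SerializeToString())

folder_to_tfrecord(r'F:\python_project\datasets\molding_class\train',
                   r'F:\python_project\datasets\molding_class\train-00000-of-00001')

After a conversion like this, --data_dir would point at the directory containing the shard(s), so the file glob has something to match.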

My environment is:

OS: Windows 11
GPU: 3060 12G
TensorFlow 2.10
CUDA 11.8

How can I solve this? Thanks!

darouwan commented 10 months ago

In my case, it should be related to https://github.com/google/automl/blob/c7392f2bab3165244d1c565b66409fa11fa82367/efficientnetv2/datasets.py#L342-L343

When I debug, the filenames variable here is empty.
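A quick way to confirm this from a REPL (the "train*" pattern below is only my guess; the real pattern comes from the dataset config used at the lines linked above):

import tensorflow as tf

# Does the pattern the input pipeline builds match anything under --data_dir?
file_pattern = r'F:\python_project\datasets\molding_class\train\train*'
filenames = sorted(tf.io.gfile.glob(file_pattern))
print(len(filenames), filenames[:3])
# If this prints 0, make_source_dataset ends up calling dataset.shuffle() with
# a zero buffer size, which is exactly the InvalidArgumentError above.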