google / automl

Google Brain AutoML
Apache License 2.0

Running EfficientDet with SKU110K TFRecord-converted dataset #1173

Open hchauhan123 opened 1 year ago

hchauhan123 commented 1 year ago

I converted the SKU110K dataset to TFRecord format, and training the EfficientDet model on it fails with the error below. I have attached the conversion script that I ran on the SKU110K dataset; the image paths passed to the script were correct. A similar issue was raised earlier, so I suspect the problem lies in how I am converting the JPEG-based SKU110K dataset to TFRecord format.

=============================================================================

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1377, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) INVALID_ARGUMENT: assertion failed: [238]
    [[{{node parser/Assert/Assert}}]]
    [[IteratorGetNext]]
    [[IteratorGetNext/_13281]]
  (1) INVALID_ARGUMENT: assertion failed: [238]
    [[{{node parser/Assert/Assert}}]]
    [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 586, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "main.py", line 458, in main
    train_estimator.train(input_fn=input_fn, max_steps=max_steps)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 360, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1186, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1217, in _train_model_default
    return self._train_with_estimator_spec(estimator_spec, worker_hooks,
  File "/root/tf/mlclean/torchvision_py/model_garden/TensorFlow/computer_vision/efficientdet/horovod_estimator/estimator.py", line 174, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 782, in run
    return self._sess.run(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1311, in run
    return self._sess.run(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1416, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1401, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1469, in run
    outputs = _WrappedSession.run(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/monitored_session.py", line 1232, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1396, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

2 root error(s) found.
  (0) INVALID_ARGUMENT: assertion failed: [238]
    [[{{node parser/Assert/Assert}}]]
    [[IteratorGetNext]]
    [[IteratorGetNext/_13281]]
  (1) INVALID_ARGUMENT: assertion failed: [238]
    [[{{node parser/Assert/Assert}}]]
    [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.
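The assertion prints only the value 238, and the traceback does not say what that number measures. One possibility, which is an assumption rather than something confirmed here, is that it is a per-image count such as the number of ground-truth boxes in one record. A minimal sketch for checking that against the source annotations, assuming a header-less CSV with one box per row and the image name in the first column (the layout I believe SKU110K uses: image name, x1, y1, x2, y2, class, image width, image height). The function names and the `limit` value are hypothetical and are not part of the EfficientDet code.

```python
import csv
from collections import Counter


def boxes_per_image(lines):
    """Count annotation rows per image.

    `lines` is any iterable of CSV text lines (an open file works).
    Assumes no header row and the image name in column 0.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if row:  # skip blank lines
            counts[row[0]] += 1
    return counts


def images_over_limit(counts, limit):
    """Return (image_name, box_count) pairs whose count exceeds `limit`, sorted by name."""
    return sorted((name, n) for name, n in counts.items() if n > limit)


# Tiny in-memory sample standing in for the real annotation file.
sample = [
    "a.jpg,1,2,3,4,object,100,100",
    "a.jpg,5,6,7,8,object,100,100",
    "b.jpg,1,1,9,9,object,100,100",
]
sample_counts = boxes_per_image(sample)
# `limit=1` is a placeholder; substitute the per-image cap of your own config.
print(images_over_limit(sample_counts, limit=1))
```

To run this on the real data, pass an open file instead of the sample list, for example `with open(path, newline="") as f: counts = boxes_per_image(f)`, where `path` points at the annotation CSV. If some image has exactly 238 rows, that would support the guess above.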

sku-tfrec.py.txt