Closed alexgarel closed 2 years ago
Hello, I’ve launched the training with the following command (where models is an empty folder I created):
python train.py config.json models
but I get the following error:
Epoch 17/50
5465/5465 [==============================] - 1373s 251ms/step - loss: 0.0012 - binary_accuracy: 0.9996 - precision: 0.8874 - recall: 0.7950 - val_loss: 0.0014 - val_binary_accuracy: 0.9996 - val_precision: 0.8894 - val_recall: 0.7894
Training ended
Moving log directory from /var/folders/c7/w4lf4cp91_j_p3dxm9w00rmh0000gn/T/tmpsiomxoa9 to models/logs
Saving the base and the serving model models
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
Evaluating on validation dataset
Traceback (most recent call last):
File "train.py", line 203, in <module>
main()
File "train.py", line 189, in main
train(
File "train.py", line 136, in train
y_pred_val = model.predict(val.map(lambda x,y: x))
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py", line 1751, in predict
tmp_batch_outputs = self.predict_function(iterator)
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 933, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 759, in _initialize
self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3066, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3463, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3298, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1007, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 668, in wrapped_fn
out = weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 994, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py:1586 predict_function *
return step_function(self, iterator)
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py:1576 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:1286 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2849 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:3632 _call_for_each_replica
return fn(*args, **kwargs)
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py:1569 run_step **
outputs = model.predict_step(data)
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py:1537 predict_step
return self(x, training=False)
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/base_layer.py:1020 __call__
input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/input_spec.py:199 assert_input_compatibility
raise ValueError('Layer ' + layer_name + ' expects ' +
ValueError: Layer model expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None) dtype=string>]
Any help will be appreciated, thanks
I would use pdb and look at val.map(lambda x,y: x) to see what it produces (more precisely, next(iter(val.map(lambda x,y: x)))).
In create_tf_dataset, we see from output_signature that you should have a tuple of tensors. See also the Dataset.map documentation, maybe?
BTW, you don't have to retrain to debug: your model is already trained and saved, so you should be able to load it in a console and try the predict function on it.
Using model.predict(val) should be enough.
The error you get is because predict has some clever unpacking strategy when given an iterable (a Dataset in our case). See "Unpacking behavior for iterator-like inputs" at https://keras.io/api/models/model_training_apis/#fit-method. It's explained in the fit doc but also applies to predict.
What's not obvious from the doc is that you can give predict the full Dataset and it'll only use the inputs: https://github.com/tensorflow/tensorflow/blob/v2.6.3/tensorflow/python/keras/engine/training.py#L1539.
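To make the unpacking rule concrete, here is a minimal pure-Python sketch of how each dataset element gets split into inputs, targets and sample weights. The function unpack_element is a hypothetical stand-in written for illustration (Keras exposes comparable logic as tf.keras.utils.unpack_x_y_sample_weight), the strings stand in for tensors, and it assumes the two model inputs are packed as a tuple, as the error message suggests.

```python
def unpack_element(data):
    """Split one dataset element into (x, y, sample_weight).

    Simplified imitation of the Keras rule, for illustration only.
    """
    if not isinstance(data, tuple):
        return data, None, None
    if len(data) == 1:
        return data[0], None, None
    if len(data) == 2:
        return data[0], data[1], None
    if len(data) == 3:
        return data
    raise ValueError("Expected x, (x,), (x, y) or (x, y, sample_weight)")


# One element of `val` (assumed layout): a pair of inputs plus the labels.
element = (("input_a", "input_b"), "labels")

# model.predict(val): the labels are split off and both inputs reach the model.
x_full, _, _ = unpack_element(element)

# model.predict(val.map(lambda x, y: x)): the element is now the bare 2-tuple
# of inputs, which is read as (x, y), so only one input is left for the model.
x_mapped, y_mapped, _ = unpack_element(element[0])
```

Under these assumptions, x_full keeps both inputs while x_mapped holds only the first one, which matches the "expects 2 input(s), but it received 1" message.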
Once that part is fixed, I run into several other issues with:
report, clf_report = evaluation_report(
val.map(lambda x, y: y).as_numpy(), y_pred_val, taxonomy=category_taxonomy, category_names=category_names
First, as_numpy() doesn't exist. I managed to fix it using list(val.map(lambda x, y: y).unbatch().as_numpy_iterator()) but I'm not entirely sure it's the best way?
Then category_names isn't defined. I've tweaked the evaluation_report() function to take a category_to_id argument instead and pass it category_vocab. The category_to_id parameter is already supported in the fill_ancestors() function called within evaluation_report(), so you just have to pass it along.
I'll try to submit a PR tomorrow unless someone beats me to it :)
First, as_numpy() doesn't exist. I managed to fix it using list(val.map(lambda x, y: y).unbatch().as_numpy_iterator()) but I'm not entirely sure it's the best way?
Maybe np.array(list(val.map(lambda x, y: y).unbatch().as_numpy_iterator())) simply works?
Or even better, following the second example in the batch doc, can you do something like:
next(val.map(lambda x, y: y).batch(100000000).as_numpy_iterator())
(if possible, replace 100000000 with dataset size + 1)
Otherwise there is an as_numpy function in tfds: https://www.tensorflow.org/datasets/api_docs/python/tfds/as_numpy
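For reference, a minimal sketch comparing two ways of pulling the labels out of a batched dataset into one NumPy array. The toy val dataset below is made up for illustration (the real one comes from create_tf_dataset), and the sketch assumes TensorFlow 2.x and NumPy are installed.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the validation set: 5 examples, 3 label columns, batch size 2.
features = tf.constant([["a"], ["b"], ["c"], ["d"], ["e"]])
labels = tf.constant([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]])
val = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# Keep only the labels of each (inputs, labels) element.
labels_only = val.map(lambda x, y: y)

# Option 1: unbatch, then stack the per-example arrays.
y_true_a = np.stack(list(labels_only.unbatch().as_numpy_iterator()))

# Option 2: concatenate the batches along the example axis.
y_true_b = np.concatenate(list(labels_only.as_numpy_iterator()), axis=0)
```

Both options should give the same (examples, labels) array here. Concatenating whole batches avoids iterating over single examples, which may matter on a large validation set.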
@8huit, @streino, I had the idea to look at the branches in @kulizhsy's repository.
And in fact, there is an eval branch which is ahead of master!
Can you try this branch? If it's better, we will merge it.
I opened a PR:
Good find @alexgarel ! The full training script ran fine for me using the "eval" branch.
Hey! Same for me, thanks!
I started a notebook to document and run the model training. I am also trying to make the training and testing datasets, and the predicted categories, explicit.
Where can I share it (though it's still in progress)?
As mentioned on Slack: create a branch, add an experiments/ folder with your notebook, then you can commit / push and open a PR (you can set the PR to draft until you are done).
Perfect 👍 Will do this this weekend!
The goal here is to make sure the current repo is in a working state.
It's also a first step for people who want to work on the model, so do not hesitate to report your success or errors while trying to get it to work. Note that, as stated in the README, a full training might take some hours.
If someone makes a notebook, that would be a big plus.