run the models and see if everything works

alexgarel commented 2 years ago

The goal here is to make sure the current repo is in a working state.

It's also a first step for people who want to work on the model, so do not hesitate to report your successes or errors while trying to get it to work. Note that, as stated in the README, a full training run might take several hours.

If someone makes a notebook, this would be a big plus.

8huit commented 2 years ago

Hello, I’ve launched the training with the following command (where models is an empty folder I created):

python train.py config.json models

but I get the following error:

Epoch 17/50
5465/5465 [==============================] - 1373s 251ms/step - loss: 0.0012 - binary_accuracy: 0.9996 - precision: 0.8874 - recall: 0.7950 - val_loss: 0.0014 - val_binary_accuracy: 0.9996 - val_precision: 0.8894 - val_recall: 0.7894
Training ended
Moving log directory from /var/folders/c7/w4lf4cp91_j_p3dxm9w00rmh0000gn/T/tmpsiomxoa9 to models/logs
Saving the base and the serving model models
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
Evaluating on validation dataset
Traceback (most recent call last):
  File "train.py", line 203, in <module>
    main()
  File "train.py", line 189, in main
    train(
  File "train.py", line 136, in train
    y_pred_val = model.predict(val.map(lambda x,y: x))
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py", line 1751, in predict
    tmp_batch_outputs = self.predict_function(iterator)
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 933, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 759, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3066, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3463, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3298, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1007, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 668, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 994, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py:1586 predict_function  *
        return step_function(self, iterator)
    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py:1576 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:1286 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2849 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:3632 _call_for_each_replica
        return fn(*args, **kwargs)
    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py:1569 run_step  **
        outputs = model.predict_step(data)
    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/training.py:1537 predict_step
        return self(x, training=False)
    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/base_layer.py:1020 __call__
        input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
    /Users/hui-guan/opt/miniconda3/envs/off/lib/python3.8/site-packages/keras/engine/input_spec.py:199 assert_input_compatibility
        raise ValueError('Layer ' + layer_name + ' expects ' +

    ValueError: Layer model expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None) dtype=string>]

Any help will be appreciated, thanks

alexgarel commented 2 years ago

I would use pdb and look at what val.map(lambda x,y: x) produces (more precisely, next(iter(val.map(lambda x,y: x)))).
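The same check can be reproduced outside train.py on a toy dataset. This is a minimal sketch that assumes, like create_tf_dataset, that each element is ((input_1, input_2), labels); the strings, batch size, and three-class labels are made up. Inside a pdb session, `p next(iter(val.map(lambda x, y: x)))` prints the equivalent element for the real data.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the validation set: ((input_1, input_2), labels), batched by 2.
features = (tf.constant(["orange juice", "dark chocolate", "green tea"]),
            tf.constant(["water, orange", "cocoa, sugar", "water, tea"]))
labels = tf.constant(np.eye(3, dtype="float32"))
val = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# What predict() receives when given val.map(lambda x, y: x).
first = next(iter(val.map(lambda x, y: x)))
print(type(first), len(first))  # a tuple holding the two input tensors
print(first[0].shape, first[1].shape)
```

The element is a plain 2-tuple of input tensors, with no labels attached.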

In create_tf_dataset, the output_signature shows that x should be a tuple of tensors. Maybe see also the Dataset.map documentation?
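The structure described by an output_signature can be checked directly through element_spec. Below is a minimal sketch using tf.data.Dataset.from_generator; the generator, field names, and three-class labels are hypothetical stand-ins, not the repo's create_tf_dataset. It also shows the Dataset.map behavior in question: a tuple element is passed to the mapped function as separate arguments, so in `lambda x, y: x` the name x is bound to the whole inputs tuple.

```python
import tensorflow as tf

# Hypothetical rows: two text fields per product and a 3-class multi-hot label vector.
ROWS = [
    ("orange juice", "water, orange", [0.0, 1.0, 1.0]),
    ("dark chocolate", "cocoa, sugar", [1.0, 0.0, 0.0]),
]


def gen():
    for name, ingredients, labels in ROWS:
        # One element = (tuple of model inputs, labels).
        yield (name, ingredients), labels


output_signature = (
    (tf.TensorSpec(shape=(), dtype=tf.string),    # first model input
     tf.TensorSpec(shape=(), dtype=tf.string)),   # second model input
    tf.TensorSpec(shape=(3,), dtype=tf.float32),  # labels
)
ds = tf.data.Dataset.from_generator(gen, output_signature=output_signature).batch(2)

# Dataset.map unpacks each (x, y) element into two arguments, so x is the inputs tuple.
inputs_only = ds.map(lambda x, y: x)
print(inputs_only.element_spec)  # a 2-tuple of string TensorSpecs with shape (None,)
```

Since inputs_only yields a 2-tuple, a consumer that expects (x, y) pairs will read it as one input plus one label.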

alexgarel commented 2 years ago

BTW, you don't have to retrain to debug this: your model is already trained and saved, so you should be able to load it in a console and try the predict function on it.
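A minimal sketch of that debugging loop, assuming a Keras model saved with model.save(). The exact save location under models/ isn't shown in the log, so the snippet first saves a hypothetical two-input toy model (build_toy_model, model_path) to a temporary file; in practice only the load_model and predict lines are needed.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_toy_model():
    """Hypothetical two-input model standing in for the trained classifier."""
    a = tf.keras.Input(shape=(1,), name="a")
    b = tf.keras.Input(shape=(1,), name="b")
    merged = tf.keras.layers.Concatenate()([a, b])
    out = tf.keras.layers.Dense(3, activation="sigmoid")(merged)
    return tf.keras.Model(inputs=[a, b], outputs=out)


# Hypothetical path: replace with wherever train.py saved the model under models/.
model_path = os.path.join(tempfile.mkdtemp(), "toy_model.keras")
build_toy_model().save(model_path)

# In a console: reload the trained model instead of retraining it.
model = tf.keras.models.load_model(model_path)

# Batched dataset of ((input_a, input_b), labels), shaped like the validation set.
features = (np.arange(5, dtype="float32").reshape(5, 1),
            np.ones((5, 1), dtype="float32"))
labels = np.zeros((5, 3), dtype="float32")
val = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# Passing the full dataset: predict() keeps only the inputs part of each element.
y_pred_val = model.predict(val)
print(y_pred_val.shape)  # (5, 3)
```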

streino commented 2 years ago

Using model.predict(val) should be enough.

The error you get is because predict has some clever unpacking strategy when given an iterable (a Dataset in our case): val.map(lambda x,y: x) yields a tuple of two input tensors, which predict unpacks as (x, y), so the model only receives the first one. See "Unpacking behavior for iterator-like inputs" at https://keras.io/api/models/model_training_apis/#fit-method. It's explained in the fit doc, but it also applies to predict.

What's not obvious from the doc is that you can give predict the full Dataset and it'll only use the inputs: https://github.com/tensorflow/tensorflow/blob/v2.6.3/tensorflow/python/keras/engine/training.py#L1539.
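The unpacking rule can be sketched in plain Python. This is a simplified re-implementation for illustration only (Keras exposes its own version as tf.keras.utils.unpack_x_y_sample_weight), and the example strings are made up. It shows why mapping to x breaks a two-input model while passing the full element does not.

```python
def unpack_x_y_sample_weight(data):
    """Simplified mimic of how Keras splits one dataset element into (x, y, sample_weight)."""
    if not isinstance(data, tuple):
        return data, None, None
    if len(data) == 1:
        return data[0], None, None
    if len(data) == 2:
        return data[0], data[1], None
    if len(data) == 3:
        return data[0], data[1], data[2]
    raise ValueError(f"Expected x, (x,), (x, y) or (x, y, sample_weight), got {data!r}")


# Made-up element shaped like the validation set: ((input_1, input_2), labels).
full_element = (("orange juice", "water, orange"), [0, 1, 1])
# What val.map(lambda x, y: x) yields: just the inputs tuple.
mapped_element = ("orange juice", "water, orange")

x, y, _ = unpack_x_y_sample_weight(full_element)
print(x)  # ('orange juice', 'water, orange'): both inputs reach the model

x, y, _ = unpack_x_y_sample_weight(mapped_element)
print(x, "|", y)  # orange juice | water, orange: the second input is taken as labels

x, _, _ = unpack_x_y_sample_weight((mapped_element,))
print(x)  # ('orange juice', 'water, orange'): in this mimic, a 1-tuple wrapper also works
```

In this mimic, as in the Keras doc linked above, the full (x, y) element is the simplest thing to pass: the labels are simply ignored by predict.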

streino commented 2 years ago

Once that part is fixed, I run into several other issues with:

report, clf_report = evaluation_report(
    val.map(lambda x, y: y).as_numpy(), y_pred_val, taxonomy=category_taxonomy, category_names=category_names
)

First, as_numpy() doesn't exist. I managed to fix it using list(val.map(lambda x, y: y).unbatch().as_numpy_iterator()) but I'm not entirely sure it's the best way?
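If evaluation_report needs a single array rather than a list (an assumption, since its signature isn't shown here), np.stack can turn the per-example label arrays from that workaround into one (n_examples, n_categories) array, matching the shape returned by model.predict. A minimal sketch on a made-up batched dataset:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the validation set: 5 examples, 3 categories, batched by 2.
features = tf.constant(["a", "b", "c", "d", "e"])
labels = tf.constant(np.eye(5, 3, dtype="float32"))
val = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# The unbatch() workaround yields a list of 5 arrays of shape (3,)...
y_true_list = list(val.map(lambda x, y: y).unbatch().as_numpy_iterator())
# ...which np.stack turns into a single (5, 3) array, like model.predict's output.
y_true_val = np.stack(y_true_list)
print(len(y_true_list), y_true_val.shape)  # 5 (5, 3)
```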

Then category_names isn't defined. I've tweaked the evaluation_report() function to take a category_to_id argument instead and pass it category_vocab. The category_to_id parameter is already supported in the fill_ancestors() function called within evaluation_report(), so you just have to pass it along.
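For readers new to the idea, here is a hypothetical sketch of what filling ancestors with a category_to_id mapping involves: every positive category also turns on its parents in the taxonomy. The parents dict, the category names, and this fill_ancestors signature are illustrative assumptions; the repo's own fill_ancestors may take different arguments and data structures.

```python
import numpy as np


def fill_ancestors(y, parents, category_to_id):
    """Hypothetical sketch: turn on every taxonomy ancestor of each positive label.

    y: (n_examples, n_categories) multi-hot array.
    parents: dict mapping a category name to the list of its parent names.
    category_to_id: dict mapping a category name to its column in y.
    """
    id_to_category = {i: name for name, i in category_to_id.items()}
    out = y.copy()
    for row in range(y.shape[0]):
        stack = [id_to_category[i] for i in np.flatnonzero(y[row])]
        seen = set()
        while stack:
            category = stack.pop()
            for parent in parents.get(category, []):
                if parent in seen:
                    continue
                seen.add(parent)
                # Parents missing from the vocabulary are skipped but still walked up.
                if parent in category_to_id:
                    out[row, category_to_id[parent]] = 1
                stack.append(parent)
    return out


parents = {"orange-juices": ["juices"], "juices": ["beverages"]}
category_to_id = {"beverages": 0, "juices": 1, "orange-juices": 2}
y = np.array([[0, 0, 1], [0, 0, 0]])
print(fill_ancestors(y, parents, category_to_id))
# [[1 1 1]
#  [0 0 0]]
```

Here category_to_id plays the role streino describes for category_vocab: it is the only link between taxonomy names and column indices.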

I'll try to submit a PR tomorrow unless someone beats me to it :)

alexgarel commented 2 years ago

> First, as_numpy() doesn't exist. I managed to fix it using list(val.map(lambda x, y: y).unbatch().as_numpy_iterator()) but I'm not entirely sure it's the best way?

Maybe np.array(list(val.map(lambda x, y: y).unbatch().as_numpy_iterator())) simply works? Or, even better, following the second example in the batch doc, can you do something like:

next(val.map(lambda x, y: y).unbatch().batch(100000000).as_numpy_iterator())

(if possible, replace 100000000 with the dataset size)

Otherwise there is an as_numpy function in tfds: https://www.tensorflow.org/datasets/api_docs/python/tfds/as_numpy
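Two ways to collect the labels into one array, sketched on a made-up dataset of 5 examples batched by 2 (the sizes are arbitrary): concatenating the per-batch arrays, or unbatching and re-batching everything into a single element with batch(). Both give the same (n_examples, n_categories) array; the first avoids having to choose a batch size.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in: 5 examples, 3 categories, batched by 2 (batches of 2, 2 and 1).
features = tf.constant(["a", "b", "c", "d", "e"])
labels = tf.constant(np.eye(5, 3, dtype="float32"))
val = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)
y_only = val.map(lambda x, y: y)

# Option 1: concatenate the per-batch label arrays along the batch axis.
y_concat = np.concatenate(list(y_only.as_numpy_iterator()), axis=0)

# Option 2: unbatch, then re-batch everything into a single element.
# Skipping unbatch() would batch whole batches together, which fails here
# because the last batch is smaller than the others.
n_examples = 5  # the dataset size, assumed known
y_single = next(y_only.unbatch().batch(n_examples).as_numpy_iterator())

print(y_concat.shape, y_single.shape)  # (5, 3) (5, 3)
```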

alexgarel commented 2 years ago

@8huit, @streino, I had the idea to look at the branches in @kulizhsy's repository.

And in fact, there is an eval branch that is ahead of master!

Can you try this branch, and if it's better, we will merge it.

I opened a PR: #17

streino commented 2 years ago

Good find @alexgarel! The full training script ran fine for me using the "eval" branch.

8huit commented 2 years ago

Hey! Same for me, thanks!

I started a notebook to document and run the model training. I am also trying to make the training and test datasets, as well as the predicted categories, explicit.

Where can I share it (though it's still in progress)?

alexgarel commented 2 years ago

> Where can I share it (though it's still in progress)?

As I said on Slack: create a branch, add an experiments/ folder with your notebook, then commit, push, and open a PR (you can mark the PR as a draft until you are done).

8huit commented 2 years ago

Perfect 👍 Will do it this weekend!

openfoodfacts / off-category-classification

run the models and see if everything works #4