tensorlab / tensorfx

TensorFlow framework for training and serving machine learning models
Apache License 2.0
196 stars 41 forks

cannot name output tensors [feature request] #15

Closed brandondutra closed 7 years ago

brandondutra commented 7 years ago

Goal: have the following output tensor names for classification:

  - predicted: predicted class
  - score: probability of the predicted class
  - predicted_2: 2nd most likely class
  - score_2: probability of the 2nd class

I thought I could just change the keys in the output dict of build_output in _ff.py. 1) The dict is actually required to have a 'label' key for the eval metrics. This makes sense, but we should use a better mechanism, or document the requirement in training/_model.py:build_output(). 2) I don't think the dict's 'score' key is consumed. For fun I changed it to 'score_xx' and got no errors.

Notes to self:

So where do friendly output tensor names come from? tl;dr: there are no friendly names. Fun story:

  1. Tensors are added to the 'outputs' collection (tf.add_to_collection('outputs', labels)) in _ff.py:build_output.
  2. build_output() is called by _model.py:build_prediction_graph(), which gets graph_inputs/graph_outputs by calling tf.get_collection('outputs'). This function does have access to the dict that _ff.py:build_output makes, so this is where we could put the names back in. However, whatever consumes the output of _model.py:build_prediction_graph() would need updating if outputs changes from a list to a dict.
  3. _model.py:build_prediction_graph() is called by _job.py:start() and exposed via _job.py:prediction.
  4. _job.py:prediction is only used in the hooks.
  5. The hooks call tfx.prediction.Model.save() with the input and output lists from tf.get_collection().
  6. prediction/_mode.py:save() builds signature_map with _build_signature().
  7. Finally, in prediction/_mode.py:_build_signature() we find the source of the tensor names. They are almost just a copy of the true tensor names. This function could easily be changed to take dict inputs instead of lists.
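To make the list-to-dict idea in step 7 concrete, here is a minimal sketch. It is not the actual tensorfx _build_signature(); build_alias_map is a hypothetical helper, and it works on tensor name strings rather than tf.Tensor objects so that it runs without TensorFlow. A list gets aliases derived from the tensor names, while a dict keeps the caller's keys as the aliases.

```python
def build_alias_map(outputs):
    """Return {alias: full_tensor_name} for a signature (hypothetical helper).

    `outputs` is either a list of full tensor names, in which case the
    aliases are derived from the names, or a dict of {alias: name}, in
    which case the caller's keys are kept as the aliases.
    """
    if isinstance(outputs, dict):
        # Proposed behavior: trust the names chosen in build_output().
        return dict(outputs)
    # List behavior: 'output/label:0' -> 'label'
    return {name.split('/')[-1].split(':')[0]: name for name in outputs}


# List input: aliases fall out of the tensor names.
print(build_alias_map(['output/label:0', 'output/TopKV2:1']))

# Dict input: the caller decides the aliases.
print(build_alias_map({'predicted': 'output/label:0',
                       'score': 'output/scores:0'}))
```

With something along these lines, steps 2 through 6 would only need to pass the dict through instead of the list.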

This works well when only outputting two simple tensors (score and label), but does not work for the following:

  1. Datalab needs outputs called predicted and score for model analysis. (I know, yuck, tensorfx should be independent of Datalab.)
  2. top_k output will have crazy names. Here is a look at what the current system does if I add top_k tensors to the tf 'outputs' collection:
    
    input_alias_map {u'instances': u'input/instances:0'}
    output_alias_map {u'scores_1': u'output/scores_1:0', u'label_1': u'output/label_1:0', u'TopKV2': u'output/TopKV2:1', u'label': u'output/label:0'}

nikhilk commented 7 years ago

Output names are the local tensor names (after removing the index and the scope), so the friendly output name of x/y/foo:0 is foo. This avoids requiring alias names, a new concept that TensorFlow intrinsically doesn't have. So you should be able to produce the same dictionary by giving tensors the names you might have used as alias names in the older samples.
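As a rough illustration of that rule, the sketch below strips the scope and the output index from a full tensor name. The function name friendly_name is made up for this example and is not part of tensorfx; the tensor names are the ones from the output_alias_map posted above.

```python
def friendly_name(tensor_name):
    """Derive a friendly name from a full tensor name (illustrative only)."""
    local = tensor_name.rsplit('/', 1)[-1]  # drop the scope: 'x/y/foo:0' -> 'foo:0'
    return local.split(':', 1)[0]           # drop the index: 'foo:0' -> 'foo'


# Tensor names taken from the output_alias_map in this issue.
names = ['output/scores_1:0', 'output/label_1:0',
         'output/TopKV2:1', 'output/label:0']
output_alias_map = {friendly_name(n): n for n in names}

print(friendly_name('x/y/foo:0'))
print(output_alias_map)
```

Under this rule an unnamed top_k output surfaces as TopKV2, which is why explicitly naming the tensor is what controls the output name.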

FeedForwardClassification does need more docs on the signature of the model it produces: the inputs and outputs.

It would make sense to call the outputs label, score, label_N, score_N for classification, and we should be able to make the change in Datalab.
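A small sketch of what that naming scheme could look like for a top-k classifier. The helper classification_output_names is hypothetical, and starting N at 2 for the second most likely class is an assumption borrowed from the predicted_2 / score_2 example at the top of this issue.

```python
def classification_output_names(k):
    """List output names for the top-k classes (hypothetical helper).

    Assumption: the best class is unsuffixed (label, score) and the
    N-th best class uses the suffix _N, starting at N = 2.
    """
    if k < 1:
        raise ValueError('k must be at least 1')
    names = []
    for rank in range(1, k + 1):
        suffix = '' if rank == 1 else '_%d' % rank
        names.append('label' + suffix)
        names.append('score' + suffix)
    return names


print(classification_output_names(3))
```

In the graph itself, one way to attach such a name to a tensor is tf.identity(tensor, name='label_2'), so that the friendly name falls out of the tensor name.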

brandondutra commented 7 years ago

ah, I understand your friendly name trick.