AlirezaSadeghi closed this issue 2 years ago
@AlirezaSadeghi can you specify your label type?
@Cheril311 If I'm understanding you correctly, I've already done it in the text: it's the 2nd entry in the PrefetchDataset tuple (namely TensorSpec(shape=(None, 1), dtype=tf.int64, name=None)).
It's an integer with values either 0 or 1, but as we're reading it in batches, it's of type (None, 1).
So the dataset that's being passed to model.fit is a tuple of (Dict {feature name -> Tensor(None, 1)}, Label Tensor(None, 1)).
Did I answer your question? If not please elaborate if possible.
@AlirezaSadeghi my bad
Hi AlirezaSadeghi,
If the loss argument of the Gradient boosted tree is not specified, it is selected automatically from the label type, label values and task. The error you reported indicates that there is no loss matching your label.
Looking at your example, a likely situation is that your int64 label only contains zeros. Can you check it?
Alternatively, you can specify the loss to be the "BINOMIAL_LOG_LIKELIHOOD" i.e. binary classification loss.
On my side, I'll improve the error message for this particular situation.
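For anyone who wants to run the check suggested above, here is a minimal sketch of counting the distinct label values in a batched dataset. The helper name label_value_counts is hypothetical and not part of TF-DF or TFX. It assumes the dataset yields (features, labels) batches, as described earlier in the thread, and it only needs the standard library.

```python
from collections import Counter


def _flatten(values):
    """Yield scalars from nested lists/tuples or array-like objects."""
    # NumPy arrays and NumPy scalars expose tolist(); plain Python values do not.
    if hasattr(values, "tolist"):
        values = values.tolist()
    if isinstance(values, (list, tuple)):
        for v in values:
            yield from _flatten(v)
    else:
        yield values


def label_value_counts(batches):
    """Count label values over an iterable of (features, labels) batches.

    Hypothetical helper: `batches` is assumed to yield 2-tuples whose second
    element holds integer labels, e.g. with shape (batch_size, 1).
    """
    counts = Counter()
    for _features, labels in batches:
        counts.update(int(v) for v in _flatten(labels))
    return counts


# With a tf.data dataset (here called train_ds, an assumed name), one could pass
# train_ds.as_numpy_iterator() as `batches`.
example_batches = [
    ({"f": [[1.0], [2.0]]}, [[0], [0]]),
    ({"f": [[3.0]]}, [[0]]),
]
print(label_value_counts(example_batches))
```

If the result has a single key, the label column has only one class in that slice of the data, which matches the situation described above.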
Hi @achoum ,
Yup your assumption is actually right, I'm just testing the pipeline and running the model on a part of the training set, which includes all zeros for starters. Didn't know that might become an issue.
I'll try with BINOMIAL_LOG_LIKELIHOOD and get back to you.
Okay doing that, it tells me this:
INVALID_ARGUMENT: Binomial log likelihood loss is only compatible with a BINARY classification task
It's somehow assuming the task is not "binary classification"?
@achoum just an fyi, have you seen my last comment? Wondering if you've got any further insights.
If your task is not a binary classification task, you can try setting the loss to MULTINOMIAL_LOG_LIKELIHOOD
My task "is" binary classification, and the labels are all 0s, don't know how it's assuming the task is not "binary classification". (as I've already mentioned before)
Oh, apologies, I overlooked that part in your first message
@achoum No new updates/insights on this? 😔
If all your labels are 0, the framework detects that this is not a binary classification task and fails. If you want to test binary classification, can you create a synthetic dataset with both 0 and 1?
While training on a dataset where all the labels have the same value could make sense for unit testing, this failure helps to catch errors in datasets.
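A minimal sketch of such a synthetic dataset, assuming the same layout as in this thread (a dict of feature columns plus a label column, each of shape (N, 1)). The function name make_synthetic_binary and the feature names are made up for illustration. It uses only the standard library and guarantees that both classes are present.

```python
import random


def make_synthetic_binary(num_rows, feature_names, seed=0):
    """Build (features, labels) with both label values 0 and 1 present.

    Hypothetical helper: features is a dict {name -> list of [float]},
    labels is a list of [int], mirroring the (N, 1) shapes in this thread.
    """
    if num_rows < 2:
        raise ValueError("need at least 2 rows to include both classes")
    rng = random.Random(seed)
    # Force one example of each class, then fill the rest at random.
    labels = [0, 1] + [rng.randint(0, 1) for _ in range(num_rows - 2)]
    rng.shuffle(labels)
    # Shift each feature by the label so the two classes are separable.
    features = {
        name: [[rng.random() + y] for y in labels] for name in feature_names
    }
    return features, [[y] for y in labels]


features, labels = make_synthetic_binary(8, ["f1", "f2"])
print(sorted({y for [y] in labels}))
```

The result could then be wrapped with tf.data.Dataset.from_tensor_slices((features, labels)).batch(...) before calling model.fit. The label dtype may need a cast to match the int64 spec mentioned earlier.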
I'm trying to use GradientBoostedTreesModel in a TFX pipeline; the code is roughly as follows:
This unfortunately gives me an INVALID_ARGUMENT: No defined default loss for this combination of label type and task exception and fails the model training.
Definition of _input_fn is as follows:
Which basically parses the schema into feature specs, parses the batch of TF-examples, and finally maps them to a tuple of (Dict[feature_name, Tensor], Tensor). The result is like this:
Labels can be 0 or 1 and the task is a binary classification task.
Any idea what I might be doing wrong here?
macOS Monterey, tfdv 0.2.4, Python 3.8, TFX 1.7