tensorflow / ranking

Learning to Rank in TensorFlow
Apache License 2.0

Try to solve CTR problem by tf-ranking #290

Closed · pangpang97 closed this 3 years ago

pangpang97 commented 3 years ago

Hi, I am trying to solve a CTR problem with tf-ranking. When I ran a small demo on data I generated myself, I got the following error (screenshot attached).

Could you help me solve this problem? Thanks in advance.

Data-generation code: https://github.com/pangpang97/tf-ranking-demo/blob/main/genterate_data.py
Training code: https://github.com/pangpang97/tf-ranking-demo/blob/main/ranking_train.py

My environment:
tensorflow 2.6.0
tensorflow-ranking 0.4.2.dev
python 3.6.9

rjagerman commented 3 years ago

I think the issue is that your label is an int64 and TF-Ranking expects a float in the internal tf.greater_equal call. We should probably change this so it accepts int64 labels; I am not sure how much effort that is, so I will follow up on that separately.

To solve your issue for now, I think you can change your code to provide float labels instead of int64 labels:

label = tf.cast(label, tf.float32)
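As a rough sketch of why the dtype matters (numpy stands in for TensorFlow here; this is an analogue of the workaround, not the actual TF-Ranking internals):

```python
import numpy as np

# Sketch: the workaround casts integer labels to float before they reach
# TF-Ranking's internal comparisons, so their dtype matches the float
# dtype of the predicted scores.
labels_int = np.array([2, 0, 1], dtype=np.int64)
labels = labels_int.astype(np.float32)  # analogue of tf.cast(label, tf.float32)

# TF-Ranking derives pairwise "label_i >= label_j" masks from the labels
pair_mask = labels[:, None] >= labels[None, :]
```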
pangpang97 commented 3 years ago

> I think the issue is that your label is an int64 and TF-Ranking expects a float in the internal tf.greater_equal call. We should probably change this so it accepts int64 labels; I am not sure how much effort that is, so I will follow up on that separately.
>
> To solve your issue for now, I think you can change your code to provide float labels instead of int64 labels:
>
> label = tf.cast(label, tf.float32)

Thanks, it works. But I ran into another problem (screenshot attached): the loss is always zero.

I also found that the dimensions of the training data are unusual, that is, different features have different dimensions. This is one sample of training data:

({'item_cnt': <tf.Tensor: shape=(1, 1, 1), dtype=int64, numpy=array([[[306]]])>,
  'item_type': <tf.Tensor: shape=(1, 1, 1), dtype=int64, numpy=array([[[3]]])>,
  'package_price': <tf.Tensor: shape=(1, 1), dtype=int64, numpy=array([[50]])>,
  'package_typer': <tf.Tensor: shape=(1, 1), dtype=int64, numpy=array([[0]])>,
  'user_age': <tf.Tensor: shape=(1, 1), dtype=int64, numpy=array([[1]])>,
  'user_gender': <tf.Tensor: shape=(1, 1), dtype=int64, numpy=array([[1]])>,
  'user_pay': <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.81525]], dtype=float32)>,
  'example_list_size': },
 <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.]], dtype=float32)>)

Does the unusual dimension cause my problem (the loss being zero)?

rjagerman commented 3 years ago

> Does the unusual dimension cause my problem (the loss being zero)?

Yes, you seem to be using the softmax loss with lists of only 1 item, for which the loss is zero by definition.

You can try a pointwise loss such as sigmoid_cross_entropy_loss or mean_squared_loss, or change your data format so that it generates lists of items, which will let you leverage the pairwise/listwise ranking losses.
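A minimal numpy sketch (an analogue, not the TF-Ranking implementation) of why the softmax loss is identically zero on single-item lists: the softmax over a single score is always 1, so its log is 0.

```python
import numpy as np

def softmax_loss(scores, labels):
    # Listwise softmax cross-entropy: -sum_i p_i * log(softmax(scores)_i),
    # with p_i the labels normalized to sum to 1.
    exp = np.exp(scores - scores.max())
    log_softmax = np.log(exp / exp.sum())
    return -((labels / labels.sum()) * log_softmax).sum()

# A "list" of one item: softmax([s]) == [1.0], so the loss is exactly 0,
# no matter what the score or label is.
single = softmax_loss(np.array([2.3]), np.array([1.0]))  # -> 0.0

# A list of several items yields a useful, non-zero loss.
multi = softmax_loss(np.array([2.3, 0.1, -1.0]), np.array([1.0, 0.0, 0.0]))
```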

pangpang97 commented 3 years ago

> Does the unusual dimension cause my problem (the loss being zero)?
>
> Yes, you seem to be using the softmax loss with lists of only 1 item, for which the loss is zero by definition.
>
> You can try a pointwise loss such as sigmoid_cross_entropy_loss or mean_squared_loss, or change your data format so that it generates lists of items, which will let you leverage the pairwise/listwise ranking losses.

Yes, I switched to a pointwise loss (specifically SIGMOID_CROSS_ENTROPY_LOSS) and got a non-zero loss. Thanks for your help.
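For completeness, a numpy sketch (again an analogue, not the library code) of why a pointwise sigmoid cross-entropy stays non-zero even with single-item lists: each item contributes to the loss independently, with no comparison against other items in the list.

```python
import numpy as np

def sigmoid_cross_entropy(score, label):
    # Pointwise: each (score, label) pair contributes on its own,
    # so a list of size 1 still yields a meaningful loss and gradient.
    p = 1.0 / (1.0 + np.exp(-score))
    return -(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))

# With score 0.0 the predicted probability is 0.5, giving loss -log(0.5).
loss = sigmoid_cross_entropy(0.0, 1.0)  # -> log(2) ~ 0.693
```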