Hi Johannes, if you don't have sparse features, why do you use NFM?
Hi,
I was experimenting with how the variables (sparse/dense) impact performance. The issue is in the basemodel and therefore affects all models that build on it, e.g. NFM, DeepFM, DCN, etc.
To my understanding, the models mentioned do not require feature engineering, i.e. they share the same inputs, "linear_feature_columns" and "dnn_feature_columns". Thus, shouldn't the models be able to run without sparse features, just like they can without dense features?
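For concreteness, a minimal dense-only setup that triggers this (a sketch, assuming a recent deepctr-torch version; the column names and random data are purely illustrative):

```python
import numpy as np
from deepctr_torch.inputs import DenseFeat, get_feature_names
from deepctr_torch.models import DeepFM

# dense-only input: no SparseFeat columns at all
dense_features = ['dense_0', 'dense_1']
feature_columns = [DenseFeat(name, 1) for name in dense_features]

# random data, just enough to exercise the code path
n = 256
X = {name: np.random.random(n) for name in get_feature_names(feature_columns)}
y = np.random.randint(0, 2, n)

model = DeepFM(linear_feature_columns=feature_columns,
               dnn_feature_columns=feature_columns, task='binary')
model.compile('adam', 'binary_crossentropy')
model.fit(X, y, batch_size=64, epochs=1)  # fails as described below
```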
Furthermore, I tried to insert:

```python
if embedding_size_set == set():
    return 0
else:
    return list(embedding_size_set)[0]
```

to avoid the issue.
This then raises a different error:

```
linear_logit = torch.zeros([X.shape[0], 1]).to(sparse_embedding_list[0].device)
IndexError: list index out of range
```
However, looking at the code in the basemodel:

```python
linear_logit = torch.zeros([X.shape[0], 1]).to(sparse_embedding_list[0].device)
if len(sparse_embedding_list) > 0:
    sparse_embedding_cat = torch.cat(sparse_embedding_list, dim=-1)
    if sparse_feat_refine_weight is not None:
        # w_{x,i}=m_{x,i} * w_i (in IFM and DIFM)
        sparse_embedding_cat = sparse_embedding_cat * sparse_feat_refine_weight.unsqueeze(1)
    sparse_feat_logit = torch.sum(sparse_embedding_cat, dim=-1, keepdim=False)
    linear_logit += sparse_feat_logit
if len(dense_value_list) > 0:
    dense_value_logit = torch.cat(
        dense_value_list, dim=-1).matmul(self.weight)
    linear_logit += dense_value_logit
```
it seems the no-sparse-features case is intended to be handled: the `if len(sparse_embedding_list) > 0:` guard explicitly allows the list to be empty. The crash comes from the first line, which unconditionally indexes `sparse_embedding_list[0]` to obtain the device.
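Following that intent, one possible fix (just a sketch, not a tested patch) would be to choose the device without indexing the possibly empty list:

```python
# sketch: pick a device without assuming sparse features exist
if len(sparse_embedding_list) > 0:
    device = sparse_embedding_list[0].device
elif len(dense_value_list) > 0:
    device = dense_value_list[0].device
else:
    device = X.device  # fall back to the input tensor's device
linear_logit = torch.zeros([X.shape[0], 1], device=device)
```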
I hope my questions are clear.
> Thus, shouldn't the models be able to run without sparse features, just like they can without dense features?

Take DeepFM and NFM for example: if you don't have sparse features, how do you derive the embedding vectors that are used in the FM part?
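For context, the FM component computes pairwise interactions between per-feature embedding vectors $v_i$ (the standard FM formulation):

$$\hat{y}_{\mathrm{FM}} = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

In DeepCTR-Torch the $v_i$ come from the sparse-feature embedding tables, so with a dense-only input there are no embedding vectors for the FM part to interact.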
The issue occurs when there are no sparse features in the input, e.g.:
WITHOUT SPARSE FEATURES: [error output not preserved]
I have looked into modifying the `return list(embedding_size_set)[0]` at line 511 for the case where there are no sparse features; however, if `sparse_features = []`, then `print(embedding_size)` outputs `set()`.
I am not sure what the consequences are, or how forcing the model to continue affects it later on. The model does work without dense features.
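For example, a sparse-only setup (a sketch mirroring the dense-only one above; names and sizes are illustrative) runs fine:

```python
import numpy as np
from deepctr_torch.inputs import SparseFeat, get_feature_names
from deepctr_torch.models import DeepFM

# sparse-only input: no DenseFeat columns at all
sparse_features = ['cat_0', 'cat_1']
feature_columns = [SparseFeat(name, vocabulary_size=10, embedding_dim=4)
                   for name in sparse_features]

n = 256
X = {name: np.random.randint(0, 10, n) for name in get_feature_names(feature_columns)}
y = np.random.randint(0, 2, n)

model = DeepFM(feature_columns, feature_columns, task='binary')
model.compile('adam', 'binary_crossentropy')
model.fit(X, y, batch_size=64, epochs=1)  # trains without error
```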
Alternatively, one can generate a dummy column (all ones) for the model; however, this is a hack and I don't think it is a good solution. Would it be possible to properly solve the "must-have-sparse-features" problem?
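For reference, the dummy-column hack would look roughly like this, building on the dense-only sketch above (illustrative names; `embedding_dim` is arbitrary):

```python
import numpy as np
from deepctr_torch.inputs import SparseFeat

# one constant sparse feature so the sparse code paths are exercised;
# vocabulary_size=2 so that the constant id 1 (the "ones" column) is a valid index
dummy = SparseFeat('dummy_const', vocabulary_size=2, embedding_dim=4)
feature_columns = [dummy] + feature_columns  # prepend to the dense-only columns
X['dummy_const'] = np.ones(n, dtype='int64')  # the all-ones column described above
```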
Thank you for your time and the awesome work!!
Best regards, Johannes