githubbayes commented 5 years ago

Hello, DIN and xDeepFM both use embeddings for multivalent categorical variables. How should features like these be fed into the model?

shenweichen commented 5 years ago

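@githubbayes Thanks for your interest! In the current version (v0.2.0) of deepctr, only DIN supports multi-valued feature input. The other models do not support it yet, to keep usage simple; support will be added gradually in later releases. Here is an example of using the DIN model with multi-valued feature input: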

import numpy as np
from deepctr.models import DIN


def get_xy_fd():
    # Raw features: feature name -> vocabulary size
    feature_dim_dict = {"sparse": {'user_age': 4, 'user_gender': 2,
                                   'item_id': 4, 'item_gender': 2}, "dense": []}
    # Features that have a historical behavior sequence
    behavior_feature_list = ["item_id", "item_gender"]

    # Single-valued features
    user_age = np.array([1, 2, 3])
    user_gender = np.array([0, 1, 0])
    item_id = np.array([0, 1, 2])
    item_gender = np.array([0, 1, 0])

    # Multi-valued features
    hist_item_id = np.array([[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 0]])
    hist_item_gender = np.array([[0, 1, 0, 1], [0, 1, 1, 1], [0, 0, 1, 0]])
    hist_length = np.array([4, 4, 3])  # history sequence length of each sample

    feature_dict = {'user_age': user_age, 'user_gender': user_gender,
                    'item_id': item_id, 'item_gender': item_gender,
                    'hist_item_id': hist_item_id, 'hist_item_gender': hist_item_gender}

    # Mind the concatenation order: single-valued features, then multi-valued
    # features, then the multi-valued feature lengths.
    # In DIN the history sequences of different features all have the same length,
    # because they are all derived from item_id, so one length vector is enough.
    x = ([feature_dict[feat] for feat in feature_dim_dict["sparse"]]
         + [feature_dict['hist_' + feat] for feat in behavior_feature_list]
         + [hist_length])
    y = np.array([1, 0, 1])  # labels as an array rather than a plain list

    return x, y, feature_dim_dict, behavior_feature_list


x, y, feature_dim_dict, behavior_feature_list = get_xy_fd()
model = DIN(feature_dim_dict, behavior_feature_list, hist_len_max=4)
model.compile('adam', 'binary_crossentropy',
              metrics=['binary_crossentropy'])
history = model.fit(x, y, verbose=1, validation_split=0.5)

For details on the DIN model's parameters, see the relevant section of the documentation.
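
In the example above the third history is already padded to length 4, with its true length (3) stored in hist_length. In case it helps, here is a minimal sketch of how raw variable-length histories could be brought into that shape. The helper name `pad_histories` is hypothetical and not part of deepctr. Padding with 0 on the right and keeping only the most recent items of an over-long history are assumptions, not something the library prescribes.

```python
def pad_histories(histories, hist_len_max, pad_value=0):
    """Right-pad each behavior sequence to hist_len_max and record its true length."""
    if hist_len_max < 1:
        raise ValueError("hist_len_max must be at least 1")
    padded, lengths = [], []
    for seq in histories:
        # Assumption: when a history is too long, keep the most recent items.
        seq = list(seq)[-hist_len_max:]
        lengths.append(len(seq))
        padded.append(seq + [pad_value] * (hist_len_max - len(seq)))
    return padded, lengths


raw_hist_item_id = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2]]
hist_item_id, hist_length = pad_histories(raw_hist_item_id, hist_len_max=4)
print(hist_item_id)  # [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 0]]
print(hist_length)   # [4, 4, 3]
```

Both results can be wrapped in np.array(...) before being placed into the input list.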

githubbayes commented 5 years ago

Thanks shenweichen

shenweichen commented 5 years ago

@githubbayes Thanks for your interest! The latest version adds support for multivalent categorical variable input.

For AFM, AutoInt, DCN, DeepFM, FNN, NFM, PNN, and xDeepFM, see https://deepctr-doc.readthedocs.io/en/latest/Examples.html#multi-value-input-movielens
For DIN, see https://github.com/shenweichen/DeepCTR/blob/master/examples/run_din.py

thulorry commented 5 years ago

@shenweichen Hello, a quick question: can sequence_feature accept only a single VarLenFeat object? In my application, one field holds the user's favorite ad IDs (ad1|ad2|ad3) and another holds the user's favorite product IDs (product1|product2|product3). The two embeddings obviously have to be kept separate, so I wrote:

sequence_features = {'activity_features': activity_dict, 'product_features': product_dict}
sequence_feat_list = [VarLenFeat(feat, len(value)+1, len(load_data[feat][0]), 'mean') for feat, value in sequence_features.items()]

But with the inputs built this way, adding dense_input and sparse_input goes beyond what the DeepFM model accepts.
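
For readers with similar pipe-delimited fields, the following is a rough sketch of turning each field into padded integer sequences with its own vocabulary, so that the two fields can get separate embeddings. The function `encode_multivalued` and the sample data are hypothetical. Reserving ID 0 for padding is an assumption, chosen so that the vocabulary size plus one lines up with the len(value)+1 seen in the snippet above.

```python
def encode_multivalued(column, sep="|"):
    """Map each token of a delimited column to an integer ID and pad rows with 0."""
    vocab = {}  # token -> ID, starting at 1; 0 is reserved for padding (assumption)
    encoded = []
    for cell in column:
        ids = []
        for token in cell.split(sep):
            if not token:
                continue  # skip empty tokens, e.g. from an empty cell
            if token not in vocab:
                vocab[token] = len(vocab) + 1
            ids.append(vocab[token])
        encoded.append(ids)
    max_len = max(len(ids) for ids in encoded)
    padded = [ids + [0] * (max_len - len(ids)) for ids in encoded]
    return padded, vocab, max_len


# Hypothetical sample data: one row per user.
ads = ["ad1|ad2|ad3", "ad2", "ad3|ad1"]
products = ["product1|product2", "product3|product1|product2", "product2"]

# Each field is encoded separately, so each gets its own vocabulary.
ad_ids, ad_vocab, ad_max_len = encode_multivalued(ads)
product_ids, product_vocab, product_max_len = encode_multivalued(products)
print(ad_ids)       # [[1, 2, 3], [2, 0, 0], [3, 1, 0]]
print(product_ids)  # [[1, 2, 0], [3, 1, 2], [2, 0, 0]]
```

Under these assumptions, len(ad_vocab) + 1 would be the embedding vocabulary size for the ad field and ad_max_len its padded sequence length.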

thulorry commented 5 years ago

Problem solved. I had left out dense_list where the model is defined; after adding it, everything works.

4. Define model, compile and train

model = DeepFM({"sparse": sparse_feat_list, "dense":dense_feat_list, "sequence": sequence_feat_list}, final_activation='linear')
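
A mismatch like the one described here can be caught before building the model. The sketch below assumes one input array per declared feature across the "sparse", "dense" and "sequence" groups. The helpers `expected_input_count` and `check_inputs` and the feature names are hypothetical and not part of deepctr; models that take extra inputs, such as DIN's length vector, would need a different count.

```python
def expected_input_count(feature_dim_dict):
    """Count the input arrays implied by a {"sparse", "dense", "sequence"} feature dict."""
    return sum(len(feature_dim_dict.get(kind, []))
               for kind in ("sparse", "dense", "sequence"))


def check_inputs(feature_dim_dict, model_input):
    """Raise if the number of supplied arrays differs from the declared features."""
    expected = expected_input_count(feature_dim_dict)
    if len(model_input) != expected:
        raise ValueError(
            f"model declares {expected} features but "
            f"{len(model_input)} input arrays were supplied")


# Placeholder names stand in for real feature objects and input arrays.
model_input = ["user_id", "gender", "age", "ad_ids", "product_ids"]

without_dense = {"sparse": ["user_id", "gender"],
                 "sequence": ["ad_ids", "product_ids"]}
with_dense = dict(without_dense, dense=["age"])

check_inputs(with_dense, model_input)  # passes: 5 declared, 5 supplied
try:
    check_inputs(without_dense, model_input)
except ValueError as err:
    print(err)  # the dense feature was supplied but never declared
```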

zksar commented 5 years ago

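Hello, I have multi-valued features and would like to use NFFM, but NFFM does not support multivalent input yet. How should I proceed? Do you have any suggestions? Many thanks! NFFM seems quite powerful.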

shenweichen commented 5 years ago

@zksar Please refer to this example: https://deepctr-doc.readthedocs.io/en/latest/Examples.html#multi-value-input-movielens

shenweichen / DeepCTR

Multivalent input question #20
