yinyajun / Details-In-Recommendation

Recommender-In-Detail is a package which offers detailed implementations of state-of-the-art techniques and basic methods in recommendation.

ESMM Estimator: multi-valued feature input #1

Open wuminbin opened 5 years ago

wuminbin commented 5 years ago

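I see you use CSV files as input, and I do the same, except that my CSV files contain multi-valued features (for example, values separated by '|'). Is there a good way to handle them? I tried tf.string_split, but got the error 'SparseTensor is not supported'.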

yinyajun commented 5 years ago

A SparseTensor is made up of three dense tensors: indices, values, and dense_shape. SparseTensor.values is dense, so you can use it directly.
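To make the three components concrete, here is a plain-Python illustration of what splitting a batch of '|'-delimited strings produces. It is not TensorFlow code: `split_to_sparse` is a hypothetical helper that only mimics the layout of the result, on the assumption that empty tokens are skipped.

```python
def split_to_sparse(rows, delimiter="|"):
    """Mimic the three parts of a SparseTensor for split strings.

    Hypothetical helper for illustration only; it is not a TensorFlow API.
    """
    indices, values = [], []
    max_len = 0
    for row_idx, row in enumerate(rows):
        # Drop empty tokens (assumed behaviour for this sketch).
        tokens = [t for t in row.split(delimiter) if t]
        for col_idx, token in enumerate(tokens):
            indices.append((row_idx, col_idx))  # position of each token
            values.append(token)                # flat, dense list of tokens
        max_len = max(max_len, len(tokens))
    dense_shape = (len(rows), max_len)          # shape of the padded 2-D view
    return indices, values, dense_shape


indices, values, dense_shape = split_to_sparse(["a|b|c"])
```

With a single input row, as in the thread, `values` is simply the flat list of tokens from that row, which is why taking `.values` is enough here.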

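import tensorflow as tf  # written against the TF 1.x API (tf.decode_csv, tf.string_split)

def parse_csv(f):
    # _CSV_COLUMNS, _CSV_COLUMN_DEFAULTS and _MULTIHOT_COLUMNS are defined elsewhere in the module
    columns = tf.decode_csv(f, record_defaults=_CSV_COLUMN_DEFAULTS, field_delim=',')
    feas = dict(zip(_CSV_COLUMNS, columns))
    for col, size in _MULTIHOT_COLUMNS.items():  # size is not used in this loop
        # handle multi-hot columns
        if col in feas:
            # tf.string_split needs a rank-1 input, so wrap the scalar string in a list;
            # it returns a SparseTensor
            cols = tf.string_split([feas[col]], delimiter='|')
            feas[col] = cols.values    # dense 1-D tensor holding the split tokens
    return feas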
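
One thing to watch: the split values have a different length for each record, and the loop reads a `size` for each multi-hot column without using it. If that size is meant to be a fixed number of values per column (an assumption; the thread does not say), one option is to pad or truncate each token list to that length before batching. A minimal plain-Python sketch of the idea, where `pad_or_truncate` and the empty-string padding value are hypothetical choices:

```python
def pad_or_truncate(tokens, size, pad_value=""):
    """Return exactly `size` tokens: cut extras, pad the tail if too short.

    Hypothetical helper for illustration; pad_value="" is an assumed choice.
    """
    tokens = list(tokens)[:size]                      # truncate if too long
    return tokens + [pad_value] * (size - len(tokens))  # pad if too short


short = pad_or_truncate(["a", "b"], 4)
long_ = pad_or_truncate(["a", "b", "c"], 2)
```

In a tf.data pipeline, a similar effect can probably be had by batching with padding instead of padding each record by hand.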