recommenders-team / recommenders

Best Practices on Recommendation Systems
https://recommenders-team.github.io/recommenders/intro.html
MIT License

[ASK] Help setting FEATURE_COUNT and FIELD_COUNT values in xDeepFM hyperparameter YAML #862

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

Description

I am trying to learn how to use the xDeepFM model. I've converted my data into FFM format; it currently looks like this:

rating userID itemID product_series times_ordered quantity_ordered
3 1:1:85532 2:2:62959 3:9:1 4:56:1 5:57:12.0
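To make the format concrete, here is a minimal sketch that splits the sample row into its label and `field:feature:value` triples and pairs each triple with the column names from the header line. It assumes the first token is the rating and every other token has exactly three colon-separated parts; the helper name `parse_ffm_line` is hypothetical.

```python
# Minimal sketch: split one libffm-formatted row into its label and
# (field, feature, value) triples. Assumes the first token is the
# rating/label and every other token is "field:feature:value".

def parse_ffm_line(line):
    """Return (label, [(field, feature, value), ...]) for one libffm row."""
    tokens = line.split()
    label = float(tokens[0])
    triples = []
    for token in tokens[1:]:
        field, feature, value = token.split(":")
        triples.append((int(field), int(feature), float(value)))
    return label, triples


# Column names taken from the header line of the sample (rating is the label).
columns = ["userID", "itemID", "product_series", "times_ordered", "quantity_ordered"]
label, triples = parse_ffm_line("3 1:1:85532 2:2:62959 3:9:1 4:56:1 5:57:12.0")

print("rating:", label)
for name, (field, feature, value) in zip(columns, triples):
    print(f"{name}: field={field} feature={feature} value={value}")
```

Read this way, the single sample row already uses field indices up to 5 and feature indices up to 57.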

I'd like help setting the xDeepFM hyperparameters based on the input dataset. How do I choose appropriate values for FEATURE_COUNT and FIELD_COUNT? Are any other hyperparameters affected by the shape and content of the FFM data?

I looked at https://github.com/microsoft/recommenders/blob/ye_dev/notebooks/00_quick_start/xdeepfm_movielens.ipynb for tips on how to set everything up. In that example, FIELD_COUNT = 3 and FEATURE_COUNT = 22, mainly because the genre field contains multiple feature values.

The example data has 22 columns, matching FEATURE_COUNT = 22, plus the standard Rating | User | Item layout, which I presume is where FIELD_COUNT = 3 comes from.

So it seems to me that I should use FIELD_COUNT = 3 and FEATURE_COUNT = 6.

However, when I run it, I get an error like this:

InvalidArgumentError: indices[2] = 8 is not in [0, 6) [[node XDeepFM/embedding/embedding_lookup_sparse/embedding_lookup (defined at ../../../recommenders\reco_utils\recommender\deeprec\models\xDeepFM.py:79) ]]
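The error is consistent with feature indices being shifted to zero-based ids before the embedding lookup: the third entry of the sample row is 3:9:1, and an id of 8 falls outside [0, 6), the range an embedding table of size FEATURE_COUNT = 6 can serve. A minimal sketch, assuming 1-based field and feature indices as in the sample, that derives the smallest FIELD_COUNT and FEATURE_COUNT a libffm file can use; the helper name `min_counts` is hypothetical, and the zero-based shift is an assumption inferred from the error message.

```python
# Minimal sketch: find the largest field and feature index used in
# libffm rows. With 1-based indices, those maxima are the smallest
# FIELD_COUNT and FEATURE_COUNT that cover every row.

def min_counts(lines):
    """Return (field_count, feature_count) large enough for every row."""
    max_field = 0
    max_feature = 0
    for line in lines:
        for token in line.split()[1:]:  # skip the label
            field, feature, _value = token.split(":")
            max_field = max(max_field, int(field))
            max_feature = max(max_feature, int(feature))
    return max_field, max_feature


rows = ["3 1:1:85532 2:2:62959 3:9:1 4:56:1 5:57:12.0"]
field_count, feature_count = min_counts(rows)
print(field_count, feature_count)  # 5 57

# Assumed zero-based shift: feature 9 becomes id 8, and a table with
# FEATURE_COUNT = 6 rows only accepts ids in [0, 6).
feature_id = 9 - 1
print(feature_id < 6)  # False
```

In practice you would run this over every row of the training, validation, and test files, since any feature index beyond FEATURE_COUNT would fail the same way.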

This information will help many of us as we experiment with user/item features and need to reconfigure the hyperparameters that affect the underlying NNs.

Other Comments

ghost commented 5 years ago

Trying the simplest example, an FFM structure of rating | customer_id | item_id with FEATURE_COUNT = FIELD_COUNT = 3, I get this error:

InvalidArgumentError: Input to reshape is a tensor with 3830 values, but the requested shape requires a multiple of 30 [[node XDeepFM/Reshape (defined at ../../../recommenders\reco_utils\recommender\deeprec\models\xDeepFM.py:164) ]]
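The 30 in this message is consistent with the field embeddings being flattened to a shape of [-1, FIELD_COUNT * dim], if the YAML sets the embedding size `dim` to 10. That value of `dim` is an assumption here, so check it in your own YAML file. A quick arithmetic sketch of why 3830 values cannot be reshaped that way; `check_reshape` is a hypothetical helper.

```python
# Arithmetic sketch: can a flat tensor of total_values floats be
# reshaped to [-1, field_count * dim]? Assumes each embedding row
# holds dim values (dim = 10 below is an assumption, not from the YAML).

def check_reshape(total_values, field_count, dim):
    """Return (reshape_ok, number_of_embedding_rows)."""
    row_width = field_count * dim            # values per sample after flattening
    embedding_rows = total_values / dim      # one row per (sample, field) embedding
    return total_values % row_width == 0, embedding_rows


ok, embedding_rows = check_reshape(3830, field_count=3, dim=10)
print(ok, embedding_rows)  # False 383.0
```

Under that assumption, 383 embedding rows is not a whole number of 3-field samples, which suggests the number of fields actually present per row does not match FIELD_COUNT = 3.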

ghost commented 5 years ago

Looking at LibffmConverter, it appears that feature_count and field_count are attributes of the class that you can inspect after calling transform. Since the examples so far all use pre-built datasets (synthetic and criteo_tiny), it is not clear how to set these hyperparameters for your own data. Perhaps this could be added as an example notebook covering data preparation and transformation.
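To see where those two numbers come from, here is a dependency-free sketch of the counting rule that LibffmConverter appears to apply. This is my reading of its behaviour, not the library code itself. The rating column is the label and is excluded, each remaining column is one field, a string-valued column contributes one feature per distinct value, and a numeric column contributes a single feature. Python `str` stands in for the pandas `object` dtype; the function name `ffm_counts` and the toy rows are hypothetical.

```python
# Sketch of the assumed LibffmConverter counting rule on plain dict rows.
# The label column is excluded from the fields; string columns are
# categorical (one feature per distinct value), others are numeric
# (one shared feature per column).

def ffm_counts(rows, col_rating="rating"):
    """Return (field_count, feature_count) for a list of dict rows."""
    fields = [col for col in rows[0] if col != col_rating]
    feature_count = 0
    for col in fields:
        values = [row[col] for row in rows]
        if all(isinstance(v, str) for v in values):
            feature_count += len(set(values))  # categorical: one index per value
        else:
            feature_count += 1  # numerical: one index for the whole column
    return len(fields), feature_count


rows = [
    {"rating": 3, "userID": 1, "itemID": 10, "genre": "Drama"},
    {"rating": 5, "userID": 2, "itemID": 11, "genre": "Comedy"},
    {"rating": 4, "userID": 1, "itemID": 12, "genre": "Drama"},
]
print(ffm_counts(rows))  # (3, 4)
```

Under this rule the rating is not counted as a field, so a rating | customer_id | item_id layout would give a field_count of 2. Integer ID columns are treated as numeric, with the ID itself as the value, which matches the 1:1:85532 entry in the sample row; casting such columns to strings would make them categorical instead.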

ghost commented 5 years ago

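```python
# Import path as used in the reco_utils package layout of the time
from reco_utils.dataset.pandas_df_utils import LibffmConverter

# data: a pandas DataFrame with a 'rating' column plus one column per field
converter = LibffmConverter().fit(data, col_rating='rating')
df_out = converter.transform(data)
df_out.head(5)

# Available after transform(); use these values in the YAML file
print(converter.field_count)
print(converter.feature_count)
```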

By setting FIELD_COUNT and FEATURE_COUNT in the YAML file to the values of the converter's field_count and feature_count attributes, I was able to run the model successfully.

gramhagen commented 5 years ago

FWIW, @yexing99 is working on a PR (#861) that might have a good example of what you're digging into.

ghost commented 5 years ago

Thank you @gramhagen