Using CSV data as input to recommender

tansaku commented 3 years ago

I'm trying to replicate the quick start recommender with some CSV data, loaded with the pandas read_csv function.

Reading the CSV data works and I can inspect it, e.g.

import pandas as pd
my_file_train = pd.read_csv("my_file.csv", header=0)

and when I view it with .head(), the data appears as expected. The type of my_file_train is

<class 'pandas.core.frame.DataFrame'>

Following the approach taken in https://stackoverflow.com/questions/58362316/how-do-i-go-from-pandas-dataframe-to-tensorflow-batchdataset-for-nlp, I can get a Dataset from the pandas DataFrame:

training_dataset = (
    tf.data.Dataset.from_tensor_slices(
        (
            tf.cast(my_file_train['feature1'].values, tf.string),
            tf.cast(my_file_train['user_id'].values, tf.int64)
        )
    )
)

The type of training_dataset is:

<class 'tensorflow.python.data.ops.dataset_ops.TensorSliceDataset'>

But then I try to build vocabularies as in the example, where we see code like this:

user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))

and I had thought that I could do something similar, like this:

user_ids_vocabulary = tf.keras.layers.experimental.preprocessing.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(training_dataset.map(lambda x: x[1]))

since the elements of my Dataset are tuples rather than dictionaries, but I get the following error:

TypeError: <lambda>() takes 1 positional argument but 2 were given

which probably just exposes that I'm taking completely the wrong approach somewhere, but I'd be very grateful if anyone could set me on track.

Would it be simpler to create my own tfds dataset as described in https://www.tensorflow.org/datasets/add_dataset rather than converting it on the fly? Or is there something simple that I'm missing in the manipulation I'm trying to do?

maciejkula commented 3 years ago

Can you try user_ids_vocabulary.adapt(training_dataset.map(lambda x, y: y))?
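
For anyone landing here later, a minimal sketch of why the two-argument lambda works: when a dataset's elements are tuples, `Dataset.map` passes each component as a separate positional argument, so the function needs one parameter per component. The small inline DataFrame is a hypothetical stand-in for the CSV, the user IDs are converted to strings (an assumption, so they match what `StringLookup` expects), and `tf.keras.layers.StringLookup` is the non-experimental path found in newer TensorFlow releases; older versions may need the `experimental.preprocessing` path used above. The second half shows a dict-structured dataset, which should let the quick start's `lambda x: x["user_id"]` form work unchanged.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the CSV contents; column names follow the thread.
df = pd.DataFrame({
    "feature1": ["a", "b", "c", "a"],
    "user_id": [10, 20, 10, 30],
})

# Assumption: IDs are converted to strings so they suit StringLookup.
features = df["feature1"].tolist()
user_ids = df["user_id"].astype(str).tolist()

# Tuple-structured dataset, as in the question.
tuple_ds = tf.data.Dataset.from_tensor_slices((features, user_ids))

# map() unpacks each (feature, user_id) tuple into two positional arguments,
# so the mapped function must accept two parameters.
user_ids_only = tuple_ds.map(lambda feature, user_id: user_id)

user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(user_ids_only.batch(2))

# Alternative: a dict-structured dataset, where each element is a single
# dictionary and a one-argument lambda with key access works.
dict_ds = tf.data.Dataset.from_tensor_slices(
    {"feature1": features, "user_id": user_ids}
)
dict_vocab = tf.keras.layers.StringLookup(mask_token=None)
dict_vocab.adapt(dict_ds.map(lambda x: x["user_id"]).batch(2))
```

Either structure should produce the same vocabulary; the dict form mainly saves having to remember the position of each column.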

tansaku commented 3 years ago

Many thanks for replying. That fixed it, and I've got things working (I think).

Really appreciate your help. My question now is whether there's a simple way to adjust the basic model to recommend other users to a user based on what they have in common. Maybe I should open that as a separate question.

tensorflow / recommenders

Using CSV data as input to recommender #315