What happened?
I am following the TFRS retrieval colab to develop a recommender system prototype for my company as one of my internship projects. When importing the MovieLens 100K dataset, the issue seems to stem from the split parameter of the tfds.load function.
Relevant code
import os
import pprint
import tempfile
from typing import Dict, Text
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
# Ratings data.
ratings = tfds.load("movielens/100k-ratings", split="train")
# Features of all the available movies.
movies = tfds.load("movielens/100k-movies", split="train")
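For what it's worth, I also checked which splits the dataset actually exposes. This is a quick diagnostic sketch, not part of the colab; as far as I can tell, movielens/100k-ratings ships only a single "train" split:
# Diagnostic sketch (not from the colab): list the splits this dataset provides.
ratings, ratings_info = tfds.load("movielens/100k-ratings", split="train", with_info=True)
print(ratings_info.splits)  # expect a single "train" split with 100,000 examples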
Relevant log output
PS C:\Users\jleroux\Desktop\ML NASH> & C:/Users/jleroux/AppData/Local/Programs/Python/Python312/python.exe "c:/Users/jleroux/Desktop/ML NASH/recommender/retrieval.py"
2024-06-10 22:04:15.411232: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-10 22:04:16.232990: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-10 22:04:19.547211: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-10 22:04:19.673378: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
{'bucketized_user_age': 45.0,
'movie_genres': array([7], dtype=int64),
'movie_id': b'357',
'movie_title': b"One Flew Over the Cuckoo's Nest (1975)",
'raw_user_age': 46.0,
'timestamp': 879024327,
'user_gender': True,
'user_id': b'138',
'user_occupation_label': 4,
'user_occupation_text': b'doctor',
'user_rating': 4.0,
'user_zip_code': b'53211'}
2024-06-10 22:04:19.675671: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
2024-06-10 22:04:19.714261: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
{'movie_genres': array([4], dtype=int64),
'movie_id': b'1681',
'movie_title': b'You So Crazy (1994)'}
2024-06-10 22:04:19.715303: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
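The two feature dictionaries in the log come from the colab's inspection step; in my script that part looks roughly like this (a sketch continuing the snippet above, mirroring the colab):
# Print one example from each dataset (mirrors the colab's inspection step).
for x in ratings.take(1).as_numpy_iterator():
    pprint.pprint(x)

for x in movies.take(1).as_numpy_iterator():
    pprint.pprint(x)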
tensorflow_hub Version
0.13.0.dev (unstable development build)
TensorFlow Version
2.8 (latest stable release)
Other libraries
No response
Python Version
3.x
OS
Windows