[BUG] using NextItNet in main/examples/00_quick_start/sequential_recsys_amazondataset.ipynb #2110

Open OhitsaSteve opened 3 months ago

OhitsaSteve commented 3 months ago

Hi :)


We want to use the NextItNet recommender. For that we are using the quick-start example you kindly provide (main/examples/00_quick_start/sequential_recsys_amazondataset.ipynb). We modified some lines in the notebook as described in your comments. Unfortunately, it fails at the line "model = SeqModel(hparams, input_creator, seed=RANDOM_SEED)" in part "2. Create model" with:

ValueError                                Traceback (most recent call last)
Cell In[34], line 1
----> 1 model = SeqModel(hparams, input_creator, seed=RANDOM_SEED)
      3 ## sometimes we don't want to train a model from scratch
      4 ## then we can load a pre-trained model like this: 
      5 #model.load_model(r'your_model_path')

File /opt/conda/lib/python3.11/site-packages/recommenders/models/deeprec/models/sequential/, in SequentialBaseModel.__init__(self, hparams, iterator_creator, graph, seed)
     46 with self.graph.as_default():
     47     self.sequence_length = tf.compat.v1.placeholder(
     48         tf.int32, [None], name="sequence_length"
     49     )
---> 51 super().__init__(hparams, iterator_creator, graph=self.graph, seed=seed)

File /opt/conda/lib/python3.11/site-packages/recommenders/models/deeprec/models/, in BaseModel.__init__(self, hparams, iterator_creator, graph, seed)
     52 = tf.compat.v1.placeholder(tf.int32, shape=(), name="group")
     54 self.initializer = self._get_initializer()
---> 56 self.logit = self._build_graph()
     57 self.pred = self._get_pred(self.logit, self.hparams.method)
     59 self.loss = self._get_loss()

File /opt/conda/lib/python3.11/site-packages/recommenders/models/deeprec/models/sequential/, in SequentialBaseModel._build_graph(self)
     69 self._build_embedding()
     70 self._lookup_from_embedding()
---> 71 model_output = self._build_seq_graph()
     72 logit = self._fcn_net(model_output, hparams.layer_sizes, scope="logit_fcn")
     73 self._add_norm()

File /opt/conda/lib/python3.11/site-packages/recommenders/models/deeprec/models/sequential/, in SLI_RECModel._build_seq_graph(self)
     68     tf.compat.v1.summary.histogram("LSTM_outputs", rnn_outputs)
     70 with tf.compat.v1.variable_scope("attention_fcn"):
---> 71     att_outputs2 = self._attention_fcn(
     72         self.target_item_embedding, rnn_outputs
     73     )
     74     att_fea2 = tf.reduce_sum(input_tensor=att_outputs2, axis=1)
     75     tf.compat.v1.summary.histogram("att_fea2", att_fea2)

File /opt/conda/lib/python3.11/site-packages/recommenders/models/deeprec/models/sequential/, in SLI_RECModel._attention_fcn(self, query, user_embedding)
    110 query_size = query.shape[1]
    111 boolean_mask = tf.equal(self.mask, tf.ones_like(self.mask))
--> 113 attention_mat = tf.compat.v1.get_variable(
    114     name="attention_mat",
    115     shape=[user_embedding.shape.as_list()[-1], query_size],
    116     initializer=self.initializer,
    117 )
    118 att_inputs = tf.tensordot(user_embedding, attention_mat, [[2], [0]])
    120 queries = tf.reshape(
    121     tf.tile(query, [1, att_inputs.shape[1]]), tf.shape(input=att_inputs)
    122 )

File /opt/conda/lib/python3.11/site-packages/tensorflow/python/ops/, in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
   1614 @tf_export(v1=["get_variable"])
   1615 def get_variable(name,
   1616                  shape=None,
   1628                  synchronization=VariableSynchronization.AUTO,
   1629                  aggregation=VariableAggregation.NONE):
-> 1630   return get_variable_scope().get_variable(
   1631       _get_default_variable_store(),
   1632       name,
   1633       shape=shape,
   1634       dtype=dtype,
   1635       initializer=initializer,
   1636       regularizer=regularizer,
   1637       trainable=trainable,
   1638       collections=collections,
   1639       caching_device=caching_device,
   1640       partitioner=partitioner,
   1641       validate_shape=validate_shape,
   1642       use_resource=use_resource,
   1643       custom_getter=custom_getter,
   1644       constraint=constraint,
   1645       synchronization=synchronization,
   1646       aggregation=aggregation)

File /opt/conda/lib/python3.11/site-packages/tensorflow/python/ops/, in VariableScope.get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
   1338 if dtype is None:
   1339   dtype = self._dtype
-> 1340 return var_store.get_variable(
   1341     full_name,
   1342     shape=shape,
   1343     dtype=dtype,
   1344     initializer=initializer,
   1345     regularizer=regularizer,
   1346     reuse=reuse,
   1347     trainable=trainable,
   1348     collections=collections,
   1349     caching_device=caching_device,
   1350     partitioner=partitioner,
   1351     validate_shape=validate_shape,
   1352     use_resource=use_resource,
   1353     custom_getter=custom_getter,
   1354     constraint=constraint,
   1355     synchronization=synchronization,
   1356     aggregation=aggregation)

File /opt/conda/lib/python3.11/site-packages/tensorflow/python/ops/, in _VariableStore.get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
    583   return custom_getter(**custom_getter_kwargs)
    584 else:
--> 585   return _true_getter(
    586       name,
    587       shape=shape,
    588       dtype=dtype,
    589       initializer=initializer,
    590       regularizer=regularizer,
    591       reuse=reuse,
    592       trainable=trainable,
    593       collections=collections,
    594       caching_device=caching_device,
    595       partitioner=partitioner,
    596       validate_shape=validate_shape,
    597       use_resource=use_resource,
    598       constraint=constraint,
    599       synchronization=synchronization,
    600       aggregation=aggregation)

File /opt/conda/lib/python3.11/site-packages/tensorflow/python/ops/, in _VariableStore.get_variable.<locals>._true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, constraint, synchronization, aggregation)
    532 if "%s/part_0" % name in self._vars:
    533   raise ValueError(
    534       "No partitioner was provided, but a partitioned version of the "
    535       "variable was found: %s/part_0. Perhaps a variable of the same "
    536       "name was already created with partitioning?" % name)
--> 538 return self._get_single_variable(
    539     name=name,
    540     shape=shape,
    541     dtype=dtype,
    542     initializer=initializer,
    543     regularizer=regularizer,
    544     reuse=reuse,
    545     trainable=trainable,
    546     collections=collections,
    547     caching_device=caching_device,
    548     validate_shape=validate_shape,
    549     use_resource=use_resource,
    550     constraint=constraint,
    551     synchronization=synchronization,
    552     aggregation=aggregation)

File /opt/conda/lib/python3.11/site-packages/tensorflow/python/ops/, in _VariableStore._get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
    956       variable_dtype = None
    957     else:
--> 958       raise ValueError("The initializer passed is not valid. It should "
    959                        "be a callable with no arguments and the "
    960                        "shape should not be provided or an instance of "
    961                        "`tf.keras.initializers.*' and `shape` should be "
    962                        "fully defined.")
    964 # Create the variable.
    965 if use_resource is None:
    966   # Set the default value if unspecified.

ValueError: The initializer passed is not valid. It should be a callable with no arguments and the shape should not be provided or an instance of `tf.keras.initializers.*' and `shape` should be fully defined. 

How do we replicate the issue?

Take the original "main/examples/00_quick_start/sequential_recsys_amazondataset.ipynb" and switch it to NextItNet as described in the notebook's comments.

Expected behavior (i.e. solution)

NextItNet runs successfully with the Amazon dataset. We were able to run all code blocks with the other recommenders: GRU [2], Caser [3], A2SVD [1], SLi_Rec [1], and SUM [5].

Other Comments

Full code until error :

import os
import sys
import tensorflow.compat.v1 as tf
tf.get_logger().setLevel('ERROR') # only show error messages

from recommenders.utils.timer import Timer
from recommenders.utils.constants import SEED
from recommenders.models.deeprec.deeprec_utils import prepare_hparams
from recommenders.datasets.amazon_reviews import download_and_extract, data_preprocessing
from recommenders.models.deeprec.models.sequential.sli_rec import SLI_RECModel as SeqModel
####  to use the other model, use one of the following lines:
# from recommenders.models.deeprec.models.sequential.asvd import A2SVDModel as SeqModel
# from recommenders.models.deeprec.models.sequential.caser import CaserModel as SeqModel
# from recommenders.models.deeprec.models.sequential.gru import GRUModel as SeqModel
# from recommenders.models.deeprec.models.sequential.sum import SUMModel as SeqModel
from recommenders.models.deeprec.models.sequential.nextitnet import NextItNetModel
#from import SequentialIterator
from import NextItNetIterator
#from recommenders.utils.notebook_utils import store_metadata

print(f"System version: {sys.version}")
print(f"Tensorflow version: {tf.__version__}")

RANDOM_SEED = SEED  # Set None for non-deterministic result

data_path = os.path.join("..", "..", "tests", "resources", "deeprec", "slirec")

##  ATTENTION: change to the corresponding config file, e.g., caser.yaml for CaserModel, sum.yaml for SUMModel
yaml_file = '../../recommenders/models/deeprec/config/sli_rec.yaml'  

train_file = os.path.join(data_path, r'train_data')
valid_file = os.path.join(data_path, r'valid_data')
test_file = os.path.join(data_path, r'test_data')
user_vocab = os.path.join(data_path, r'user_vocab.pkl')
item_vocab = os.path.join(data_path, r'item_vocab.pkl')
cate_vocab = os.path.join(data_path, r'category_vocab.pkl')
output_file = os.path.join(data_path, r'output.txt')

reviews_name = 'reviews_Movies_and_TV_5.json'
meta_name = 'meta_Movies_and_TV.json'
reviews_file = os.path.join(data_path, reviews_name)
meta_file = os.path.join(data_path, meta_name)
train_num_ngs = 4 # number of negative instances with a positive instance for training
valid_num_ngs = 4 # number of negative instances with a positive instance for validation
test_num_ngs = 9 # number of negative instances with a positive instance for testing
sample_rate = 0.01 # sample a small item set for training and testing here for fast example

input_files = [reviews_file, meta_file, train_file, valid_file, test_file, user_vocab, item_vocab, cate_vocab]

if not os.path.exists(train_file):
    download_and_extract(reviews_name, reviews_file)
    download_and_extract(meta_name, meta_file)
    #data_preprocessing(*input_files, sample_rate=sample_rate, valid_num_ngs=valid_num_ngs, test_num_ngs=test_num_ngs)
    #### uncomment this for the NextItNet model, because it does not need to unfold the user history
    data_preprocessing(*input_files, sample_rate=sample_rate, valid_num_ngs=valid_num_ngs, test_num_ngs=test_num_ngs, is_history_expanding=False)

### NOTE:  
### remember to use `_create_vocab(train_file, user_vocab, item_vocab, cate_vocab)` to generate the user_vocab, item_vocab and cate_vocab files, if you are using your own dataset rather than using our demo Amazon dataset.
hparams = prepare_hparams(yaml_file, 
                          learning_rate=0.001,  # set to 0.01 if batch normalization is disabled
                          MODEL_DIR=os.path.join(data_path, "model/"),
                          SUMMARIES_DIR=os.path.join(data_path, "summary/"),
                          train_num_ngs=train_num_ngs, # provides the number of negative instances for each positive instance for loss computation.

#input_creator = SequentialIterator
#### uncomment this for the NextItNet model, because it needs a special data iterator for training
input_creator = NextItNetIterator

model = SeqModel(hparams, input_creator, seed=RANDOM_SEED)

## sometimes we don't want to train a model from scratch
## then we can load a pre-trained model like this: 


miguelgfierro commented 3 months ago

This might be related to the issues we've been having with the newest Keras. @SimonYansenZhao do you think this could be related to the issues with TF?

@OhitsaSteve can you check that you have TF < 2.16?

OhitsaSteve commented 3 months ago

Thanks for your support :)

Yes, we are using TF < 2.16; it is currently 2.15.0.
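
The version check requested above can also be scripted in the notebook. This is a minimal sketch, assuming a plain `major.minor.patch` version string; the helpers `parse_version` and `is_tf_below` are hypothetical names and not part of TensorFlow or recommenders.

```python
def parse_version(version: str) -> tuple:
    """Return (major, minor) as integers from a string such as '2.15.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_tf_below(version: str, limit: str = "2.16") -> bool:
    """True if `version` is strictly older than `limit` (major.minor only)."""
    # Compare integer tuples, not strings, so that '2.9' sorts before '2.16'.
    return parse_version(version) < parse_version(limit)


# In the notebook you would pass tf.__version__; a literal is used here so
# the snippet runs without TensorFlow installed.
print(is_tf_below("2.15.0"))  # prints True
```

Comparing integer tuples rather than raw strings avoids treating "2.9" as newer than "2.16".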

By the way, the code block above still shows "sli_rec.yaml"; we actually tried it with the modified "nextitnet" one, sorry.

OhitsaSteve commented 3 months ago

We are using the quick-start example as a starting point for our project. Is there another example for the "nextitnet" recommender?

OhitsaSteve commented 2 months ago

Hi @miguelgfierro :)

After some time we found the mistake, and it is working now! :)

The problem isn't TensorFlow. The problem is the setup of the quick-start example. To use the NextItNet recommender, you have to comment and uncomment some lines of code, but one of those lines is missing from the notebook.

In code block 7 you find this:

model = SeqModel(hparams, input_creator, seed=RANDOM_SEED)

To use NextItNet you have to create the model like this instead:

 model = NextItNetModel(hparams, input_creator, seed=RANDOM_SEED)

It sounds easy, but with no hint about this step in the otherwise good guidelines on how to use NextItNet, and with this strange TensorFlow error, it was really hard to find.

**Therefore, please add an "uncomment this line" hint to code block 7 for using NextItNetModel.**

Or give me permission to commit and create a pull request :)
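
The mismatch described above (NextItNetIterator passed to SLI_RECModel) could be caught earlier with a small guard. This is a minimal sketch and not part of the recommenders library: the table `COMPATIBLE_ITERATOR` and the helper `check_pairing` are hypothetical names, and the pairings only restate what the notebook comments quoted in this issue say.

```python
# Hypothetical lookup: model class name -> iterator class name it expects.
# The names come from the imports quoted in this issue; nothing is imported
# from recommenders, so the snippet runs on its own.
COMPATIBLE_ITERATOR = {
    "SLI_RECModel": "SequentialIterator",
    "A2SVDModel": "SequentialIterator",
    "CaserModel": "SequentialIterator",
    "GRUModel": "SequentialIterator",
    "SUMModel": "SequentialIterator",
    "NextItNetModel": "NextItNetIterator",
}


def _name(obj) -> str:
    """Accept either a class or its name as a string."""
    return obj if isinstance(obj, str) else obj.__name__


def check_pairing(model, iterator) -> None:
    """Raise a readable ValueError if model and iterator do not belong together."""
    model_name, iterator_name = _name(model), _name(iterator)
    expected = COMPATIBLE_ITERATOR.get(model_name)
    if expected is None:
        raise ValueError(f"Unknown model class: {model_name}")
    if iterator_name != expected:
        raise ValueError(
            f"{model_name} expects {expected}, but got {iterator_name}"
        )


check_pairing("NextItNetModel", "NextItNetIterator")  # passes silently
try:
    check_pairing("SLI_RECModel", "NextItNetIterator")  # the setup from this issue
except ValueError as err:
    print(err)
```

Calling such a guard right before the model is constructed in code block 7 would report the wrong pairing directly instead of surfacing as the initializer error from TensorFlow.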

miguelgfierro commented 2 months ago

@OhitsaSteve yes feel free