Open Xant-5 opened 3 months ago
I have tested the code, and it runs successfully. Please make sure that your TensorFlow version is 2.1.0.
I set up the same environment, so I suspect the cause is the graphics card. I ran it on a 3090 graphics card on Windows. Which graphics card did you use for training?
NVIDIA TITAN Xp on Ubuntu 16.04
When I run this command, I get the following error:
python run.py --model model --name trial
Modify $PATH to customize ptxas location. This message will be only logged once.
step 100 , total_loss: 1.1564, data_loss: 0.9954, CL loss: 1.6075
step 200 , total_loss: 1.1119, data_loss: 0.9512, CL loss: 1.6049
step 300 , total_loss: 1.0300, data_loss: 0.8699, CL loss: 1.5988
step 400 , total_loss: 0.9890, data_loss: 0.8294, CL loss: 1.5932
step 500 , total_loss: 1.0679, data_loss: 0.9091, CL loss: 1.5853
step 600 , total_loss: 1.0344, data_loss: 0.8769, CL loss: 1.5717
step 700 , total_loss: 0.9354, data_loss: 0.7799, CL loss: 1.5510
step 800 , total_loss: 0.9942, data_loss: 0.8371, CL loss: 1.5665
step 900 , total_loss: 0.9921, data_loss: 0.8391, CL loss: 1.5255
step 1000 , total_loss: 0.9307, data_loss: 0.7825, CL loss: 1.4781
eval valid at epoch 1: auc:0.5,logloss:2.5301,wauc:0.5,mean_mrr:0.2,ndcg@2:0.0,ndcg@4:0.0,hit@2:0.0,hit@4:0.0,group_auc:0.5
Traceback (most recent call last):
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: sequential/cate_history_embedding_output
[[{{node sequential/cate_history_embedding_output}}]]
(1) Invalid argument: Nan in summary histogram for: sequential/cate_history_embedding_output
[[{{node sequential/cate_history_embedding_output}}]]
[[sequential/fatigue_short_1/nn_part/nn_part/w_nn_output1/ReadVariableOp/_64]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 277, in <module>
app.run(main)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\absl\app.py", line 312, in run
_run_main(main, args)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\absl\app.py", line 258, in _run_main
sys.exit(main(argv))
File "run.py", line 266, in main
model = model.fit(train_file, valid_file, valid_num_ngs=flags_obj.valid_num_ngs)
File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py", line 150, in fit
step_result = self.train(train_sess, batch_data_input)
File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\base_model.py", line 411, in train
feed_dict=feed_dict,
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 958, in run
run_metadata_ptr)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1181, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1359, in _do_run
run_metadata)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: sequential/cate_history_embedding_output
[[node sequential/cate_history_embedding_output (defined at D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py:342) ]]
(1) Invalid argument: Nan in summary histogram for: sequential/cate_history_embedding_output
[[node sequential/cate_history_embedding_output (defined at D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py:342) ]]
[[sequential/fatigue_short_1/nn_part/nn_part/w_nn_output1/ReadVariableOp/_64]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node sequential/cate_history_embedding_output:
sequential/embedding_lookup_4/Identity_1 (defined at D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py:339)
Input Source operations connected to node sequential/cate_history_embedding_output:
sequential/embedding_lookup_4/Identity_1 (defined at D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py:339)
Original stack trace for 'sequential/cate_history_embedding_output':
File "run.py", line 277, in <module>
app.run(main)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\absl\app.py", line 312, in run
_run_main(main, args)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\absl\app.py", line 258, in _run_main
sys.exit(main(argv))
File "run.py", line 249, in main
model, modelpath, = get_model(flags_obj, exp_name)
File "run.py", line 223, in get_model
model = Model(hparams, input_creator, seed=None)
File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py", line 54, in __init__
super().__init__(hparams, iterator_creator, graph=self.graph, seed=seed)
File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\base_model.py", line 59, in __init__
self.logit, self.fatigue_logit, self.fatigue_logit_fatigue = self._build_graph()
File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py", line 73, in _build_graph
self._lookup_from_embedding()
File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py", line 342, in _lookup_from_embedding
"cate_history_embedding_output", self.cate_history_embedding
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\summary\summary.py", line 179, in histogram
tag=tag, values=values, name=scope)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 289, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 744, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\framework\ops.py", line 3485, in _create_op_internal
op_def=op_def)
File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\framework\ops.py", line 1949, in __init__
self._traceback = tf_stack.extract_stack()
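The trace points at `sequential/embedding_lookup_4` feeding `cate_history_embedding_output`, so one possible first step is to check whether the category embedding table already contains non-finite values, or whether the category IDs in a batch fall outside the table. Below is a minimal, framework-independent sketch. The function names and the idea of dumping the table and the IDs to plain Python lists are assumptions for illustration and are not part of the repository.

```python
import math


def find_non_finite_rows(table):
    """Return indices of rows that contain at least one NaN or +/-inf value.

    `table` is assumed to be a nested list of floats, e.g. an embedding
    matrix fetched from the session and converted with `.tolist()`.
    """
    bad_rows = []
    for row_index, row in enumerate(table):
        if any(not math.isfinite(value) for value in row):
            bad_rows.append(row_index)
    return bad_rows


def find_out_of_range_ids(ids, vocab_size):
    """Return the sorted, de-duplicated IDs that cannot index a table of `vocab_size` rows."""
    return sorted({i for i in ids if i < 0 or i >= vocab_size})


if __name__ == "__main__":
    # Hypothetical toy data, only to show the calls.
    toy_table = [[0.1, -0.2], [float("nan"), 0.3], [0.0, float("inf")]]
    print(find_non_finite_rows(toy_table))
    print(find_out_of_range_ids([0, 5, 2, 7, -1], vocab_size=5))
```

If the table is clean right after initialization but contains NaN after the first training steps, the problem is more likely in the computation on the device than in the input data.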