tsinghua-fib-lab / SIGIR24-FRec


Problem #2

Open · Xant-5 opened this issue 3 months ago

Xant-5 commented 3 months ago

When I run

python run.py --model model --name trial

training crashes right after the first validation pass with the following output:

Modify $PATH to customize ptxas location. This message will be only logged once.
step 100 , total_loss: 1.1564, data_loss: 0.9954, CL loss: 1.6075
step 200 , total_loss: 1.1119, data_loss: 0.9512, CL loss: 1.6049
step 300 , total_loss: 1.0300, data_loss: 0.8699, CL loss: 1.5988
step 400 , total_loss: 0.9890, data_loss: 0.8294, CL loss: 1.5932
step 500 , total_loss: 1.0679, data_loss: 0.9091, CL loss: 1.5853
step 600 , total_loss: 1.0344, data_loss: 0.8769, CL loss: 1.5717
step 700 , total_loss: 0.9354, data_loss: 0.7799, CL loss: 1.5510
step 800 , total_loss: 0.9942, data_loss: 0.8371, CL loss: 1.5665
step 900 , total_loss: 0.9921, data_loss: 0.8391, CL loss: 1.5255
step 1000 , total_loss: 0.9307, data_loss: 0.7825, CL loss: 1.4781
eval valid at epoch 1: auc:0.5,logloss:2.5301,wauc:0.5,mean_mrr:0.2,ndcg@2:0.0,ndcg@4:0.0,hit@2:0.0,hit@4:0.0,group_auc:0.5
Traceback (most recent call last):
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: sequential/cate_history_embedding_output
     [[{{node sequential/cate_history_embedding_output}}]]
  (1) Invalid argument: Nan in summary histogram for: sequential/cate_history_embedding_output
     [[{{node sequential/cate_history_embedding_output}}]]
     [[sequential/fatigue_short_1/nn_part/nn_part/w_nn_output1/ReadVariableOp/_64]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 277, in <module>
    app.run(main)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\absl\app.py", line 312, in run
    _run_main(main, args)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\absl\app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "run.py", line 266, in main
    model = model.fit(train_file, valid_file, valid_num_ngs=flags_obj.valid_num_ngs)
  File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py", line 150, in fit
    step_result = self.train(train_sess, batch_data_input)
  File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\base_model.py", line 411, in train
    feed_dict=feed_dict,
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 958, in run
    run_metadata_ptr)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: sequential/cate_history_embedding_output
     [[node sequential/cate_history_embedding_output (defined at D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py:342) ]]
  (1) Invalid argument: Nan in summary histogram for: sequential/cate_history_embedding_output
     [[node sequential/cate_history_embedding_output (defined at D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py:342) ]]
     [[sequential/fatigue_short_1/nn_part/nn_part/w_nn_output1/ReadVariableOp/_64]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node sequential/cate_history_embedding_output:
 sequential/embedding_lookup_4/Identity_1 (defined at D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py:339)

Input Source operations connected to node sequential/cate_history_embedding_output:
 sequential/embedding_lookup_4/Identity_1 (defined at D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py:339)

Original stack trace for 'sequential/cate_history_embedding_output':
  File "run.py", line 277, in <module>
    app.run(main)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\absl\app.py", line 312, in run
    _run_main(main, args)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\absl\app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "run.py", line 249, in main
    model, modelpath, = get_model(flags_obj, exp_name)
  File "run.py", line 223, in get_model
    model = Model(hparams, input_creator, seed=None)
  File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py", line 54, in __init__
    super().__init__(hparams, iterator_creator, graph=self.graph, seed=seed)
  File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\base_model.py", line 59, in __init__
    self.logit, self.fatigue_logit, self.fatigue_logit_fatigue = self._build_graph()
  File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py", line 73, in _build_graph
    self._lookup_from_embedding()
  File "D:\hyl\SIGIR24-FRec-main\recommenders\models\deeprec\models\sequential\sequential_base_model.py", line 342, in _lookup_from_embedding
    "cate_history_embedding_output", self.cate_history_embedding
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\summary\summary.py", line 179, in histogram
    tag=tag, values=values, name=scope)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 289, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 744, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\framework\ops.py", line 3485, in _create_op_internal
    op_def=op_def)
  File "D:\Anaconda\envs\SIGIR24-FRec-main4\lib\site-packages\tensorflow\python\framework\ops.py", line 1949, in __init__
    self._traceback = tf_stack.extract_stack()
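The traceback shows the HistogramSummary op registered at sequential_base_model.py:342 rejecting NaN values in cate_history_embedding_output, whose input is sequential/embedding_lookup_4 (line 339). The histogram summary is where the NaN is detected, not necessarily where it originates. One way to narrow it down would be to fetch that embedding output on its own for a failing batch (for example through the session's run call, without the summary op) and list which batch/history positions hold non-finite values, so they can be mapped back to category IDs in that batch. The sketch below does only the listing step in plain Python; the helper names are hypothetical, and it assumes the fetched array has already been converted with .tolist() into shape [batch, history_len, embedding_dim].

```python
import math


def non_finite_positions(values, prefix=()):
    """Yield the index tuple of every NaN/inf entry in a nested list of floats.

    Hypothetical helper: `values` is assumed to be a fetched tensor converted
    with .tolist(), e.g. shape [batch, history_len, embedding_dim].
    """
    if isinstance(values, (list, tuple)):
        for i, item in enumerate(values):
            yield from non_finite_positions(item, prefix + (i,))
    elif not math.isfinite(values):
        yield prefix


def rows_with_non_finite(values):
    """Collapse positions to sorted (batch, history) pairs, dropping the embedding axis."""
    return sorted({pos[:2] for pos in non_finite_positions(values)})


# Toy [batch=2, history=2, dim=3] output with one NaN at batch 1, history slot 0.
nan = float("nan")
fetched = [
    [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]],
    [[nan, 0.5, 0.2], [0.3, 0.3, 0.1]],
]
print(rows_with_non_finite(fetched))  # [(1, 0)]
```

If every position in a batch comes back non-finite, the embedding variable itself has most likely been corrupted by an earlier update, rather than a few bad input IDs.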

wjjln commented 3 months ago

I have tested the code and it runs successfully. Please make sure your TensorFlow version is 2.1.0.
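A quick way to confirm the environment matches is to print the TensorFlow version from inside the same conda env, together with whether the build has CUDA and which GPUs TensorFlow can see. This is a minimal sketch: the helper names and the EXPECTED_TF constant are illustrative, and it reads the version from tf.__version__, which covers both the tensorflow and tensorflow-gpu packages.

```python
EXPECTED_TF = "2.1.0"  # version the maintainer reports testing with


def version_matches(installed, expected=EXPECTED_TF):
    """True only when the installed version string equals the expected release."""
    return installed == expected


def environment_report():
    """Collect TF version, CUDA build flag and visible GPUs (None/empty if TF is absent)."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None, "built_with_cuda": None, "gpus": []}
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    report = environment_report()
    print(report)
    if not version_matches(report["tensorflow"]):
        print("TensorFlow version differs from", EXPECTED_TF)
```

An exact string comparison is deliberate here: a patch release such as 2.1.4 would be flagged as different, which is what you want when reproducing someone else's run.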

Xant-5 commented 3 months ago

I set up the same environment, so I suspect the graphics card is the cause. I ran it on an RTX 3090 under Windows. Which GPU did you use for training?

wjjln commented 3 months ago

NVIDIA TITAN Xp on Ubuntu 16.04
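The two setups differ in GPU architecture. Per NVIDIA's CUDA GPUs list, the TITAN Xp is a Pascal card with compute capability 6.1, while the GeForce RTX 3090 is an Ampere card with compute capability 8.6. TensorFlow 2.1.0's tested GPU builds use CUDA 10.1, whose newest native target is sm_75; sm_86 only became a target in CUDA 11.1. The sketch below hard-codes just those facts (the tables are assumptions copied from those sources, not queried from the driver) to show that the 3090 has no native CUDA 10.1 target, so the two machines do not run identical GPU code. It does not prove the NaN comes from the GPU; rerunning on CPU (for example with CUDA_VISIBLE_DEVICES=-1) would be a more direct test.

```python
# Compute capabilities from NVIDIA's CUDA GPUs list (assumed, hard-coded).
COMPUTE_CAPABILITY = {
    "NVIDIA TITAN Xp": (6, 1),   # Pascal, the maintainer's card
    "GeForce RTX 3090": (8, 6),  # Ampere, the reporter's card
}

# Newest SM architecture each CUDA toolkit can compile to natively (assumed).
MAX_NATIVE_SM = {
    "10.1": (7, 5),  # toolkit used by TensorFlow 2.1.0's tested GPU builds
    "11.1": (8, 6),  # first toolkit with an sm_86 target
}


def has_native_target(gpu, cuda_version):
    """True if the toolkit can emit native code for this GPU's compute capability."""
    return COMPUTE_CAPABILITY[gpu] <= MAX_NATIVE_SM[cuda_version]


for gpu in COMPUTE_CAPABILITY:
    print(gpu, "native under CUDA 10.1:", has_native_target(gpu, "10.1"))
```

Comparing (major, minor) tuples keeps the check correct across major versions, e.g. (8, 6) sorts after (7, 5).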