ysh-1998 / LRD

The official implementation of Sequential Recommendation with Latent Relations based on Large Language Model
MIT License

When running these four commands, I encountered the same tensor type mismatch error. I suspect it is not actually caused by a tensor type mismatch, but rather by a virtual environment configuration issue. #5

Closed Pursuerwener closed 1 month ago

Pursuerwener commented 2 months ago

Traceback (most recent call last):
  File "main.py", line 132, in <module>
    main()
  File "main.py", line 90, in main
    logging.info('Test Before Training: ' + runner.print_res(model, data_dict['test']))
  File "E:\LRD-main\src\helpers\BaseRunner.py", line 243, in print_res
    result_dict = self.evaluate(model, data, self.topk, self.metrics)
  File "E:\LRD-main\src\helpers\BaseRunner.py", line 206, in evaluate
    predictions = self.predict(model, data)
  File "E:\LRD-main\src\helpers\BaseRunner.py", line 225, in predict
    out_dict = model(utils.batch_to_gpu(batch, model.device))
  File "D:\Software\Anaconda\envs\dccf\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\LRD-main\src\models\sequential\KDAPlus.py", line 148, in forward
    prediction = self.rec_forward(feed_dict)
  File "E:\LRD-main\src\models\sequential\KDAPlus.py", line 251, in rec_forward
    context, target_attention = self.relational_dynamic_aggregation(
  File "D:\Software\Anaconda\envs\dccf\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\LRD-main\src\models\sequential\KDAPlus.py", line 466, in forward
    target_attention = torch.where(valid_mask.squeeze(1).repeat(1,1,self.n_relation), target_attention, 0.)
RuntimeError: expected scalar type float but found double

ysh-1998 commented 2 months ago

Can you provide your virtual environment configuration?

Pursuerwener commented 2 months ago

ca-certificates 2023.05.30
colorama 0.4.6
colorlog 6.8.2
libffi 3.4.4
networkx 3.1
numpy 1.22.3
openssl 3.0.10
pandas 2.0.3
pip 23.2.1
python 3.8.17
python-dateutil 2.8.2
pytz 2023.3
pyyaml 6.0.1
reckit 0.2.4
scipy 1.10.1
setproctitle 1.3.3
setuptools 68.0.0
six 1.16.0
sqlite 3.41.2
torch 1.11.0+cu113
torch-scatter 2.0.9
torch-sparse 0.6.14
tqdm 4.65.0
typing-extensions 4.7.1
tzdata 2023.3
vc 14.2
vs2015_runtime 14.27.29016
wheel 0.38.4

ysh-1998 commented 2 months ago

Please follow the Getting Started part in the README to create a new virtual environment.

Pursuerwener commented 2 months ago

Hello, thank you very much for your response. I reconfigured the environment according to the requirements, but I encountered an issue with the pandas package.

pandas 1.3.5
torch 1.13.1+cu117
tqdm 4.66.4
numpy 1.21.3
ipython 8.18.1
jupyter 1.0.0

Load corpus from ../data/Office\DFTReader_8.pkl
Traceback (most recent call last):
  File "E:\LRD-main\src\main.py", line 132, in <module>
    main()
  File "E:\LRD-main\src\main.py", line 61, in main
    corpus = pickle.load(open(corpus_path, 'rb'))
AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from 'D:\\Software\\Anaconda\\envs\\LRD\\lib\\site-packages\\pandas\\_libs\\internals.cp39-win_amd64.pyd'>

I tried upgrading the pandas version, but during training, the loss value appeared as NaN.

Test Before Training: (HR@10:1.0000,HR@5:1.0000,HR@50:1.0000,NDCG@10:1.0000,NDCG@5:1.0000,NDCG@50:1.0000)
Optimizer: Adam
all_losses: [8.602051 0.7128776 0.69314736]
Epoch 1 loss=8.6020 [18.8 s] dev=(HR@5:0.1056,NDCG@5:0.0627) test=(HR@5:0.0693,NDCG@5:0.0399) [26.2 s]
Save model to ../model/KDAPlus/KDAPlus_Office_2019_lr=0.001_l2=1...
all_losses: [nan nan nan]
Epoch 2 loss=nan [17.7 s] dev=(HR@5:1.0000,NDCG@5:1.0000) test=(HR@5:1.0000,NDCG@5:1.0000) [24.1 s]
Save model to ../model/KDAPlus/KDAPlus_Office_2019_lr=0.001_l2=1...
all_losses: [nan nan nan]
Epoch 3 loss=nan [17.1 s] dev=(HR@5:1.0000,NDCG@5:1.0000) test=(HR@5:1.0000,NDCG@5:1.0000) [24.0 s]
Save model to ../model/KDAPlus/KDAPlus_Office_2019_lr=0.001_l2=1...

ysh-1998 commented 2 months ago

For the first issue, the reason may be that the file DFTReader_8.pkl was saved under one pandas version and loaded under another. So please keep the pandas version at 1.3.5, then delete the pkl file and regenerate it.
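
For context on why regeneration helps: pandas pickles embed internal module paths (here `pandas._libs.internals._unpickle_block`), so a pkl written under one pandas version may fail to load under another. A small sketch of the regeneration step, assuming the corpus path shown in the log above:

```python
import os

# The cached corpus was pickled under a different pandas version, so the
# safest fix is to delete it; re-running main.py then rebuilds it from the
# raw data and pickles it under the currently installed pandas.
corpus_path = os.path.join("..", "data", "Office", "DFTReader_8.pkl")
if os.path.exists(corpus_path):
    os.remove(corpus_path)
```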

Pursuerwener commented 2 months ago

Thank you for your help during this time. I successfully ran the code, and I greatly appreciate your code sharing and assistance.

Pursuerwener commented 1 month ago

A new problem occurred. All the other models in the code run fine, but KDAPlus alone produces a loss of nan again. I re-extracted a fresh copy of the code and ran KDAPlus, and the problem persists.

result:

Namespace(model_name='KDAPlus')
--------------------------------------------- BEGIN: 2024-07-26 18:09:00 ---------------------------------------------

=====================================
 Arguments           | Values        
=====================================
 alpha               | 0.1          
 attention_size      | 10           
 batch_size          | 256          
 buffer              | 1            
 dataset             | Office       
 dropout             | 0            
 early_stop          | 5            
 emb_size            | 64           
 epoch               | 200          
 eval_batch_size     | 256          
 freq_rand           | 1            
 gamma               | -1.0         
 gpu                 | 0            
 history_max         | 20           
 include_attr        | 1            
 include_kge         | 1            
 include_lrd         | 1            
 include_val         | 1            
 l2                  | 1e-06        
 lamda               | 10           
 latent_relation_num | 8            
 leave_one_latent    | 1            
 load                | 0            
 lr                  | 0.001        
 message             |              
 metric              | ["NDCG","HR"]
 n_dft               | 64           
 neg_head_p          | 0.5          
 num_heads           | 4            
 num_layers          | 5            
 num_neg             | 1            
 num_workers         | 5            
 only_predict        | 0            
 optimizer           | Adam         
 plm_name            | feat.GPT     
 plm_size            | 1536         
 pooling             | average      
 random_seed         | 2019         
 t_scalar            | 60           
 topk                | [5,10,50]    
=====================================

cuda available: True

# cuda devices: 1

Load corpus from ../data/Office\DFTReader_8.pkl

#params: 4336925

KDAPlus(
  (user_embeddings): Embedding(4906, 64)
  (item_embeddings): Embedding(2421, 1536, padding_idx=0)
  (project_layer_1): Linear(in_features=1536, out_features=64, bias=True)
  (project_layer_2): Linear(in_features=128, out_features=13, bias=True)
  (entity_embeddings): Embedding(2803, 64)
  (relation_embeddings): Embedding(13, 64)
  (relational_dynamic_aggregation): RelationalDynamicAggregation(
    (relation_embeddings): Embedding(13, 64)
    (freq_real): Embedding(13, 33)
    (freq_imag): Embedding(13, 33)
  )
  (attn_head): MultiHeadAttention(
    (q_linear): Linear(in_features=64, out_features=64, bias=False)
    (k_linear): Linear(in_features=64, out_features=64, bias=False)
    (v_linear): Linear(in_features=64, out_features=64, bias=False)
  )
  (W1): Linear(in_features=64, out_features=64, bias=True)
  (W2): Linear(in_features=64, out_features=64, bias=True)
  (dropout_layer): Dropout(p=0, inplace=False)
  (layer_norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
  (item_bias): Embedding(2422, 1)
)
Test Before Training: (HR@10:1.0000,HR@5:1.0000,HR@50:1.0000,NDCG@10:1.0000,NDCG@5:1.0000,NDCG@50:1.0000)
Optimizer: Adam
all_losses: [8.602051 0.7128776 0.69314736]
Epoch 1 loss=8.6020 [16.7 s] dev=(HR@5:0.1056,NDCG@5:0.0627) test=(HR@5:0.0693,NDCG@5:0.0399) [24.8 s]
Save model to ../model/KDAPlus/KDAPlus_Office_2019_lr=0.001_l2=1...
all_losses: [nan nan nan]
Epoch 2 loss=nan [16.6 s] dev=(HR@5:1.0000,NDCG@5:1.0000) test=(HR@5:1.0000,NDCG@5:1.0000) [23.7 s]
Save model to ../model/KDAPlus/KDAPlus_Office_2019_lr=0.001_l2=1...
all_losses: [nan nan nan]
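
One generic way to localize this kind of failure (not specific to this repo) is to enable autograd anomaly detection and guard the loss, so training stops at the first batch that produces a non-finite value instead of silently saving a broken model:

```python
import torch

# Slow, debug-only: with anomaly detection enabled, backward() reports
# the forward op that produced a NaN/Inf gradient.
torch.autograd.set_detect_anomaly(True)

def check_loss(loss: torch.Tensor, step: int) -> None:
    # Stop at the first non-finite loss instead of continuing to train.
    if not torch.isfinite(loss):
        raise RuntimeError(f"non-finite loss {loss.item()} at step {step}")
```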

ysh-1998 commented 1 month ago

I tried creating a new virtual environment and running the following command:

python main.py --model_name KDAPlus --emb_size 64 --include_attr 1 --include_val 1 --freq_rand 1 --lr 1e-3 --l2 1e-6 --num_heads 4 --num_layers 5 --gamma -1 --history_max 20 --dataset Office --include_lrd 1 --epoch 200 --gpu 0

The run completes fine; you can compare the output below with yours to spot the difference.

Namespace(model_name='KDAPlus')
--------------------------------------------- BEGIN: 2024-07-26 19:05:06 ---------------------------------------------

=====================================
 Arguments           | Values        
=====================================
 alpha               | 0.1          
 attention_size      | 10           
 batch_size          | 256          
 buffer              | 1            
 dataset             | Office       
 dropout             | 0            
 early_stop          | 5            
 emb_size            | 64           
 epoch               | 200          
 eval_batch_size     | 256          
 freq_rand           | 1            
 gamma               | -1.0         
 gpu                 | 0            
 history_max         | 20           
 include_attr        | 1            
 include_kge         | 1            
 include_lrd         | 1            
 include_val         | 1            
 l2                  | 1e-06        
 lamda               | 10           
 latent_relation_num | 8            
 leave_one_latent    | 1            
 load                | 0            
 lr                  | 0.001        
 message             |              
 metric              | ["NDCG","HR"]
 n_dft               | 64           
 neg_head_p          | 0.5          
 num_heads           | 4            
 num_layers          | 5            
 num_neg             | 1            
 num_workers         | 5            
 only_predict        | 0            
 optimizer           | Adam         
 plm_name            | feat.GPT     
 plm_size            | 1536         
 pooling             | average      
 random_seed         | 2019         
 t_scalar            | 60           
 topk                | [5,10,50]    
=====================================
cuda available: True
# cuda devices: 1
Reading data from "../data/", dataset = "Office" 
Counting dataset statistics...
"# user": 4906, "# item": 2421, "# entry": 53258
Appending history info...
Constructing relation triplets...
Item-item relations:['r_complement', 'r_substitute']
Attribute-based relations:['i_category', 'i_brand']
"# relation": 13, "# triplet": 60706
Save corpus to ../data/Office/DFTReader_8.pkl
#params: 4336925
KDAPlus(
  (user_embeddings): Embedding(4906, 64)
  (item_embeddings): Embedding(2421, 1536, padding_idx=0)
  (project_layer_1): Linear(in_features=1536, out_features=64, bias=True)
  (project_layer_2): Linear(in_features=128, out_features=13, bias=True)
  (entity_embeddings): Embedding(2803, 64)
  (relation_embeddings): Embedding(13, 64)
  (relational_dynamic_aggregation): RelationalDynamicAggregation(
    (relation_embeddings): Embedding(13, 64)
    (freq_real): Embedding(13, 33)
    (freq_imag): Embedding(13, 33)
  )
  (attn_head): MultiHeadAttention(
    (q_linear): Linear(in_features=64, out_features=64, bias=False)
    (k_linear): Linear(in_features=64, out_features=64, bias=False)
    (v_linear): Linear(in_features=64, out_features=64, bias=False)
  )
  (W1): Linear(in_features=64, out_features=64, bias=True)
  (W2): Linear(in_features=64, out_features=64, bias=True)
  (dropout_layer): Dropout(p=0, inplace=False)
  (layer_norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
  (item_bias): Embedding(2422, 1)
)
Test Before Training: (HR@10:0.0889,HR@5:0.0389,HR@50:0.5148,NDCG@10:0.0377,NDCG@5:0.0218,NDCG@50:0.1283)
Optimizer: Adam
all_losses: [8.246948   0.6951763  0.69312835]                                                      
Epoch 1     loss=8.2469 [6.1 s]  dev=(HR@5:0.1009,NDCG@5:0.0585) test=(HR@5:0.0618,NDCG@5:0.0358) [2.6 s] 
Save model to ../model/KDAPlus/KDAPlus_Office_2019_lr=0.001_l2=1...
all_losses: [7.984289  0.6748457 0.6929008]                                                         
Epoch 2     loss=7.9843 [6.3 s]  dev=(HR@5:0.1346,NDCG@5:0.0797) test=(HR@5:0.0950,NDCG@5:0.0568) [2.6 s] 
Save model to ../model/KDAPlus/KDAPlus_Office_2019_lr=0.001_l2=1...
all_losses: [7.848729   0.6667138  0.69158834]                                                      
Epoch 3     loss=7.8487 [6.2 s]  dev=(HR@5:0.2353,NDCG@5:0.1445) test=(HR@5:0.1704,NDCG@5:0.1037) [2.7 s]

Pursuerwener commented 1 month ago

Thank you very much for your reply. I reconfigured the environment and downgraded the Python version, and it now runs normally. The original Python version was 3.9.