wutaiqiang / GER-WSDM2023

The code for the paper "Modeling Fine-grained Information via Knowledge-aware Hierarchical Graph for Zero-shot Entity Retrieval" (WSDM 2023)

Questions about issues encountered while training the model #1

Open · qidandan opened this issue 9 months ago

qidandan commented 9 months ago

The following are the parameters used during my training:

--dataset_path data/zeshel \
--pretrained_model /work/users/qdd/bert-base-uncased/ \
--name ger_hgat \
--log_dir output/ger_hgat \
--mu 0.5 \
--epoch 10 \
--train_batch_size 32 \
--eval_batch_size 32 \
--encode_batch_size 128 \
--eval_interval 200 \
--logging_interval 10 \
--graph \
--gnn_layers 3 \
--learning_rate 2e-5 \
--do_eval \
--do_test \
--do_train \
--data_parallel \
--dual_loss \
--handle_batch_size 4 \
--return_type hgat

When reproducing node_max_add, I only replaced return_type with 'node_max_add'; everything else remained unchanged. The reproduction results show that node_max_add has a higher recall than HGAT. I need to ask whether there are any issues with the parameters I provided.

wutaiqiang commented 9 months ago

Could you please provide the specific results you got, for both node_max_add and hgat?

qidandan commented 9 months ago

The following are the best results for HGAT (recall up to R@64):

+------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| World            | R@1    | R@2    | R@4    | R@8    | R@16   | R@32   | R@50   | R@64   |
+------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| forgotten_realms | 0.5308 | 0.6842 | 0.775  | 0.8358 | 0.8675 | 0.905  | 0.9183 | 0.9233 |
| lego             | 0.4304 | 0.5847 | 0.7064 | 0.7807 | 0.8299 | 0.8782 | 0.8932 | 0.9041 |
| star_trek        | 0.4183 | 0.5465 | 0.6354 | 0.7038 | 0.7559 | 0.8025 | 0.8216 | 0.8353 |
| yugioh           | 0.2976 | 0.4158 | 0.5065 | 0.5779 | 0.634  | 0.6817 | 0.7175 | 0.7371 |
| total            | 0.3925 | 0.5235 | 0.6172 | 0.6864 | 0.737  | 0.7831 | 0.8067 | 0.821  |
+------------------+--------+--------+--------+--------+--------+--------+--------+--------+

The following are the best results for node_max_add (recall up to R@64):

+------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| World            | R@1    | R@2    | R@4    | R@8    | R@16   | R@32   | R@50   | R@64   |
+------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| forgotten_realms | 0.5767 | 0.73   | 0.7958 | 0.8442 | 0.8833 | 0.9083 | 0.9217 | 0.9333 |
| lego             | 0.4204 | 0.5972 | 0.7089 | 0.7807 | 0.8249 | 0.8632 | 0.8857 | 0.8982 |
| star_trek        | 0.4322 | 0.559  | 0.6473 | 0.7088 | 0.7596 | 0.8062 | 0.8306 | 0.8453 |
| yugioh           | 0.2979 | 0.4235 | 0.5083 | 0.5812 | 0.6363 | 0.6903 | 0.7241 | 0.7445 |
| total            | 0.4028 | 0.5384 | 0.6256 | 0.6906 | 0.7407 | 0.7862 | 0.8122 | 0.8282 |
+------------------+--------+--------+--------+--------+--------+--------+--------+--------+

wutaiqiang commented 9 months ago

For hgat:

+------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| World            | R@1    | R@2    | R@4    | R@8    | R@16   | R@32   | R@50   | R@64   |
+------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| forgotten_realms | 0.5858 | 0.7475 | 0.8217 | 0.8667 | 0.9075 | 0.9292 | 0.94   | 0.9475 |
| lego             | 0.4429 | 0.6239 | 0.7481 | 0.8182 | 0.8641 | 0.8957 | 0.9108 | 0.9224 |
| star_trek        | 0.4653 | 0.5907 | 0.682  | 0.7438 | 0.7918 | 0.8335 | 0.8566 | 0.8666 |
| yugioh           | 0.3216 | 0.4624 | 0.5578 | 0.6328 | 0.6932 | 0.7418 | 0.7706 | 0.7881 |
| total            | 0.4286 | 0.5702 | 0.6648 | 0.73   | 0.7811 | 0.8215 | 0.8441 | 0.8565 |
+------------------+--------+--------+--------+--------+--------+--------+--------+--------+

For node_max_add:

+------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| World            | R@1    | R@2    | R@4    | R@8    | R@16   | R@32   | R@50   | R@64   |
+------------------+--------+--------+--------+--------+--------+--------+--------+--------+
| forgotten_realms | 0.5392 | 0.7083 | 0.8033 | 0.8492 | 0.8933 | 0.9192 | 0.9367 | 0.9392 |
| lego             | 0.4295 | 0.6105 | 0.7406 | 0.8098 | 0.8649 | 0.8991 | 0.9108 | 0.9208 |
| star_trek        | 0.4135 | 0.5434 | 0.6428 | 0.7102 | 0.7708 | 0.8126 | 0.8363 | 0.8495 |
| yugioh           | 0.2771 | 0.4232 | 0.5267 | 0.6153 | 0.6832 | 0.7395 | 0.7742 | 0.7887 |
| total            | 0.3845 | 0.5307 | 0.6346 | 0.7068 | 0.7672 | 0.8111 | 0.8363 | 0.8483 |
+------------------+--------+--------+--------+--------+--------+--------+--------+--------+

wutaiqiang commented 9 months ago

> The following are the parameters used during my training: --dataset_path data/zeshel --pretrained_model /work/users/qdd/bert-base-uncased/ --name ger_hgat --log_dir output/ger_hgat --mu 0.5 --epoch 10 --train_batch_size 32 --eval_batch_size 32 --encode_batch_size 128 --eval_interval 200 --logging_interval 10 --graph --gnn_layers 3 --learning_rate 2e-5 --do_eval --do_test --do_train --data_parallel --dual_loss --handle_batch_size 4 --return_type hgat. When reproducing node_max_add, I only replaced return_type with 'node_max_add'; everything else remained unchanged. The reproduction results show that node_max_add has a higher recall than HGAT. I need to ask whether there are any issues with the parameters I provided.

Your batch size should be 128 rather than 32:

--train_batch_size 128 \
--eval_batch_size 128 \

qidandan commented 9 months ago

My GPU has 16 GB of memory; due to the limited GPU memory, the batch size can only be set to 32.

wutaiqiang commented 9 months ago

You can try setting --gradient_accumulation to 4; it may work, but I cannot guarantee it.
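For readers following the thread, here is a minimal sketch of what standard gradient accumulation does in a generic PyTorch training loop, assuming the script's --gradient_accumulation flag behaves this way. The model, optimizer, and data loader below are toy placeholders just to make the sketch runnable, not the repo's actual objects.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: in the real repo these would be the bi-encoder, its optimizer,
# and the zeshel DataLoader; they exist here only to make the sketch runnable.
model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loader = DataLoader(TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,))),
                    batch_size=32)                       # per-step batch size 32
criterion = nn.CrossEntropyLoss()

accumulation_steps = 4                                   # corresponds to --gradient_accumulation 4
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y)
    (loss / accumulation_steps).backward()               # scale so gradients average over 4 x 32 = 128 examples
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                                 # one parameter update per 128 examples
        optimizer.zero_grad()
```

One caveat: for a loss that relies on in-batch negatives, accumulation reproduces the gradient averaging of a larger batch but not the larger pool of negatives, since each forward pass still sees only 32 candidates, which is presumably why it "may work but is not guaranteed".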

wutaiqiang commented 9 months ago

The loss function is similar to a contrastive loss, so the batch size really matters.
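To illustrate why batch size matters for this kind of loss: with in-batch negatives, each mention is scored against every entity in the same batch, so the other in-batch entities serve as negatives. Below is a generic sketch under that assumption, not the repo's exact loss implementation.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(mention_emb: torch.Tensor,
                              entity_emb: torch.Tensor) -> torch.Tensor:
    """Row i of mention_emb should match row i of entity_emb;
    every other row in the batch acts as a negative."""
    scores = mention_emb @ entity_emb.t()                      # (B, B) similarity matrix
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# With --train_batch_size 32 each mention sees 31 negatives per step;
# with 128 it sees 127, which is why the results are sensitive to batch size.
mentions, entities = torch.randn(32, 768), torch.randn(32, 768)
print(in_batch_contrastive_loss(mentions, entities))
```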

qidandan commented 9 months ago

Can you provide the model that you have trained?

wutaiqiang commented 9 months ago

Let me try, but this work was done during an internship at Tencent, and I need to ask for permission to release the weights.

qidandan commented 9 months ago

OK, thank you very much.