hongyurain / DRAGON


Request for more detailed experimental parameter settings #3

Closed. MrShouxingMa closed this issue 10 months ago.

MrShouxingMa commented 1 year ago

A very nice idea, and its effectiveness is validated! However, the key parameters in the code are not limited to the learning rate and the regularization coefficient; quickly reproducing the experimental results in the paper also requires the other specific parameters. Could you provide the concrete values of mm_image_weight, knn_k, k, n_mm_layers and u_layers for each dataset? Best wishes!

hongyurain commented 1 year ago

You can refer to our paper on arXiv. The number of layers is given in the Implementation Details. For mm_image_weight, you can refer to the ablation study.

MrShouxingMa commented 1 year ago

Thank you very much for your patient reply. At the time I only skimmed the idea, thought it was good, and went straight to the code, only to find that the code runs a 1650-combination parameter search. After rereading the paper, some of the parameters are indeed already given there. I apologize for my presumptuous question! Wishing the authors many more published papers! d=====( ̄▽ ̄*)b
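(As an aside, for context on where a run count like 1650 comes from: a minimal sketch, using purely hypothetical grid sizes, of how the total number of runs in such a search is simply the product of the lengths of the hyperparameter lists in the config.)

```python
from itertools import product

# Hypothetical grids for illustration only -- the real search lists live in the
# repo's config files (see the hyper_parameters dump later in this thread).
grid = {
    "learning_rate": [1e-4, 1e-5, 1e-6],
    "reg_weight": [1e-2, 1e-3],
    "mm_image_weight": [0.1, 0.5, 1.0],
    "knn_k": [10, 15, 20],
    "k": [30, 40, 50],
}

# Every combination of the list entries is one full training run.
runs = list(product(*grid.values()))
print(len(runs))  # 3 * 2 * 3 * 3 * 3 = 162 for these illustrative grids
```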

MrShouxingMa commented 10 months ago

I'm sorry to bother you again, but I would like to ask for the detailed parameter settings of your experiments. I have read the full paper and made sure the dataset is correct.

Taking the baby dataset as an example, I kept the overall DRAGON framework unchanged, with n_layers = 2, mm_image_weight = 0.1, aggr_mode = 'add', knn_k = 10, n_mm_layers = 1, u_layers = 1, learning_rate = 0.0001, reg_weight = 0.001, and searched k over [30, 40, 50].

I tried this on multiple machines, but I was not able to reproduce the results reported in your paper (the paper reports Recall@20 = 0.1021, while the best result I obtained was 0.876). However, I was able to accurately reproduce the results of Xin Zhou's BM3 (WWW 2023).
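(For reference on the metric being compared, here is a minimal, generic sketch of Recall@K as it is usually computed in this setting: the per-user hit ratio over the held-out items, averaged across users. It is an illustration, not the repo's evaluation code.)

```python
import numpy as np

def recall_at_k(top_k_items, ground_truth, k=20):
    """Average Recall@K over users.

    top_k_items: one ranked list of item ids per user (length >= k)
    ground_truth: one set of held-out item ids per user
    """
    recalls = []
    for ranked, truth in zip(top_k_items, ground_truth):
        if not truth:
            continue  # skip users with no held-out interactions
        hits = len(set(ranked[:k]) & truth)
        recalls.append(hits / len(truth))
    return float(np.mean(recalls))

# Tiny example: one user, 2 held-out items, 1 of them appears in the top 20.
print(recall_at_k([[5, 9, 3] + list(range(100, 117))], [{9, 42}], k=20))  # 0.5
```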

Lastly, I have one more question: could you use the baby dataset as an example to confirm whether the parameters in my experiment are incorrect?

The specific experimental parameters are as follows:

gpu_id=0 use_gpu=True seed=[999] data_path=./data/ inter_splitting_label=x_label
filter_out_cod_start_users=True is_multimodal_model=True checkpoint_dir=saved
save_recommended_topk=True recommend_topk=recommend_topk/ embedding_size=64
epochs=1000 stopping_step=20 train_batch_size=2048 learner=adam
learning_rate=[1e-06] learning_rate_scheduler=[1.0, 50] eval_step=1
training_neg_sample_num=1 use_neg_sampling=True use_full_sampling=False NEG_PREFIX=neg__
USER_ID_FIELD=userID ITEM_ID_FIELD=itemID TIME_FIELD=timestamp field_separator=
metrics=['Recall', 'NDCG', 'Precision'] topk=[10, 20] valid_metric=Recall@20 eval_batch_size=8192
use_raw_features=False max_txt_len=32 max_img_size=256 vocab_size=30522 type_vocab_size=2
hidden_size=4 pad_token_id=0 max_position_embeddings=512 layer_norm_eps=1e-12
hidden_dropout_prob=0.1 end2end=False
hyper_parameters=['mm_image_weight', 'k', 'u_layers', 'n_mm_layers', 'knn_k', 'aggr_mode', 'reg_weight', 'learning_rate', 'seed']
load_meta_cols=['itemID', 'description', 'title', 'category'] TEXT_ID_FIELD=description
inter_file_name=baby14-indexed-v4.inter text_file_name=meta-games-indexed.csv img_dir_name=img
vision_feature_file=image_feat.npy text_feature_file=text_feat-v1.npy user_graph_dict_file=user_graph_dict.npy
feat_embed_dim=64 n_layers=2 mm_image_weight=[0.1] aggr_mode=['add'] knn_k=[10]
k=[30, 40, 50] n_mm_layers=[1] u_layers=[1] reg_weight=[0.001]
model=DRAGON dataset=baby valid_metric_bigger=True device=cuda

Thank you for your time! I am looking forward to hearing from you!

Best wishes,

Shouxing

hongyurain commented 10 months ago

Hi, in the experimental parameter settings you have shown, learning_rate equals [1e-06]. But when I check my log file, the learning_rate should be 0.0001. You can check the Hyper-parameter Sensitivity Study in my paper, shown below; the learning rate does influence the result. Your setting of learning_rate=[1e-06] and reg_weight=[0.001] gives 0.883 in our experiments. You may try 0.0001 to see if the result comes out correctly. Thanks~

(screenshot of the hyper-parameter sensitivity figure from the paper)
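(For anyone reproducing this, a quick illustrative sanity check, not part of the repo, that the learning rate actually present in a key=value dump like the one above matches what you intended to run.)

```python
import re

# Illustrative only: given a "key=value" parameter dump like the one posted above,
# confirm which learning rate is actually in the search space.
dump = "learner=adam learning_rate=[1e-06] learning_rate_scheduler=[1.0, 50] eval_step=1"

found = re.search(r"\blearning_rate=(\[[^\]]*\])", dump).group(1)
print(found)                 # [1e-06]
print(found == "[0.0001]")   # False -> the dump does not contain the intended 1e-4
```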

MrShouxingMa commented 10 months ago

Thanks for your answer. I re-ran the code with your configuration and the results are now similar to those reported in the paper.

One more point worth noting: the final best result should be selected based on the validation-set results rather than the test-set results. Specifically, this code: if best_test_upon_valid[val_metric] > best_test_value: best_test_value = best_test_upon_valid[val_metric]

What do you think? ^ _^

hongyurain commented 10 months ago

Hi, for your question, you can refer to trainer.py, where best_test_upon_valid comes from. For each hyperparameter setting, we use the validation-set result to choose the best model parameters during training, as the fit function in trainer.py shows. Then, for the best model chosen during training, we save the best validation and test results. The code you refer to uses the best test result to choose among hyperparameter settings; it is not used for model training. You can refer to other papers where the authors manually key in the hyperparameters: for one hyperparameter set, they use the validation result during training to get the best model, then evaluate that best model to get its test result, and finally compare the final results across hyperparameter settings.
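(To make the two levels of selection concrete, here is a minimal illustrative sketch; the function names and the random "metrics" are stand-ins, not the actual API of trainer.py. Within one hyperparameter setting, the validation metric selects the best model during training and the matching test result is recorded; those recorded results are then compared across settings.)

```python
import random

def evaluate(split):
    # Pretend evaluation: returns a random Recall@20 just so the sketch runs.
    return {"Recall@20": random.random()}

def run_one_setting(params, epochs=5):
    best_valid, best_test_upon_valid = None, None
    for _ in range(epochs):
        # ... one epoch of training with `params` would happen here ...
        valid = evaluate("valid")
        # Within a single hyperparameter setting, only the validation metric
        # decides which model checkpoint is "best".
        if best_valid is None or valid["Recall@20"] > best_valid["Recall@20"]:
            best_valid = valid
            # The matching test result is recorded for reporting, not for selection.
            best_test_upon_valid = evaluate("test")
    return best_valid, best_test_upon_valid

# Across hyperparameter settings, the recorded results of the
# validation-selected models are what get compared in the end.
for params in [{"learning_rate": 1e-4}, {"learning_rate": 1e-6}]:
    valid, test = run_one_setting(params)
    print(params, "valid:", round(valid["Recall@20"], 3),
          "test upon valid:", round(test["Recall@20"], 3))
```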

MrShouxingMa commented 10 months ago

OK, I see. Thank you again for your patient reply! Have a nice day!