Closed · kevinmartinjos closed this issue 4 years ago
Hi, thank you for your interest in the TK model! You are right, the readme is outdated - sorry about that :/ But I think we can fix it :)
Hope that helps! If you have any further questions, I am happy to help. Best, Sebastian
Hi Sebastian,
Thanks for the quick reply! I'll try this and get back to you :)
Hi, I am trying to reproduce the results on msmarco-passage, and I could not get `train.py` to run. Perhaps the readme is incomplete? What I've done so far:

- Ran `./generate_file_split.sh` on both training.tsv and top1000dev.tsv, giving each file its own output directory.
- `train.py` seems to expect a single config file, but the settings it accesses are spread over both `configs/models/model-config.yaml` and `configs/datasets/tr-msmarco-passage.yaml`, so I concatenated the two files and used the result as the config (a merge sketch and the full config follow).
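Since naive concatenation can leave duplicate keys, the same thing can be done as a proper key merge; a minimal sketch with PyYAML (`merged-config.yaml` is just a hypothetical name for the output file; a later file wins on duplicate keys):

```python
# merge_configs.py - sketch: merge the two repo config files into one.
# Later files win on duplicate keys (plain dict.update semantics).
import yaml

config_files = [
    "configs/models/model-config.yaml",
    "configs/datasets/tr-msmarco-passage.yaml",
]

merged = {}
for path in config_files:
    with open(path, "r") as f:
        merged.update(yaml.safe_load(f) or {})

with open("merged-config.yaml", "w") as f:
    yaml.safe_dump(merged, f, default_flow_style=False)
```

This is the combined config I ended up with (my notes inline as `#` comments):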
```yaml
expirement_base_path: "/GW/NeuralIR/nobackup/msmarco/experiments/"
tqdm_disabled: False

# output directory of ./generate_file_split.sh for training.tsv
train_tsv: "/GW/NeuralIR/nobackup/msmarco/tk_output_dir/*"

validation_cont:
  # output directory of ./generate_file_split.sh for top1000dev.tsv
  tsv: "/GW/NeuralIR/nobackup/msmarco/tk_val_output_dir/*"
  # the dev qrel file supplied with msmarco
  qrels: "/GW/NeuralIR/nobackup/msmarco/qrels.dev.tsv"
  candidate_set_from_to: [5,100]
  # how is this candidate set generated? I used
  # https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md
  candidate_set_path: "/GW/NeuralIR/nobackup/msmarco/run.dev.big.converted.tsv"
  save_only_best: True
```
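On the `candidate_set_path` file above, since the readme does not say how it should be produced: I converted the Anserini BM25 run to a plain TSV. A minimal sketch of that kind of conversion (it assumes the input is a standard six-column TREC run and that `train.py` wants `query_id <tab> doc_id <tab> rank`, which I have not verified):

```python
# convert_run_to_tsv.py - sketch: TREC-format run -> tab-separated candidates.
import sys

def convert(run_path: str, out_path: str) -> None:
    with open(run_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            # TREC run line: query_id Q0 doc_id rank score run_tag
            qid, _q0, docid, rank, _score, _tag = line.split()
            fout.write(f"{qid}\t{docid}\t{rank}\n")

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```

The rest of the config: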
```yaml
# don't need this for the time being
test:
  top1000:
    tsv: "/data01/hofstaetter/data/msmarco-passage/test/dev.not-subset.bm25_plain_top1000-split6/*"
    qrels: "/data01/hofstaetter/data/msmarco-passage/qrels/qrels.dev.tsv"
    candidate_set_max: 1000
    candidate_set_path: "/data01/hofstaetter/data/msmarco-passage/fs_results/plain_bm25_best_dev.not-subset_top1000.txt"
    save_secondary_output: False

pre_trained_embedding_dim: 300
vocab_directory: "/GW/NeuralIR/nobackup/msmarco/vocab/"
pre_trained_embedding: "/GW/NeuralIR/nobackup/msmarco/glove.42B.300d.txt"

# deliberately set to a low number so that I can get to validation fast
validate_every_n_batches: 10
validation_cont_use_cache: True

token_embedder_type: "embedding" # embedding,fasttext,bert_cls
train_embedding: True
sparse_gradient_embedding: True

use_fp16: False
random_seed: 208973249 # real-random (from random.org)

# this used to be set to TK_v6, but I could not find that model in the code base
model: "TK_v1"
validation_metric: "MRR@10"
optimizer: "adam"

# default group (all params are in here if not otherwise specified in param_group1_names)
param_group0_learning_rate: 0.0001
param_group0_weight_decay: 0

param_group1_names: ["dense","position_bias","position_bias_absolute"]
param_group1_learning_rate: 0.001
param_group1_weight_decay: 0

embedding_optimizer: "sparse_adam"
embedding_optimizer_learning_rate: 0.0001
embedding_optimizer_momentum: 0.8 # only when using sgd

# disable with factor = 1
learning_rate_scheduler_patience: 10 # * validate_every_n_batches = batch count to check
learning_rate_scheduler_factor: 0.5

epochs: 1
batch_size_train: 32
batch_size_eval: 256

gradient_accumulation_steps: -1

early_stopping_patience: 35 # * validate_every_n_batches = batch count to check

max_doc_length: 200
max_query_length: 30
min_doc_length: -1
min_query_length: -1

secondary_output:
  top_n: 20

tk_att_heads: 10
tk_att_layer: 2
tk_att_proj_dim: 30
tk_att_ff_dim: 100

tk_kernels_mu: [1.0, 0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9]
tk_kernels_sigma: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

# tk v6
tk_use_pos_agnostic: True
tk_use_position_bias: True
tk_use_diff_posencoding: True
tk_position_bias_bin_percent: 0.2
tk_position_bias_absolute_steps: 4

knrm_kernels: 11

conv_knrm_ngrams: 3
conv_knrm_kernels: 11
conv_knrm_conv_out_dim: 128 # F in the paper

match_pyramid_conv_output_size: [16,16,16,16,16]
match_pyramid_conv_kernel_size: [[3,3],[3,3],[3,3],[3,3],[3,3]]
match_pyramid_adaptive_pooling_size: [[36,90],[18,60],[9,30],[6,20],[3,10]]

mv_lstm_hidden_dim: 32
mv_top_k: 10

pacrr_unified_query_length: 30
pacrr_unified_document_length: 200
pacrr_max_conv_kernel_size: 3
pacrr_conv_output_size: 32
pacrr_kmax_pooling_size: 5

salc_conv_knrm_kernels: 11
salc_conv_knrm_conv_out_dim: 128
salc_conv_knrm_dropi: 0
salc_conv_knrm_drops: 0
salc_conv_knrm_salc_dim: 300

salc_knrm_kernels: 11
salc_knrm_dropi: 0
salc_knrm_drops: 0
salc_knrm_salc_dim: 300

mm_light_kernels: 11
```
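For what it's worth, my reading of the `tk_kernels_mu` / `tk_kernels_sigma` values above: TK, like KNRM, soft-counts the cosine similarities between query and document terms with one Gaussian kernel per (mu, sigma) pair. A simplified PyTorch sketch of that pooling step (not the repo's actual code; the log-sum pooling follows the KNRM formulation):

```python
import torch

# the 11 kernels from the config above
mu = torch.tensor([1.0, 0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9])
sigma = torch.full((11,), 0.1)

def kernel_pool(match: torch.Tensor) -> torch.Tensor:
    """match: (query_len, doc_len) cosine similarities -> (n_kernels,) features."""
    diff = match.unsqueeze(-1) - mu                 # (q, d, K)
    act = torch.exp(-0.5 * (diff / sigma) ** 2)     # Gaussian kernel activations
    soft_counts = act.sum(dim=1)                    # sum over doc terms: (q, K)
    return torch.log(soft_counts.clamp(min=1e-10)).sum(dim=0)  # sum over query terms

# e.g. with the max_query_length x max_doc_length shape from the config
features = kernel_pool(torch.rand(30, 200) * 2 - 1)
```

A kernel centered at mu=1.0 with sigma=0.1 fires almost only for (near-)exact matches, while the lower-mu kernels count progressively weaker similarities.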