Open orionw opened 1 year ago

Hi there! Great work - this is a very interesting line of research!

I was hoping to replicate your results on BEIR but seem to be having some trouble. For example, both the InPars v1 and v2 papers mention using a learning rate of 1e-3, but I can't find any example scripts that use it (in `legacy` or otherwise, they seem to use 3e-4). When I use the hyperparameters from the papers (or the default example), I get much worse results.

I'm sure it's just some config I'm missing from reading the papers/code, but if you happen to have the commands that reproduce the numbers in the paper, I'd really appreciate it!

Thanks for your time!
Hey @orionw Thank you for your interest in our work! Could you give us more information about how you are trying to replicate the results? (Which dataset are you using? Are you generating new synthetic data or using the data we made available? Are you fine-tuning/evaluating on TPU or GPU?) And regarding the learning rate, we used 3e-4 (we are going to correct it in the paper).
Moreover, we are about to release a reproduction paper of InPars with further details on how to reproduce the results.
Thank you!
Thanks for the reply @lhbonifacio!
I've tried a couple of datasets (SciFact, SciDocs) but can't reproduce the results. I'm using GPUs and the code in `inpars` (not in `legacy`). I am generating new questions with Hugging Face models (not the available InPars v1 questions; I haven't seen the InPars v2 generated questions made publicly available).
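For reference, here's roughly the shape of my generation loop — a minimal sketch where the model name, few-shot examples, and decoding settings are placeholders rather than the repo's exact values:

```python
# Minimal sketch of my question-generation loop (model name, few-shot
# examples, and decoding settings are placeholders, not the repo's exact ones).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # placeholder; any HF causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

doc_text = "..."  # a passage from the target corpus (SciFact/SciDocs)

# InPars-style few-shot prompt: (document, relevant query) examples,
# then the target document, and the model completes the query.
prompt = (
    "Document: The quick brown fox jumps over the lazy dog.\n"
    "Relevant Query: what animal jumps over the dog\n\n"
    f"Document: {doc_text}\n"
    "Relevant Query:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
question = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
).strip()
```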
I've tried several learning rates (including 3e-4) and optimizers, but for both datasets any amount of re-ranker fine-tuning on the synthetic data makes performance worse than just using `castorini/monot5-3b-msmarco-10k` without fine-tuning (and worse than reported in the paper).
If you have the fine-tuning hyperparameters for any of the BEIR runs, that would be great (optimizer, learning rate, scheduler, steps, etc.). Obviously there will be some randomness in the generated questions and in training, but I was hoping to at least minimize differences due to model training.
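For concreteness, this is the shape of my fine-tuning setup — a minimal sketch with dummy data, using the standard monoT5 input format. The optimizer, scheduler, and step count are exactly the values I'm unsure about, so treat those as placeholders:

```python
# Sketch of my re-ranker fine-tuning setup; the optimizer, scheduler, and
# step count below are placeholders (the hyperparameters I'm asking about).
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments, set_seed)

set_seed(42)  # pin what randomness I can

model_name = "castorini/monot5-3b-msmarco-10k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Standard monoT5 format: "Query: {q} Document: {d} Relevant:" -> "true"/"false"
def preprocess(ex):
    enc = tokenizer(
        f"Query: {ex['query']} Document: {ex['doc']} Relevant:",
        truncation=True, max_length=512,
    )
    enc["labels"] = tokenizer(ex["label"]).input_ids
    return enc

train = Dataset.from_list([
    {"query": "synthetic question", "doc": "its source passage", "label": "true"},
    {"query": "synthetic question", "doc": "a random negative passage", "label": "false"},
]).map(preprocess, remove_columns=["query", "doc", "label"])

args = Seq2SeqTrainingArguments(
    output_dir="reranker-ft",
    learning_rate=3e-4,            # from the example scripts; the papers say 1e-3
    optim="adamw_torch",           # placeholder
    lr_scheduler_type="constant",  # placeholder
    max_steps=156,                 # placeholder
    per_device_train_batch_size=8,
    logging_steps=10,
)

Seq2SeqTrainer(
    model=model, args=args, train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
).train()
```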
Hi @orionw , I wonder whether you fine-tuned from the `castorini/monot5-3b-msmarco-10k` checkpoint or from the `t5-base` checkpoint. Any luck sorting this out?
Hi @cramraj8! I didn't use `t5-base`, but I don't think they did either? I never did sort it out and moved on from this, as it didn't seem like the details would be released soon. If they are released (or you have time to figure it out), I'd love to see the results become reproducible.
@orionw Got it. I tried generating unsupervised data with other tools, and in every case the performance dropped on some datasets.
Hi @orionw , I found the reason behind the performance drop and propose an effective solution in my recent NAACL paper. You can find it here: https://arxiv.org/pdf/2404.02489.pdf
Awesome @cramraj8! Thank you, I'm very excited to read the paper!