zetaalphavector / InPars

Inquisitive Parrots for Search
Apache License 2.0

Reproducing results from papers #21

Open orionw opened 1 year ago

orionw commented 1 year ago

Hi there! Great work - this is a very interesting line of research!

I was hoping to replicate your results on BEIR but seem to be having some trouble. For example, in both InPars v1 and v2 papers you mention using a learning rate of 1e-3, but I can't find any example scripts that use that (in legacy or otherwise, they seem to use 3e-4). When I use the hyperparameters from the papers (or the default example), I am getting much worse results.

I'm sure it's just some config that I'm missing from reading the papers/code, but if you happen to have the commands that reproduce the numbers in the paper I'd really appreciate it!

Thanks for your time!

lhbonifacio commented 1 year ago

Hey @orionw, thank you for your interest in our work! Could you give us more information about how you are trying to replicate the results? (the dataset you are using, whether you are generating new synthetic data or using the data we made available, whether you are fine-tuning/evaluating on TPU or GPU, ...) And regarding the learning rate, we used 3e-4 (we are going to correct the papers).

Moreover, we are about to release a reproduction paper of InPars with further details on how to reproduce the results.

Thank you!

orionw commented 1 year ago

Thanks for the reply @lhbonifacio!

I've tried a couple of datasets (SciFact, SciDocs) but can't reproduce the reported results. I'm using GPUs and the code in inpars, not in legacy. I am generating new questions using Hugging Face models (not the available InPars v1 questions; I haven't seen the InPars v2 generated questions made publicly available).
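For context, the generation step looks roughly like the sketch below (simplified and illustrative only, not the exact InPars script; the model name, few-shot template, and decoding settings are placeholders):

```python
# Illustrative sketch of the generation step: prompt a HuggingFace causal LM with a
# few-shot template plus one BEIR document, and keep the generated question as a
# synthetic (query, positive document) pair.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # placeholder; any HF causal LM can be swapped in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Placeholder few-shot prefix; the real prompt would contain full document/question examples.
few_shot_prefix = (
    "Example 1:\nDocument: ...\nRelevant Question: ...\n\n"
    "Example 2:\nDocument: ...\nRelevant Question: ...\n\n"
)

def generate_question(document: str) -> str:
    prompt = f"{few_shot_prefix}Example 3:\nDocument: {document}\nRelevant Question:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
    # Decode only the tokens generated after the prompt and keep the first line.
    completion = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return completion.split("\n")[0].strip()
```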

I've tried several learning rates (including 3e-4) and optimizers, but for both datasets any amount of re-ranker fine-tuning on the synthetic data makes performance worse than just using castorini/monot5-3b-msmarco-10k without fine-tuning (and worse than reported in the paper).

If you have the fine-tuning hyperparameters for any of the BEIR runs, that would be great (optimizer, learning rate, scheduler, steps, etc.).
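For concreteness, here is roughly the shape of the fine-tuning setup I'm trying to match (a simplified sketch using a plain HuggingFace Seq2SeqTrainer rather than the repo's own script; every value marked as a placeholder is exactly what I'm unsure about):

```python
# Simplified sketch of monoT5 re-ranker fine-tuning (not the repo's script).
# monoT5 is trained as seq2seq: input "Query: {q} Document: {d} Relevant:",
# target "true" for synthetic positives and "false" for sampled negatives.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "castorini/monot5-3b-msmarco-10k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def build_example(query: str, doc: str, label: str) -> dict:
    # label is the literal target token: "true" or "false"
    enc = tokenizer(f"Query: {query} Document: {doc} Relevant:",
                    truncation=True, max_length=512)
    enc["labels"] = tokenizer(label).input_ids
    return enc

# Tiny stand-in for the synthetic (query, document) pairs plus sampled negatives.
train_dataset = [
    build_example("example question", "a passage this question was generated from", "true"),
    build_example("example question", "an unrelated passage used as a negative", "false"),
]

args = Seq2SeqTrainingArguments(
    output_dir="monot5-3b-inpars-finetuned",
    learning_rate=3e-4,              # value from the authors' reply (the papers say 1e-3)
    per_device_train_batch_size=8,   # placeholder
    gradient_accumulation_steps=16,  # placeholder (effective batch size of 128)
    max_steps=156,                   # placeholder: roughly one pass over 10k pos + 10k neg pairs
    lr_scheduler_type="constant",    # placeholder: the scheduler is one of the unknowns here
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```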

Obviously with non-determinism there will be randomness in the generated questions and in training, but I was hoping to minimize differences due to model training.

cramraj8 commented 1 year ago

Hi @orionw, I wonder whether you fine-tuned from the castorini/monot5-3b-msmarco-10k checkpoint or from the t5-base checkpoint. Any luck sorting this out?

orionw commented 1 year ago

Hi @cramraj8! I didn't use t5-base, but I don't think they did either? I never did sort it out and moved on, since it didn't seem like the reproduction details would be released soon.

If they do (or you have time to figure it out), I would love to see it become reproducible.

cramraj8 commented 1 year ago

@orionw Got it. I tried generating unsupervised data with other tools, and the performance seems to drop in some cases.

cramraj8 commented 7 months ago

Hi @orionw, I did find out the reason behind the performance drop and proposed an effective solution in my recent NAACL paper. You can find it here: https://arxiv.org/pdf/2404.02489.pdf

orionw commented 7 months ago

Awesome @cramraj8! Thank you, I'm very excited to read the paper 🙏