Open Alex357853 opened 1 year ago
Hi,
I'm not able to reproduce your results; I was able to get numbers very close to the ones reported in the GitHub repo just now (see the screenshot with timestamps below).
One thing that comes to mind: it is important to use the exact environment (following the package versions specified in requirements.txt). We have found that different torch or huggingface versions lead to some variation across all experiments. Different hardware configurations could also cause minor discrepancies; this run was on a single V100 GPU.
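Before re-running, it can help to confirm which versions are actually installed and compare them against the pins in requirements.txt. A stdlib-only sketch (the package names below are the usual suspects for this setup, not taken from the repo's pin list):

```python
import importlib.metadata as md

# Compare the printed versions against requirements.txt; the package
# list here is an assumption, not the repo's actual pin list.
for pkg in ("torch", "transformers", "datasets"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "NOT INSTALLED")
```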
It is also possible that the uploaded code / model download links are not correct, as I have done some refactoring after the initial upload. I can check this more thoroughly over the weekend.
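If stale download links are a possibility, comparing a checksum of the checkpoint on each side (the reporter's copy vs. a fresh download) would rule out a corrupted or outdated artifact. A minimal helper (the path is the one used in this thread; no reference hash is published, so this only supports comparing two copies):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    # Stream the file in chunks so a large checkpoint need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

# e.g. sha256_of("nli-mabel/mabel_checkpoint_best.pt")
```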
Hi,
Thank you for your prompt response. I notice that in your screenshot the number of examples is 193651, while the number of examples in the provided nli-dataset.csv file is 1936511. After running this code from eval.py,
```python
eval_dataset = load_dataset(
    "csv",
    data_files=args.eval_data_path,
    split="train[:-10%]",
    cache_dir=args.cache_dir,
)
```
it shows

```
05/11/2023 14:16:12 - INFO - __main__ - Number of examples: 1742861
```

which is larger than 193651.
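One way to see which file is actually being read, independent of the `datasets` cache, is to count the CSV rows directly. A small sketch (the path is the one from the command in this thread; it assumes the file has a header row):

```python
import csv

def count_examples(path):
    # Count data rows without loading the whole file into memory.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if present
        return sum(1 for _ in reader)

# e.g. count_examples("bias-nli/nli-dataset.csv")
```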
I am wondering whether you are using a different dataset; if so, could you please provide the link? Additionally, I am wondering why this code evaluates only a portion of the data rather than the entire dataset. Thank you for taking the time to assist me.
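For reference on the "portion of the data" question: the split string `train[:-10%]` keeps everything except the final 10% of rows, i.e. roughly the first 90%. A pure-Python approximation of that slice (the library's exact boundary rounding may differ by a row):

```python
def drop_last_percent(rows, pct=10):
    # Rough equivalent of datasets' "train[:-10%]": drop the last pct percent.
    cut = round(len(rows) * pct / 100)
    return rows[: len(rows) - cut] if cut else rows

print(len(drop_last_percent(list(range(1000)))))  # -> 900
```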
Hi @Alex357853, could you share the nli-dataset.csv file? Thank you!
Hi,
For the Extrinsic Benchmark (Natural Language Inference), I downloaded the provided princeton-nlp/mabel-bert-base-uncased checkpoint (mabel_checkpoint_best.pt) and ran the evaluation script you provided with the command:

```
python eval.py --model_name_or_path bert-base-uncased --load_from_file nli-mabel/mabel_checkpoint_best.pt --eval_data_path bias-nli/nli-dataset.csv
```

However, I did not achieve the same high results as reported. My results were as follows:
I was wondering if there might be a mistake in my implementation, or if others have reported similar results. Thank you for taking the time to read my message; any help you could offer would be greatly appreciated.