Closed: youngjin-lee closed this issue 11 months ago
For Hugging Face, I usually refer to the following for reproducibility:
I haven't looked into this too much, but I assume some of it might also apply to SetFit.
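As a rough illustration of the kind of seeding such reproducibility guides usually describe, here is a minimal sketch. The helper name `seed_everything` is hypothetical (it is not part of SetFit), and NumPy and PyTorch are only seeded if they happen to be installed. I have not verified that this alone is enough for SetFit.

```python
import os
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the random number generators that commonly affect training runs.

    Hypothetical helper for illustration; not part of SetFit.
    """
    # Python's built-in RNG.
    random.seed(seed)
    # Only affects hash randomization in Python subprocesses started later.
    os.environ["PYTHONHASHSEED"] = str(seed)

    # NumPy, if it is installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch, if it is installed (the CUDA call is ignored without a GPU).
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```

Calling `seed_everything(42)` once at the very top of the script, before any model is created, is the usual pattern. Seeding does not by itself guarantee bit-identical results on a GPU, since some operations can be nondeterministic.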
Hello!
I was able to reproduce your findings, and have applied a fix (https://github.com/huggingface/setfit/pull/439/commits/5b39f062d1f3c4b684703af389c88806931b0681) in preparation for the upcoming SetFit v1.0.0 release. It will resolve this issue. If you would like to use the fix before the release, you can install the bleeding-edge development version via:
pip install git+https://github.com/huggingface/setfit.git@v1.0.0-pre
See the preliminary docs for v1.0.0 here.
Can someone explain how to ensure reproducibility of a pre-trained model ("sentence-transformers/paraphrase-mpnet-base-v2")?
I thought that the result would be reproducible because SetFitTrainer() has a default random seed in its constructor, but found that this was not the case. The SetFitTrainer source code indicates that, to ensure reproducibility across runs, I need to use the [~SetTrainer.model_init] function to instantiate the model. But I don't understand what that entails. Is there an example that I can follow?
Any help would be highly appreciated.
Thanks,
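
For anyone landing here with the same question, this is a minimal sketch of what passing `model_init` looks like. It assumes the pre-v1.0.0 `SetFitTrainer` API, which accepts `model_init` and `seed` keyword arguments. `build_trainer` is a hypothetical helper, and `train_dataset` / `eval_dataset` are placeholders for your own datasets. The idea is that you give the trainer a function that creates the model rather than an already created model, so the trainer controls when instantiation happens.

```python
def model_init():
    """Return a freshly loaded model each time the trainer asks for one."""
    # Imported inside the function so that merely defining it does not
    # load the library or download any weights.
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(
        "sentence-transformers/paraphrase-mpnet-base-v2"
    )


def build_trainer(train_dataset, eval_dataset, seed=42):
    """Hypothetical helper: build a trainer that creates its own model."""
    from setfit import SetFitTrainer

    return SetFitTrainer(
        model_init=model_init,  # pass the function itself, do not call it
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        seed=seed,
    )
```

You would then call `build_trainer(train_ds, eval_ds).train()` in each run and compare the metrics. Whether two runs match exactly may still depend on the installed SetFit version, given the fix mentioned above, and the v1.0.0 API may differ from this sketch, so check the preliminary docs if you install the pre-release.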