MosheWasserb closed this pull request 8 months ago.
Looks promising at a glance! cc @MoritzLaurer
Looks interesting and good to me. I'd assume this works less well for more complex tasks, and the additional step of distilling from the SetFit model takes more developer time, but overall it's a cool approach for further compressing the model and making inference much more efficient.
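For reference on that distillation step, here is a minimal sketch (not from the notebook; `setfit_model` and `unlabeled_texts` are hypothetical names for a trained SetFit model and raw unlabeled texts) of distilling SetFit predictions into a small scikit-learn MLP head:

```python
from sklearn.neural_network import MLPClassifier

# 1. Use the trained SetFit model as a teacher to pseudo-label unlabeled texts.
pseudo_labels = setfit_model.predict(unlabeled_texts)

# 2. Encode the same texts with the SetFit body (a SentenceTransformer).
embeddings = setfit_model.model_body.encode(unlabeled_texts)

# 3. Train a small MLP student on (embedding, pseudo-label) pairs.
student = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
student.fit(embeddings, pseudo_labels)
```

At inference time only the encoder plus the tiny MLP head is needed, which is where the efficiency gain comes from.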
Thanks @MoritzLaurer for the comments. We updated the notebook accordingly. Would you be interested in a joint post/blog or expanding your original blog with the MLP example?
Great! Don't have bandwidth for a joint blog atm unfortunately. Notebook LGTM @tomaarsen
Hi @tomaarsen I think we are good to go and merge into main. Would be great if you could also promote via LinkedIn.
Hi @tomaarsen Could you merge into main?
@MosheWasserb My apologies for the radio silence here; I was very busy with https://github.com/UKPLab/sentence-transformers/releases/tag/v2.6.0. Very impressive performance on this work.
Hi @tomaarsen Sure, no problem. Great work with the binary embeddings. Did you know that for SetFit I was able to compress the 768-dimensional embeddings down to 2 dimensions with no accuracy loss?
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Encode the train/eval texts with the SetFit body (a SentenceTransformer)
X_train = model.encode(x_train)
X_eval = model.encode(x_eval)

# Project the 768-dim embeddings down to 2 dimensions
estimator = PCA(n_components=2)
estimator.fit(X_train)
X_train_em = estimator.transform(X_train)
X_eval_em = estimator.transform(X_eval)

# Fit a logistic-regression head on the 2-dim features
sgd = LogisticRegression()
sgd.fit(X_train_em, y_train)
y_pred_eval_sgd = sgd.predict(X_eval_em)
Thank you! PCA remains strong indeed, especially for classification. It doesn't work very well for retrieval, however; there I've had more luck with 1. Matryoshka models and 2. quantization to speed up the comparisons.
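For context, a minimal sketch of the binary quantization mentioned above, using the `quantize_embeddings` helper introduced in sentence-transformers v2.6.0 (model name and sentences are placeholders):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Placeholder model and corpus; any SentenceTransformer works.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["example sentence one", "example sentence two"])

# Quantize float32 embeddings to binary for ~32x smaller storage
# and much faster (Hamming-distance based) comparisons.
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
```

Matryoshka models complement this: their embeddings can simply be truncated to the first dimensions (e.g. `embeddings[:, :256]`) before or after quantization.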
Adding a new notebook that demonstrates zero-cost, zero-time, zero-shot Financial Sentiment Analysis: from GPT-4/Mixtral to MLP128K with SetFit. @tomaarsen Could you also send it to Moritz Laurer for review?