Closed by bgonzalezfractal 1 year ago
Can you share the code of your custom Executor?
Sorry @JoanFM, I was not available for a while; this conversation happened on Slack. For anyone facing issues with custom PyTorch fine-tuned models uploaded to the hub, remember to apply this after the model downloads:
```python
import torch

with torch.no_grad():
    executor.encode(da, {})
```
This avoids gradient calculation when generating embeddings, and you are good to go.
We have seen great results using fine-tuned models: our match consistency went from 70% up to 85-90%.
Describe the bug
The Jina Executor from the hub is inconsistent across different encoding methods. With a dataset of 8,000 stable diffusion prompt examples, the embeddings generated from:
Differ from:
The embeddings should be exactly the same for every item in the DocumentArray, but with the first method some examples come back with embeddings filled with zeros:
while with the second method, applying exactly the same model to exactly the same text, we get:
As you can see, they are clearly different even though they use the same Executor. This could significantly change results during development. Any ideas?
-- UPDATE: I've also tried encoding in batches of 1,000, and that works, so the problem seems to be encoding everything at once. The batching logic is not that intuitive, though. Any ideas? So far two methods have worked; the only difference is the volume being encoded with the Executor.
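For reference, the batching workaround above can be sketched in plain Python. This is a minimal illustration, not Jina's actual API: `encode_batch` is a hypothetical stand-in for whatever call invokes the Executor on a slice of documents.

```python
def chunked(items, batch_size):
    """Yield successive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_in_batches(docs, encode_batch, batch_size=1000):
    """Encode `docs` in fixed-size batches instead of all at once.

    `encode_batch` is an assumed callable that takes a batch of
    documents and returns one embedding per document.
    """
    embeddings = []
    for batch in chunked(docs, batch_size):
        embeddings.extend(encode_batch(batch))
    return embeddings
```

Encoding in smaller batches bounds peak memory, which is one plausible reason the full 8,000-document pass misbehaves while batched passes do not.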
Describe how you solve it
I had to append DocumentArrays of length 1 to encode them correctly.
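The length-1 workaround described above can be sketched as follows. This is an illustrative outline only: `encode_fn` is a hypothetical stand-in for calling the Executor's encode on a DocumentArray, not Jina's real API.

```python
def encode_one_by_one(docs, encode_fn):
    """Encode each document in its own length-1 batch and collect results.

    `encode_fn` is an assumed callable that takes a list of documents
    (here, always a single-element list) and returns their embeddings.
    """
    embeddings = []
    for doc in docs:
        embeddings.extend(encode_fn([doc]))  # batch of exactly one document
    return embeddings
```

This is the degenerate case of batching with batch size 1: it trades throughput for the per-item consistency observed above.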
Environment
Screenshots
Uploaded