kamalkraj / e5-mistral-7b-instruct

Finetune mistral-7b-instruct for sentence embeddings
Apache License 2.0

Best Practices for Fine-Tuning Models on Multi-Hop Datasets? #13

Open Leon-Sander opened 1 month ago

Leon-Sander commented 1 month ago

Hello, for my research I’m planning to fine-tune the model using the HoVer dataset, which includes queries that can involve up to 4 documents for verification. I have a question about setting up the training data for queries with multiple hops.

Do you know whether each n-hop query should include all n ground-truth documents as positive examples, and whether n negative examples should also be paired with each such query?

If that's the right approach, could you tell me how the training code could be adapted to handle multiple positives and negatives per query?
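For context, here is a rough sketch of what I mean by expanding an n-hop example into per-document training records. The field names (`query`, `gold_docs`, `positive`, `negatives`) are my own assumptions for illustration, not the schema this repo actually expects:

```python
import random

def flatten_multihop(example, negative_pool, num_negatives=1, seed=0):
    """Turn one n-hop example into n (query, positive, negatives) records.

    `example` is assumed to look like:
        {"query": str, "gold_docs": [str, ...]}   # up to 4 gold docs in HoVer
    `negative_pool` is a list of non-gold documents to sample negatives from.
    """
    rng = random.Random(seed)
    records = []
    for gold in example["gold_docs"]:
        # Pair each gold document with its own sampled negatives,
        # so every hop contributes one contrastive training record.
        negatives = rng.sample(negative_pool, k=num_negatives)
        records.append({
            "query": example["query"],
            "positive": gold,
            "negatives": negatives,
        })
    return records

example = {
    "query": "Claim requiring multi-hop verification ...",
    "gold_docs": ["doc A", "doc B", "doc C"],
}
pool = ["distractor 1", "distractor 2", "distractor 3", "distractor 4"]
pairs = flatten_multihop(example, negative_pool=pool)
print(len(pairs))  # one record per gold document -> 3
```

Would flattening like this into one-positive records work with the existing contrastive loss, or does the collator need changes to score several positives for the same query in one batch?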