kamalkraj / e5-mistral-7b-instruct

Finetune mistral-7b-instruct for sentence embeddings
Apache License 2.0

Best Practices for Fine-Tuning Models on Multi-Hop Datasets? #13

Open Leon-Sander opened 1 month ago

Leon-Sander commented 1 month ago

Hello, for my research I’m planning to fine-tune the model using the HoVer dataset, which includes queries that can involve up to 4 documents for verification. I have a question about setting up the training data for queries with multiple hops.

Do you know whether each n-hop query should include all n ground-truth documents as positive examples, and whether n negative examples should also be paired with each such query?

If that's the right approach, could you tell me how the training code could be adapted to handle multiple positives and negatives per query?
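For context, here is a rough sketch of what I mean by expanding an n-hop example into per-document training records. The field names (`query`, `gold_docs`, `positive`, `negatives`) are my own assumptions for illustration, not the schema this repo actually expects:

```python
import random

def flatten_multihop(example, negative_pool, num_negatives=1, seed=0):
    """Turn one n-hop example into n (query, positive, negatives) records.

    `example` is assumed to look like:
        {"query": str, "gold_docs": [str, ...]}   # up to 4 gold docs in HoVer
    `negative_pool` is a list of non-gold documents to sample negatives from.
    """
    rng = random.Random(seed)
    records = []
    for gold in example["gold_docs"]:
        # Pair each gold document with its own sampled negatives,
        # so every hop contributes one contrastive training record.
        negatives = rng.sample(negative_pool, k=num_negatives)
        records.append({
            "query": example["query"],
            "positive": gold,
            "negatives": negatives,
        })
    return records

example = {
    "query": "Claim requiring multi-hop verification ...",
    "gold_docs": ["doc A", "doc B", "doc C"],
}
pool = ["distractor 1", "distractor 2", "distractor 3", "distractor 4"]
pairs = flatten_multihop(example, negative_pool=pool)
print(len(pairs))  # one record per gold document -> 3
```

Would flattening like this into one-positive records work with the existing contrastive loss, or does the collator need changes to score several positives for the same query in one batch?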