piotrmigdalek closed this issue 3 months ago.
Q1:
Your usage is correct; just make sure that, under each dimension (such as "english" and "polish"), the number of items in prompts matches the number of items in ground_truth.
You can also check whether the number of metrics recorded in the logs matches the number of input prompts.
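For the setup in the question (50 edits, 6 locality and 2 portability prompts per edit), a minimal sketch of one possible layout plus a consistency check might look like the code below. It assumes one dimension per prompt slot, the key names "prompt" and "ground_truth", and that each dimension holds exactly one item per edit. Please verify the key names and the one-item-per-edit requirement against the library's docs. The helpers build_inputs, check_inputs and count_prompts are hypothetical, and the placeholder strings stand in for the real custom data.

```python
from typing import Dict, List

Inputs = Dict[str, Dict[str, List[str]]]

# Sizes taken from the question: 50 edits, 6 locality and 2 portability prompts per edit.
N_EDITS = 50
N_LOCALITY = 6
N_PORTABILITY = 2


def build_inputs(n_edits: int, n_slots: int, tag: str) -> Inputs:
    """Hypothetical helper: one dimension per prompt slot, one item per edit.

    The keys "prompt" and "ground_truth" are an assumption about the expected format.
    """
    inputs: Inputs = {}
    for slot in range(n_slots):
        inputs[f"{tag}_{slot}"] = {
            # Placeholders; replace with the real prompts and answers.
            "prompt": [f"{tag} prompt {slot} for edit {i}" for i in range(n_edits)],
            "ground_truth": [f"{tag} answer {slot} for edit {i}" for i in range(n_edits)],
        }
    return inputs


def check_inputs(inputs: Inputs, n_edits: int) -> None:
    """Raise ValueError if a dimension's counts are inconsistent."""
    for dim, fields in inputs.items():
        n_prompts = len(fields["prompt"])
        n_truths = len(fields["ground_truth"])
        if n_prompts != n_truths:
            raise ValueError(f"{dim}: {n_prompts} prompts vs {n_truths} ground truths")
        # Assumption: each dimension must provide exactly one item per edit.
        if n_prompts != n_edits:
            raise ValueError(f"{dim}: {n_prompts} items, expected {n_edits}")


def count_prompts(inputs: Inputs) -> int:
    """Total prompts across all dimensions (assuming one metric per prompt)."""
    return sum(len(fields["prompt"]) for fields in inputs.values())


locality_inputs = build_inputs(N_EDITS, N_LOCALITY, "locality")
portability_inputs = build_inputs(N_EDITS, N_PORTABILITY, "portability")
check_inputs(locality_inputs, N_EDITS)
check_inputs(portability_inputs, N_EDITS)
print(count_prompts(locality_inputs), count_prompts(portability_inputs))  # prints "300 100"
```

If each prompt produces one metric, count_prompts gives the number of locality and portability metrics you would expect to find in the logs.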
Q2:
Hi, do you have any further questions?
Nothing as of now, thanks :)
Hi!
I'm trying to evaluate a Mistral-7B-based model with custom locality and portability data. For each of 50 edits I have 6 locality prompts and 2 portability prompts.
How should I arrange the dicts to feed them into the edit function in that case? Will the variable below, passed as portability_inputs, work as intended?
And a technical question: are the metrics calculated after each edit? If so, is there an option to evaluate everything on the final model after all 50 sequential edits?
Thank you :)