Closed kiranmaya closed 1 month ago
From human expert perpective, expert reads an image and write down struct information text down. The struct information text can be describe by a json string (maybe). then you make many many such images and json string. then you make a train_data.json, then you finetune the model....
i can make JSON in c# , I want to know structure for training data, I don't know the training data format .from what I understood , should I prepare , data like this ` { "messages": [ {"role": "system", "content": "system prompt here"}, { "role": "user", "content": [ {"type": "image", "image": "train_data/2.png"},
{"type": "text", "text": "Prompt for estimation here"}
]
},
{
"role": "assistant",
"content": [
{"type": "text", "text": "estimated score is 456"}
]
}
]
}
`
Right.
BTW, there is a example training data file in the repo: https://github.com/zhangfaen/finetune-Qwen2-VL/blob/main/train_data/data.json fyi
hi, I have images like this ,after fine-tuning , if I give image later like this . I'm expecting it estimate score and reason how it is come its conclusion .how to prepare a dataset for this kind of problem .