zhangfaen / finetune-Qwen2-VL

MIT License

Training Data Approach #11

Closed kiranmaya closed 1 month ago

kiranmaya commented 1 month ago

Hi, I have images like this one: Screenshot_91. After fine-tuning, if I later give the model an image like this one: Screenshot_85, I expect it to estimate a score and explain how it reached its conclusion. How should I prepare a dataset for this kind of problem?

zhangfaen commented 1 month ago

Think of it from a human expert's perspective: the expert reads an image and writes down the structured information it contains as text. That structured information can be described by a JSON string (maybe). Then you collect many such image and JSON string pairs, put them into a train_data.json, and fine-tune the model.
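The annotation step described above could be sketched roughly as follows. This is only an illustration: the field names `score` and `reason` and the helper `make_annotation` are hypothetical and not prescribed by the repo.

```python
import json


def make_annotation(score: int, reason: str) -> str:
    """Serialize one expert annotation as a JSON string.

    The keys "score" and "reason" are assumptions chosen for this example.
    """
    return json.dumps({"score": score, "reason": reason}, ensure_ascii=False)


# One annotation for one image; the reason text is a placeholder.
annotation = make_annotation(456, "placeholder explanation written by the expert")
print(annotation)
```

A string like this would serve as the target text the model learns to produce for the corresponding image.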

kiranmaya commented 1 month ago

I can build the JSON in C#; what I need to know is the structure of the training data, because I don't know the expected format. From what I understand, should I prepare data like this?

```json
{
    "messages": [
        {"role": "system", "content": "system prompt here"},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "train_data/2.png"},
                {"type": "text", "text": "Prompt for estimation here"}
            ]
        },
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "estimated score is 456"}
            ]
        }
    ]
}
```

zhangfaen commented 1 month ago

Right.

BTW, there is an example training data file in the repo, FYI: https://github.com/zhangfaen/finetune-Qwen2-VL/blob/main/train_data/data.json
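For anyone generating the file from a script, here is a minimal Python sketch that builds entries in the message structure discussed in this thread. The helpers `build_example` and `dataset_to_json` are hypothetical names, and wrapping the entries in a top-level JSON array is an assumption; compare the output with the example data.json linked above before training.

```python
import json


def build_example(image_path, prompt, answer, system_prompt=None):
    """Build one training entry in the message structure from this thread."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": prompt},
        ],
    })
    messages.append({
        "role": "assistant",
        "content": [{"type": "text", "text": answer}],
    })
    return {"messages": messages}


def dataset_to_json(examples):
    """Serialize entries as a JSON array (assumed top-level layout)."""
    return json.dumps(examples, indent=4, ensure_ascii=False)


# Placeholder rows: (image path, prompt, expected answer).
rows = [
    ("train_data/2.png", "Prompt for estimation here", "estimated score is 456"),
]
examples = [build_example(*row, system_prompt="system prompt here") for row in rows]
print(dataset_to_json(examples))
```

Writing the printed string to a file such as train_data.json gives one file with one entry per image. Putting the explanation of the score into the assistant text is what teaches the model to give its reasoning along with the estimate.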