xiaoachen98 / Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.
233 stars, 9 forks

loss curve of SFT on vicuna-7b #9

Open Xiaohui9607 opened 2 months ago

Xiaohui9607 commented 2 months ago

Hi, I got a training curve like this. Is it normal? Would you mind sharing your trainer_state.json? Thanks!
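
For anyone who wants to compare curves from two runs, here is a minimal sketch for pulling the training loss out of a trainer_state.json. It assumes the file was written by the Hugging Face Trainer, which stores logged metrics in a `log_history` list. The function name `load_loss_curve` is only illustrative and is not part of this repo.

```python
import json
from pathlib import Path


def load_loss_curve(path):
    """Return (steps, losses) from a Hugging Face Trainer trainer_state.json.

    Assumption: training-loss records in `log_history` carry both a
    "step" and a "loss" key; eval and end-of-run summary records do not
    carry "loss" and are skipped.
    """
    state = json.loads(Path(path).read_text())
    records = [r for r in state["log_history"] if "loss" in r and "step" in r]
    steps = [r["step"] for r in records]
    losses = [r["loss"] for r in records]
    return steps, losses
```

The two lists can then be handed to any plotting tool, or compared step by step against the lists from another run.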

[image: training loss curve]

xiaoachen98 commented 2 months ago

Yes, it's quite normal. What about the benchmark performance?

Xiaohui9607 commented 2 months ago

Yeah, I can reproduce the results on all the (VQA) benchmarks. The only issue is that on the image captioning task, my model tends to generate captions with less detail. For example, given the same image:

Your uploaded weight: The image shows a black bear in its natural habitat, which appears to be a forested area. The bear is standing on all fours, with its head lowered towards the ground, possibly sniffing or foraging for food. The bear's fur is predominantly black, which is characteristic of the species, and it has a distinctive white patch on its chest. The bear's posture and the environment suggest that it is engaged in typical bear behavior, such as searching for food or exploring its surroundings. There are no visible signs of human interaction or disturbance in the image, indicating that the bear is in a relatively undisturbed natural setting.

My own weight: The image shows a black bear walking through a wooded area.

I am not sure what's causing this.

xiaoachen98 commented 2 months ago

Are the prompts the same? That's so weird.

Xiaohui9607 commented 2 months ago

I am using the "v1" prompt for both training and inference. Did you use the same?

xiaoachen98 commented 2 months ago

Yeah. I set the conv mode to "v1" for vicuna-7b too.

hkunzhe commented 2 months ago

Does the training loss converge to, or fluctuate at, a value below 1?
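
One rough way to answer that from a list of logged losses is to smooth the curve and look at where it ends. The sketch below is only an illustration: the helper names, the window size of 50, and the idea of judging by the last smoothed value are my own assumptions, and only the threshold of 1 comes from the question above.

```python
def moving_average(values, window=50):
    """Trailing moving average; early points average over fewer samples."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the sample that just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


def settled_below(values, threshold=1.0, window=50):
    """True if the smoothed loss at the end of training is below threshold."""
    if not values:
        return False
    return moving_average(values, window)[-1] < threshold
```

Because per-step loss is noisy, a single logged value below 1 says little. The smoothed tail is a somewhat better indicator, though the right window depends on the logging interval.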