Open jlerouge opened 10 months ago
Your config shows the number of iterations set to 73k, but the samples you put above are at almost 500k iterations. Are you sure this is the right config being used?
Oh, you're right, I didn't send the right config file, but I don't think the diff really affects my issue. Here is the correct configuration file: cf_dessurt_qs_dom_qa_fra_finetune.json
I have 735 training samples, 91 validation samples (and 91 test samples...). I plan on having more labeled samples, but they are not ready yet.
I wonder whether this training experiment is anywhere near convergence. The loss still seems high during the last iterations... Might I have more success with even more iterations? Maybe with a different learning rate strategy?
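For instance, I was thinking of a linear warmup followed by cosine decay. Here is a minimal PyTorch sketch of what I mean (the model, optimizer, base learning rate and step counts are placeholders, not values from my config):

```python
import math
import torch

# Placeholder model/optimizer just to make the sketch self-contained
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

warmup_iters = 1000      # linear warmup length (guess)
total_iters = 100000     # total finetuning iterations (guess)

def lr_lambda(it):
    # Linear warmup, then cosine decay down to 10% of the base LR
    if it < warmup_iters:
        return it / max(1, warmup_iters)
    progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
    return 0.1 + 0.9 * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# scheduler.step() would be called once per training iteration, after optimizer.step()
```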
And my main question is: why does the model seem to produce different results during training and at test time?
I believe the ground-truth label is somehow used as an input to the model during training to help it produce its text answer, but I can't prove it since I still lack some understanding of what you've done. It may be related to the RUN parameter and what you call "teacher forcing". This is probably where the difference I see comes from:
https://github.com/herobd/dessurt/blob/main/model/dessurt.py#L289
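To make my question more concrete, here is a generic sketch (not Dessurt's actual code; `decoder` and its signature are placeholders) of the difference I suspect between the two modes:

```python
import torch

def decode_teacher_forced(decoder, memory, gt_tokens):
    # Training-style decoding: the decoder is conditioned on the ground-truth
    # answer tokens (shifted right), so each prediction only has to be right
    # "one step ahead" of the true answer.
    inputs = gt_tokens[:, :-1]                 # <bos> a b c
    logits = decoder(inputs, memory)           # predicts a b c <eos>
    return logits

def decode_autoregressive(decoder, memory, bos_id, eos_id, max_len=50):
    # Inference-style decoding (what run.py has to do): the decoder feeds back
    # its own predictions, so early mistakes compound.
    tokens = torch.full((memory.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        logits = decoder(tokens, memory)
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_tok], dim=1)
        if (next_tok == eos_id).all():
            break
    return tokens
```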
So maybe my issue is simply that I lack training samples and the loss is still way too high.
Anyway, thanks for the hard work :+1:
You can keep training as long as the validation ED keeps dropping. 0.310 seems a bit high. You can use graph.py to see what it's doing. You're doing French documents? Dessurt was pre-trained on English only, so you may hit a roadblock there, particularly because the tokenization is English.
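If graph.py doesn't show what you need, you can also plot the validation ED yourself. A minimal sketch, assuming you dump (iteration, ED) pairs to a CSV (the file name and column names here are placeholders):

```python
import csv
import matplotlib.pyplot as plt

iters, val_ed = [], []
with open("val_metrics.csv") as f:      # placeholder file with "iteration,val_ED" rows
    for row in csv.DictReader(f):
        iters.append(int(row["iteration"]))
        val_ed.append(float(row["val_ED"]))

plt.plot(iters, val_ed)
plt.xlabel("iteration")
plt.ylabel("validation ED")
plt.show()
```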
I'm not sure why using run.py would drop performance. Do you have any issues resuming training from a checkpoint?
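One quick sanity check: load the snapshot directly and inspect what's stored in it, to confirm run.py is picking up the weights you expect (the path and key names below are guesses; print the keys and adapt to your checkpoint):

```python
import torch

# Placeholder path; point it at the snapshot you pass to run.py
ckpt = torch.load("saved/my_finetune/checkpoint-latest.pth", map_location="cpu")

print(ckpt.keys())            # see what the checkpoint actually contains
print(ckpt.get("iteration"))  # guess: stored iteration count, if present
```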
Hi @herobd,
I'm trying to finetune a Dessurt model on my own VQA task (predicting a few fields on proof-of-address documents, such as the person's name, address, city, zip code, ...).
I've set
"print_pred_every": 100
to monitor how the model behaves during the training phase. While not perfect, the model seems to give answers close to the ground truth during training, e.g. (...)
However, when using the latest weights in prediction mode with the run.py script, I get very different results. For example, on the training sample where the right answer to the city question is "MONTIGNE LE BRILLANT", here's the result.
The model clearly hallucinates an answer, and the output mask seems to be completely random (the answer is located in the upper right corner, inside the address block).
I have no clue why it seems to produce better answers during the training phase. Do you have any idea?
I'm also sharing my configuration file for reference: cf_dessurt_qs_dom_qa_finetune.json
I would be thankful for any help on training Dessurt :)