This type of issue is difficult to diagnose at a distance. You already mentioned trying out different hyper-parameters without any improvement.
My biggest suspicion is indeed that the training dataset is too small. Checking the original paper, the datasets used in the experiments contain tens of thousands of samples. Since you only have 42 samples, I would strongly consider whether few-shot prompting isn't a better approach for your problem (see the sketch below).
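For reference, a minimal sketch of what few-shot prompting could look like here. The checkpoint name, the abstract-to-title framing (inferred from the expected output further down), and the placeholder texts are assumptions on my part, not part of the original scripts:

```python
# Few-shot prompting sketch: reuse a handful of the 42 samples as in-context
# demonstrations instead of training on them. All names below are placeholders.
from transformers import AutoModel, AutoTokenizer

model_name = "THUDM/chatglm-6b"  # placeholder; substitute the GLM checkpoint you actually use
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).half().cuda().eval()  # assumes a GPU

# A few of the existing samples become in-context demonstrations.
demonstrations = [
    ("<abstract 1>", "<title 1>"),
    ("<abstract 2>", "<title 2>"),
    ("<abstract 3>", "<title 3>"),
]
query_abstract = "<new abstract>"

prompt = "\n\n".join(f"Abstract: {a}\nTitle: {t}" for a, t in demonstrations)
prompt += f"\n\nAbstract: {query_abstract}\nTitle:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```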
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
System Info
When I used P-tuning v2 to fine-tune GLM, the loss dropped very noticeably, but at actual inference time the model produced mostly noise. I even ran inference on the training data itself. Example of the training data:
Let me add some details about my setup: I only have 42 training samples (I don't know whether 42 samples with 100 virtual tokens makes sense, but when I reduced the number of virtual tokens I ran into the problems below). In addition, during fine-tuning I occasionally saw the grad norm become NaN; when that happened, the loss either dropped straight to 0 or training simply had no effect. After hitting this, I either increased num_virtual_tokens or restarted fine-tuning from scratch, which could "solve" this unexpected problem. When I asked others, some suggested catastrophic forgetting, so I added common-sense questions to the training set and tried again, but it still doesn't work.
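For context, a minimal sketch of the kind of setup described above, assuming PEFT's prefix tuning is used for the P-tuning v2 style configuration. The checkpoint name, task type, and hyper-parameters are my assumptions, not the contents of train.py:

```python
# Prefix-tuning (P-tuning v2 style) setup with 100 virtual tokens, as described above.
# Checkpoint name, task type, and hyper-parameters are illustrative assumptions.
from transformers import AutoModel, AutoTokenizer
from peft import PrefixTuningConfig, TaskType, get_peft_model

model_name = "THUDM/chatglm-6b"  # placeholder GLM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
base_model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

# 100 virtual tokens as mentioned in the issue; with only 42 samples this adds
# a comparatively large number of trainable prefix parameters.
peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=100,
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()  # reports how many parameters are actually trained
```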
Who can help?
No response
Information
Tasks
examples folder
Reproduction
train.py
eval.py
Expected behavior
Expected: Advancements and Future Directions in Content-Based Image and Video Retrieval: A Comprehensive Review
Actual: Gibberish