Concyclics opened this issue 2 days ago
If I remember correctly (I need to check, though), the batch size was set to 4, the learning rate to 1e-5, and we used a synthetic dataset to fine-tune the models. One observation I'll add: these small models do not generalize well. PremSQL-1B was heavily oriented toward BirdBench, so what we did was generate synthetic samples similar to the BirdBench training data; training on those gave a huge leap in the results.
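For reference, here is a minimal sketch of what those settings could look like with a Hugging Face `Trainer`. The checkpoint name, dataset file, sequence length, epoch count, and precision flag are assumptions for illustration, not confirmed PremSQL settings; only the batch size and learning rate come from the comment above.

```python
# Sketch of the reported settings (batch size 4, lr 1e-5) on deepseek-coder 1.3B.
# Dataset path, max length, epochs, and bf16 are assumptions, not the exact recipe.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "deepseek-ai/deepseek-coder-1.3b-instruct"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed so the collator can pad
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumed: a JSONL file of BirdBench-style text-to-SQL samples rendered as plain text.
dataset = load_dataset("json", data_files="synthetic_birdbench_style.jsonl", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True, remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="premsql-1b-sft",
    per_device_train_batch_size=4,   # batch size reported in this thread
    learning_rate=1e-5,              # learning rate reported in this thread
    num_train_epochs=1,              # assumption
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```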
As of now, the fine-tuning scripts in PremSQL might be a bit buggy, and I am working on them. However, the main ingredient was the different datasets we used, together with continual fine-tuning.
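To make the "continual fine-tuning" idea concrete, here is a rough illustration: train on one dataset, save the checkpoint, then keep training that checkpoint on the next dataset. The stage order, file names, and epoch counts below are illustrative assumptions, not the exact PremSQL pipeline.

```python
# Rough sketch of continual fine-tuning: each stage resumes from the previous
# stage's weights. Stage data files and epoch counts are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

stages = [
    "bird_style_stage1.jsonl",       # assumed: BirdBench-style training data
    "synthetic_stage2.jsonl",        # assumed: synthetic samples mimicking it
]
checkpoint = "deepseek-ai/deepseek-coder-1.3b-instruct"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

for i, data_file in enumerate(stages):
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    dataset = load_dataset("json", data_files=data_file, split="train")
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
        batched=True, remove_columns=dataset.column_names,
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=f"stage_{i}",
            per_device_train_batch_size=4,  # same settings as reported above
            learning_rate=1e-5,
            num_train_epochs=1,             # assumption
            save_strategy="no",
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    checkpoint = f"stage_{i}"               # next stage continues from these weights
    trainer.save_model(checkpoint)
    tokenizer.save_pretrained(checkpoint)
```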
First and foremost, thank you for your outstanding work on this project. We'd like to follow this work and fine-tune a model from deepseek-coder 1.3B on your datasets, but we cannot achieve promising results. Could you share the fine-tuning settings, such as batch size, learning rate, and other specifics?