wangjiyang opened this issue 1 year ago
Hey @wangjiyang , thanks, glad it's useful!
The finetuning run for the starcoder-15b model took 15h30m on an 8xA100-80GB instance. I was able to get that instance for $12/h, so that's a total cost of about $192 if you manage the rented time very efficiently.
In reality, downloading the base model, setting up the dataset, debugging some test finetunes with a few steps to make sure there are no memory issues, and uploading the final finetune added a couple of extra hours on top of that, so I'd budget a bit more depending on how long that takes you.
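For reference, a back-of-the-envelope estimate of the total bill; the setup overhead here is an assumed value, not a measured one:

```python
# Rough cost estimate for the run described above.
hourly_rate = 12.0      # $/h for the 8xA100-80GB instance
training_hours = 15.5   # fine-tuning time for starcoder-15b
setup_hours = 2.0       # assumed: model download, dataset prep, short test finetunes
total_cost = hourly_rate * (training_hours + setup_hours)
print(f"Estimated total: ${total_cost:.0f}")  # -> Estimated total: $210
```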
If your GPU cloud provider supports creating a template, that may make the process cheaper. Prepare the template without a GPU (if possible) or with a cheap one, with all model data already cached, so nothing big needs to be downloaded when the actual (expensive) training instance with all the GPUs is created. This saves most of the setup time, but it doesn't help with the memory-related issues, which still need to be debugged with all the GPUs present.
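As an illustration of the caching idea, here is a minimal sketch assuming the `huggingface_hub` Python client and `bigcode/starcoder` as the base model; the cache path is just an example that would be baked into the template image:

```python
from huggingface_hub import snapshot_download

# Run this once on a cheap (or GPU-less) instance while building the template.
# The expensive multi-GPU training instance then starts with all weights on disk.
snapshot_download(
    repo_id="bigcode/starcoder",       # assumed base model
    cache_dir="/workspace/hf_cache",   # example path stored in the template image
)
```

When the training instance is launched from that template, pointing `HF_HOME` (or the library's `cache_dir` argument) at the same path avoids any large downloads at startup.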
Thanks @minosvasilias @viktor-ferenczi, your information is very helpful. I read some papers and found that people have tried using an AST (abstract syntax tree) to improve model quality on code generation. Do you have any expertise with this approach?
Yes, I do. In the AskYourCode ChatGPT plugin I was doing exactly that. Please feel free to message me directly on Discord; the invite is on the https://askyourcode.ai page.
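Not the plugin's actual code, but a minimal sketch of the general idea using Python's built-in `ast` module: parse the source, walk the tree, and keep only compact structural facts (signatures, first docstring lines) as context for the model instead of the full file.

```python
import ast

def summarize_module(source: str) -> list[str]:
    """Collect function signatures and first docstring lines from Python source."""
    tree = ast.parse(source)
    summary = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            doc_lines = (ast.get_docstring(node) or "").splitlines()
            first_line = doc_lines[0] if doc_lines else ""
            summary.append(f"def {node.name}({args})  # {first_line}")
    return summary

# Example usage (file name is hypothetical): feed the compact summary to the model.
print("\n".join(summarize_module(open("some_module.py").read())))
```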
Good point on the templates @viktor-ferenczi .
No personal experience with the syntax trees, but also interested in finding out more.
Hi. Thank you for your great work. Your approach is helpful to me. I am trying to fine-tune StarCoder to improve its performance on C code, so your cost figures for fine-tuning StarCoder would be very helpful. Could you share that information? Thanks.