minosvasilias / godot-dodo

Finetuning large language models for GDScript generation.
MIT License

Exotic codestyle #7

Open · Kelin2025 opened 1 year ago

Kelin2025 commented 1 year ago

In my game, I wrote a library that lets me build logic by composing objects, so I can customize it and subscribe to any step.
Then I wrote higher-level presets and operators to describe a character's skills and perks, so the code mostly looks like this:

- Actions
- Perks
- How presets/operators are made

I understand that this approach is kinda different from how people usually write code (and without an explanation it might be frustrating even for a human, haha), so the question is: do you think fine-tuning with common data can help me get better predictions for this approach?

minosvasilias commented 1 year ago

That looks interesting! I'd say for cases like this, where code using that structure only exists in a single private codebase, finetuning is probably not the ideal way to go about it.

Instead, I'd suggest either using embeddings to retrieve relevant examples to inject into prompts (llama_index-style), or just dumping as much example code as possible into the prompt as context for the model to follow. For the latter, a larger context length for the model used would be very beneficial.
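
To illustrate the embedding approach, here's a minimal sketch assuming the `openai` Python package, the `text-embedding-3-small` model, and numpy; the snippet strings and helper names are placeholders, not part of godot-dodo or the original codebase:

```python
# Sketch: retrieve the most relevant example snippets via embeddings and
# prepend them to a code-generation prompt. Assumes the `openai` package
# and an OPENAI_API_KEY in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()

# Hypothetical example snippets from the private codebase.
snippets = [
    "func dash_skill():\n    return skill().on_cast(dash(300))",
    "func regen_perk():\n    return perk().every(1.0).heal(5)",
]

def embed(texts):
    """Return one embedding vector per input text."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

snippet_vectors = embed(snippets)

def build_prompt(request, top_k=2):
    """Pick the snippets most similar to the request and inject them as context."""
    query_vec = embed([request])[0]
    scores = snippet_vectors @ query_vec / (
        np.linalg.norm(snippet_vectors, axis=1) * np.linalg.norm(query_vec)
    )
    best = [snippets[i] for i in np.argsort(scores)[::-1][:top_k]]
    examples = "\n\n".join(best)
    return f"Here are examples from my codebase:\n{examples}\n\nWrite GDScript for: {request}"

print(build_prompt("a skill that teleports the character forward"))
```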

Both could still be done on top of a GDScript-finetuned model like godot-dodo, of course. But larger GPT models might perform better at the zero-shot learning that would be required in this case.

SeanR26 commented 1 year ago

I believe this is the answer to my question as well. I work on a medical application that has its own scripting language. However, it is not documented on GitHub, and I was wondering how to use the same technique here to simplify rule generation in this scripting language. I have script examples in text files and PDF documents, so I guess that embeddings make the most sense, if I am reading your response correctly. However, given the relatively low number of script lines in that documentation, I'm not sure it will actually be worth the effort.

minosvasilias commented 1 year ago

@SeanR26 It's difficult to say without knowing the exact data available, of course, but I'd suggest simply playing around with some existing APIs (primarily OpenAI, but also the relevant Huggingface/Google Colab ones) and pasting a decent chunk of existing code into your prompt as context. That should be very easy to do and give you some sense of how well these models perform at in-context learning on your specific data.
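
As a rough sketch of that kind of experiment, assuming the `openai` Python package and its chat completions API (the model name and placeholder strings below are illustrative, not tied to the actual scripting language or documentation):

```python
# Sketch: paste existing scripts into the prompt as context and ask the
# model to generate a new rule in the same style. Assumes the `openai`
# package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Placeholder: paste a decent chunk of real scripts from the text files / PDFs here.
example_scripts = "<existing script examples go here>"

# Placeholder: describe the rule you want generated.
request = "<description of the new rule>"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You write scripts in the custom language shown in the examples.",
        },
        {
            "role": "user",
            "content": f"Example scripts:\n{example_scripts}\n\nWrite {request}.",
        },
    ],
)

print(response.choices[0].message.content)
```

How well the output matches the real language should give a quick read on whether in-context examples alone are enough, before investing in embeddings or finetuning.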