Hi Mike — hope everyone is feeling better! Here is an initial draft of some analysis that could be used in the submission to describe my OpenAI fine-tuning experiments and results. Let me know how it can be improved / anything else I should include. Thank you!
Model’s prior exposure to Latin:
We wanted to test the ability of our fine-tuning methods to teach the model a new language, so we deliberately selected models that were not explicitly trained on Latin: the base models LLaMA and Davinci. LLaMA is not trained on Latin at all, so fine-tuning teaches it Latin entirely from scratch. Davinci has some Latin exposure, since some of its training text includes Latin; for example, English Wikipedia has a few articles about Latin that include Latin text. Further, because of Davinci's training process, we know that our evaluation data is not in its training set: the evaluation items come from a Latin textbook that would not be included in Common Crawl or any of the other training datasets OpenAI utilized.
(I am also wondering how we would know that it is not in Books1/Books2.)
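One partial check we could run ourselves (a minimal sketch, not something already in the experiments): flag any evaluation sentence whose word n-grams appear in a public corpus we suspect overlaps the training data. The reference-text handling and the 8-gram window below are illustrative assumptions, and this cannot prove absence from Books1/Books2 since those corpora are not public.

    # Sketch: flag eval sentences sharing a word 8-gram with a reference
    # corpus. This can catch overlap with public sources, but cannot prove
    # absence from non-public corpora like Books1/Books2.
    from typing import Iterable, List, Set, Tuple

    def ngrams(tokens: List[str], n: int = 8) -> Set[Tuple[str, ...]]:
        """Return the set of word n-grams in a token list."""
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def flag_overlap(eval_sentences: Iterable[str],
                     reference_texts: Iterable[str],
                     n: int = 8) -> List[str]:
        """Return eval sentences that share any n-gram with the references."""
        reference_grams: Set[Tuple[str, ...]] = set()
        for text in reference_texts:
            reference_grams |= ngrams(text.lower().split(), n)
        return [s for s in eval_sentences
                if ngrams(s.lower().split(), n) & reference_grams]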
Graph analysis:
We modeled our fine-tuning methods on how a human would study. The first fine-tuning is solely on the raw text of the textbook, which is comparable to a student reading the textbook without any additional studying. The second method is closer to self-quizzing while studying: its training data is generated by taking a prompt and creating questions, for every word in the chapter, by replacing common Latin suffixes or substituting random words. Because the prompts are included in that fine-tuning, the process also serves to instruction-tune the model. In the future, we will include more experiments, such as going through the textbook chapter by chapter, also similar to how a student would learn.
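Here is a minimal sketch of that second-method data generation as described above; the suffix list, prompt template, and one-record-per-word framing are illustrative placeholders rather than the exact script, so correct anything that does not match.

    # Sketch of the second method: for every word in a chapter sentence,
    # produce one (prompt, completion) record where that word has been
    # corrupted, either by swapping in another common Latin suffix or by
    # substituting a random word. Suffix list and template are placeholders.
    import random
    from typing import List

    COMMON_SUFFIXES = ["us", "a", "um", "ae", "is", "am", "as", "os", "orum", "arum"]
    PROMPT_TEMPLATE = "Correct the Latin sentence: {corrupted}\n\n###\n\n"

    def corrupt_word(word: str, vocabulary: List[str]) -> str:
        """Swap the word's suffix for another common suffix when possible,
        otherwise substitute a random word from the vocabulary."""
        for suffix in COMMON_SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix):
                stem = word[:-len(suffix)]
                other = random.choice([s for s in COMMON_SUFFIXES if s != suffix])
                return stem + other
        return random.choice(vocabulary)

    def make_records(sentence: str) -> List[dict]:
        """One fine-tuning record per word, in OpenAI prompt/completion format."""
        words = sentence.split()
        records = []
        for i in range(len(words)):
            corrupted = words.copy()
            corrupted[i] = corrupt_word(words[i], words)
            records.append({
                "prompt": PROMPT_TEMPLATE.format(corrupted=" ".join(corrupted)),
                "completion": " " + sentence,
            })
        return records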
[ FIGURE #: graph of 0- and 1-shot performance of the base model and the two fine-tuned models, ideally on chapters 1-5; analysis of the graph ]
As shown in FIGURE #, for Davinci, the fine-tuned models outperform the base model. The model fine-tuned with the second method does best, likely because it has been instruction-tuned while also gaining domain knowledge. The model fine-tuned with the first method performs better than the base model due to its increased Latin knowledge. It makes sense that the base Davinci model would score close to zero for many of the trials: although it has some Latin exposure, it does not have enough knowledge of the intricacies of the language to score well. However, this prior exposure may make Davinci easier to fine-tune, as LLaMA performs worse with the same fine-tuning data.
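For completeness, a sketch of the 0-/1-shot evaluation loop behind FIGURE #, using the legacy Completions API; the fine-tuned model IDs and the exact-match scoring are placeholders, since our actual scoring may differ.

    # Sketch of the 0-/1-shot evaluation loop behind FIGURE #. The
    # fine-tuned model IDs are placeholders; scoring is simple exact match.
    import os
    from typing import List, Optional
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    MODELS = {
        "base": "davinci",
        "method 1 (raw text)": "davinci:ft-placeholder-raw",
        "method 2 (self-quiz)": "davinci:ft-placeholder-quiz",
    }

    def accuracy(model: str, items: List[dict],
                 one_shot_example: Optional[str] = None) -> float:
        """Exact-match accuracy over items with 'prompt' and 'answer' keys.
        Prepending one_shot_example turns a 0-shot run into a 1-shot run."""
        correct = 0
        for item in items:
            prompt = item["prompt"]
            if one_shot_example:
                prompt = one_shot_example + "\n\n" + prompt
            response = openai.Completion.create(
                model=model, prompt=prompt, max_tokens=64,
                temperature=0, stop=["\n"],
            )
            if response["choices"][0]["text"].strip() == item["answer"].strip():
                correct += 1
        return correct / len(items)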
Specific quirks about Latin:
Latin is distinctive for several linguistic attributes. It is a highly inflected language, with a complex verb conjugation system that marks tense, voice, mood, person, and number. Latin has six grammatical cases: forms of nouns, adjectives, and pronouns that determine a word's function within a sentence. Because of that case system, word order is far less rigid than in English, since case endings do the work of English prepositions like "to" or "for"; however, a sentence generally follows a subject-object-verb pattern. In addition, Latin treats vowels distinctively: the presence or absence of a macron over a vowel can completely alter a word's meaning. This, too, follows from the case system; for example, adding a macron to turn "villa" into "villā" changes the meaning from "the house" to "at the house." Additional grammatical constructions add further nuance. For example, an ablative absolute is formed by a noun and a participle in the ablative case and expresses a circumstance grammatically independent of the main clause. The subjunctive is also more widely used in Latin than in many other languages. Unlike English, Latin uses the subjunctive to mark indirect questions, and the subjunctive appears in subordinate clauses to express purpose, result, and other concepts. Moreover, because of Latin's sequence of tenses, the subjunctive signals the temporal relationship between the main and subordinate clauses of a sentence.
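One practical consequence for our experiments (an illustration I am adding, not something we have run into yet): any normalization that strips diacritics will conflate distinct forms like "villa" and "villā", so tokenization and scoring need to preserve macrons.

    # Macrons are semantically load-bearing in Latin: "villa" and "villā"
    # are different forms. Stripping the combining macron after Unicode NFD
    # decomposition conflates them, so evaluation should keep macrons intact.
    import unicodedata

    COMBINING_MACRON = "\u0304"

    def strip_macrons(text: str) -> str:
        """Remove macrons ('villā' -> 'villa') via NFD decomposition."""
        decomposed = unicodedata.normalize("NFD", text)
        return unicodedata.normalize(
            "NFC", "".join(ch for ch in decomposed if ch != COMBINING_MACRON))

    assert strip_macrons("villā") == "villa"  # conflates two distinct forms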