mikeizbicki / modulus-magnus-linguae

Description of Dataset Draft #45

Open sophiahuangg opened 1 year ago

Dataset

We used a Latin textbook, available at https://exercitia-latina.surge.sh/ , which contains 35 chapters. Each chapter contains a series of Latin quizzes, grouped by quiz style and labeled either “Exercise” or “PENSVM”. In total, there are 521 quizzes across the 35 chapters. Each quiz is a set of questions that test the reader’s knowledge of Latin. The quiz styles range from grammar questions to Latin comprehension questions and free-response questions. For our paper, we consider only closed-ended questions, i.e. questions that can be scored in binary fashion: a correct answer receives a score of 1, and an incorrect answer receives a score of 0. We do not yet cover open-ended questions, because their answers cannot be scored with a binary value. In the future, we plan to use the BLEU score to measure how accurately our model predicts the answers to open-ended questions.
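The binary scoring of closed-ended questions could look roughly like the sketch below. It is only an illustration, not the scoring code used in this project: the function names (`normalize`, `score_answer`, `quiz_accuracy`) are hypothetical, and the normalization step (ignoring case, macrons, and extra whitespace) is an assumption that the description above does not specify.

```python
import unicodedata


def normalize(answer: str) -> str:
    """Lowercase, drop combining marks (e.g. macrons), collapse whitespace.

    Assumption: this normalization is illustrative; the dataset description
    does not say how answers are compared.
    """
    decomposed = unicodedata.normalize("NFD", answer)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.lower().split())


def score_answer(predicted: str, expected: str) -> int:
    """Return 1 if the predicted answer matches the expected one, else 0."""
    return 1 if normalize(predicted) == normalize(expected) else 0


def quiz_accuracy(predictions: list[str], references: list[str]) -> float:
    """Average the binary scores over one quiz (hypothetical helper)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    total = sum(score_answer(p, r) for p, r in zip(predictions, references))
    return total / len(references)
```

Under these assumptions, `score_answer("Rōma ", "roma")` counts as a match, while `score_answer("Romam", "Roma")` does not. A stricter comparison (for example, one that treats macrons as significant) would only require changing `normalize`.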