scrosseye / ELLIPSE-Corpus


ELLIPSE-Corpus

The English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus

The English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus is a freely available corpus of ~6,500 writing samples produced by English Language Learners (ELLs). Each sample has been scored for overall (holistic) language proficiency and for six analytic proficiency measures: cohesion, syntax, vocabulary, phraseology, grammar, and conventions. The corpus also provides individual and demographic information about the writers, including economic status, gender, grade level (8-12), and race/ethnicity. It provides language proficiency scores for individual writers and was developed to advance corpus-based and NLP approaches to assessing both overall proficiency and more fine-grained features of proficiency.
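To make the record structure described above concrete, the sketch below models one scored essay as a Python dataclass and computes the mean holistic score per grade level. The class name `EllipseEssay`, its field names (`text_id`, `grade`, `overall`, `analytic`, and so on), and the helper `mean_overall_by_grade` are hypothetical; they are not the column names used in the released files, so check the actual file headers before adapting it. The example records and scores are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The six analytic traits named in the corpus description.
ANALYTIC_TRAITS = ("cohesion", "syntax", "vocabulary",
                   "phraseology", "grammar", "conventions")


@dataclass
class EllipseEssay:
    """One scored essay. Field names are hypothetical, not the released column names."""
    text_id: str
    grade: int            # grade level, 8-12
    gender: str
    economic_status: str
    race_ethnicity: str
    overall: float        # holistic proficiency score
    analytic: dict        # trait name -> analytic score

    def __post_init__(self):
        # Reject records outside the documented grade range.
        if not 8 <= self.grade <= 12:
            raise ValueError(f"grade must be between 8 and 12, got {self.grade}")
        # Require exactly one score per analytic trait.
        if set(self.analytic) != set(ANALYTIC_TRAITS):
            raise ValueError("analytic scores must cover exactly the six traits")


def mean_overall_by_grade(essays):
    """Return the mean holistic score for each grade level, ordered by grade."""
    by_grade = defaultdict(list)
    for essay in essays:
        by_grade[essay.grade].append(essay.overall)
    return {grade: mean(scores) for grade, scores in sorted(by_grade.items())}


# Made-up placeholder records, for illustration only.
flat = {trait: 3.0 for trait in ANALYTIC_TRAITS}
essays = [
    EllipseEssay("t1", 9, "unknown", "unknown", "unknown", 3.0, flat),
    EllipseEssay("t2", 9, "unknown", "unknown", "unknown", 4.0, flat),
    EllipseEssay("t3", 11, "unknown", "unknown", "unknown", 2.5, flat),
]
print(mean_overall_by_grade(essays))  # {9: 3.5, 11: 2.5}
```

Validating the grade range and trait set at construction time catches mismatches early if the real files use different column names or trait labels.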

This repository contains the final ELLIPSE corpus (the essay texts and the averaged scores for reliably scored texts) together with its scoring rubric. A second file contains the full collection of ~9,000 essays along with the individual rater scores for each essay. Many of these essays were excluded from the final corpus because their scores were found to be unreliable at the text or rater level.
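The relationship between the two files (individual ratings versus averaged scores) can be sketched as follows, assuming the individual ratings are available in a long format of `(essay_id, rater_id, trait, score)` tuples. The function name `average_ratings` and this tuple layout are hypothetical. The optional `max_spread` filter is a simple rater-disagreement check included only to show where a reliability screen would fit; it is not the procedure used to build ELLIPSE, which is described in the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format layout: one (essay_id, rater_id, trait, score)
# tuple per individual rating. The released file may be organized differently.


def average_ratings(ratings, max_spread=None):
    """Average individual rater scores per essay and trait.

    If max_spread is given, any essay whose raters disagree on some trait by
    more than max_spread points is dropped. This spread check is an
    illustrative stand-in, NOT the reliability procedure used for ELLIPSE.
    """
    # Group every score by (essay, trait).
    buckets = defaultdict(list)
    for essay_id, _rater_id, trait, score in ratings:
        buckets[(essay_id, trait)].append(score)

    averaged = defaultdict(dict)
    rejected = set()
    for (essay_id, trait), scores in buckets.items():
        if max_spread is not None and max(scores) - min(scores) > max_spread:
            rejected.add(essay_id)
        averaged[essay_id][trait] = mean(scores)

    # Keep only essays that passed the (optional) disagreement check.
    return {eid: traits for eid, traits in averaged.items() if eid not in rejected}


# Made-up ratings, for illustration only.
ratings = [
    ("e1", "r1", "overall", 3.0), ("e1", "r2", "overall", 3.5),
    ("e2", "r1", "overall", 2.0), ("e2", "r2", "overall", 4.0),
]
print(average_ratings(ratings))                  # {'e1': {'overall': 3.25}, 'e2': {'overall': 3.0}}
print(average_ratings(ratings, max_spread=1.0))  # {'e1': {'overall': 3.25}}
```

Keeping the ratings in long format makes it easy to add traits or raters without changing the aggregation logic.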

The paper describing the corpus was published in 2023. Please cite the work as follows:

Crossley, S. A., Tian, Y., Baffour, P., Franklin, A., Kim, Y., Morris, W., Benner, B., Picou, A., & Boser, U. (2023). Measuring second language proficiency using the English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus. International Journal of Learner Corpus Research, 9(2), 248-269.

A pre-print of the paper can be found at https://zenodo.org/records/11217937

The data are provided under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).