transducens / demint

Repository for the project "DeMINT: Automated Language Debriefing for English Learners via AI Chatbot Analysis of Meeting Transcripts"
Apache License 2.0

Analysis of datasets related to grammatical error correction #14

Closed. levnikolaevich closed this issue 1 month ago.

levnikolaevich commented 2 months ago

Analysis of datasets related to grammatical error correction, with the goal of choosing a fine-tuning strategy for language models.

- https://ilexir.co.uk/datasets/index.html
- https://www.comp.nus.edu.sg/~nlp/corpora.html
- https://www.cl.cam.ac.uk/research/nl/bea2019st/
- https://github.com/snukky/wikiedits

levnikolaevich commented 2 months ago

Datasets: https://colab.research.google.com/drive/1e3iAvTHvxBLYtYsBYeZ5eryZNHpFw7zU?usp=sharing

levnikolaevich commented 2 months ago

[NUS Natural Language Processing Group](https://www.comp.nus.edu.sg/~nlp/index.html)

1. Ten Sets of Multiply Annotated Essays for Grammatical Error Correction corpus: it contains only 50 essays, each annotated by several different instructors. This makes it a poor fit for our task, where we want each error to map clearly to a single category.

2. NUS Corpus of Learner English (NUCLE): access requires a data request at https://sterling8.d2.comp.nus.edu.sg/nucle_download/nucle.php
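The following minimal sketch illustrates the concern in item 1. It assumes the data is distributed in the M2 format used by the CoNLL-2014 and BEA-2019 shared tasks, which has not been confirmed for every corpus listed here. In M2, each edit line carries an error-type tag and an annotator ID. When several annotators correct the same sentence, the same error can therefore end up with different spans and categories. The sample sentence, the ERRANT-style tags, and the helper names `parse_m2`, `categories_by_annotator`, and `shared_edits` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical M2 sample (format assumed, ERRANT-style tags). Each "A" line is:
# start end|||error_type|||correction|||REQUIRED|||-NONE-|||annotator_id
SAMPLE_M2 = """S This are a example .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 3|||R:DET|||an|||REQUIRED|||-NONE-|||0
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||1
A 2 4|||R:NOUN|||an instance|||REQUIRED|||-NONE-|||1
"""


def parse_m2(text):
    """Return a list of (tokens, {annotator_id: [(start, end, type, correction), ...]})."""
    sentences = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        tokens = lines[0][2:].split()  # drop the leading "S "
        edits = defaultdict(list)
        for line in lines[1:]:
            span, etype, corr, _req, _none, annot = line[2:].split("|||")
            if etype == "noop":  # annotator judged the sentence correct
                continue
            start, end = map(int, span.split())
            edits[int(annot)].append((start, end, etype, corr))
        sentences.append((tokens, dict(edits)))
    return sentences


def categories_by_annotator(edits):
    """Sorted error categories assigned by each annotator."""
    return {a: sorted(e[2] for e in es) for a, es in edits.items()}


def shared_edits(edits):
    """Edits (span, type, correction) on which all annotators agree."""
    sets = [set(es) for es in edits.values()]
    return set.intersection(*sets) if sets else set()


tokens, edits = parse_m2(SAMPLE_M2)[0]
print(categories_by_annotator(edits))
print(shared_edits(edits))
```

In this sample the annotators agree only on the subject-verb agreement edit. Annotator 0's determiner fix and annotator 1's wider noun rewrite differ in both span and category, and that kind of disagreement makes it harder to derive a single error-to-category mapping for fine-tuning.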