shlomihod / deep-text-eval

Differentiable Readability Measure Regularizer for Neural Network Automatic Text Simplification

Rule-based and machine learning approaches for second language sentence-level readability #7

Closed shlomihod closed 6 years ago

shlomihod commented 6 years ago

http://www.aclweb.org/anthology/W14-1821

vageeshSaxena commented 6 years ago

Reading.

vageeshSaxena commented 6 years ago

1) Objective:
   A) Identification of sentences understandable by second language learners of Swedish, which can be used in exercises generated automatically from corpora.
   B) How to exploit existing Natural Language Processing (NLP) tools to assess the suitability of the available corpus.

2) Findings:
   A) Of the deep linguistic indicators explored, mainly lexical-morphological and semantic features were found informative for second language sentence-level readability.
   B) Classification accuracy of 71%.
   C) Top 10 informative features:

            Rank    Feature-ID      Weight
            1       DiffW%          0.576
            2       Sense/W         0.438
            3       DiffWs          0.422
            4       SentLen         0.258
            5       Mod             0.223
            6       KellyFr         0.215
            7       NomR            0.132
            8       AdvVar          0.114
            9       Ddep/SentLen    0.08
            10      DeepDep         0.08

3) Features considered: refer to Figure 2 in the paper.

4) Dataset:

            Level       Source              Nr. sentences
            Within B1   B1 (CEFR) texts     2358
            Above B1    B2 (CEFR) texts     795
            Above B1    Korp corpora        1528
            Total                           4681

5) Method: supervised classification with a linear Support Vector Machine (SVM) classifier.

6) Evaluation: stratified 10-fold cross-validation, i.e. the proportion of labels in each fold was kept the same as in the whole training set during the ten iterations of training and testing.

7) Results:

A) Using all 28 features

            Classifier  Acc     F1      B1-Prec  B1-Recall
            Baseline    0.50    0.66    0.50     1.00
            SVM         0.71    0.70    0.73     0.68
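
The baseline row looks like what you would get from a classifier that labels every sentence as "within B1". That is my assumption; the summary does not say how the baseline was built. A minimal sketch in plain Python, using the sentence counts from the dataset in item 4, works through the arithmetic. The variable and function names are made up for illustration.

```python
# Sentence counts taken from the dataset description (item 4).
n_within_b1 = 2358           # B1 (CEFR) texts
n_above_b1 = 795 + 1528      # B2 (CEFR) texts + Korp corpora


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed baseline: predict "within B1" for every sentence.
true_pos = n_within_b1       # every B1 sentence is labelled correctly
false_pos = n_above_b1       # every above-B1 sentence is labelled wrongly
false_neg = 0                # no B1 sentence is ever missed

baseline_accuracy = true_pos / (n_within_b1 + n_above_b1)
baseline_precision = true_pos / (true_pos + false_pos)
baseline_recall = true_pos / (true_pos + false_neg)
baseline_f1 = f1_score(baseline_precision, baseline_recall)

# Consistency check on the SVM row: F1 from the reported precision and recall.
svm_f1 = f1_score(0.73, 0.68)

print(f"baseline acc={baseline_accuracy:.3f} prec={baseline_precision:.3f} "
      f"recall={baseline_recall:.3f} f1={baseline_f1:.3f}")
print(f"svm f1 from reported precision/recall={svm_f1:.3f}")
```

Under this assumption accuracy and precision come out at about 0.50 and recall at 1.00, matching the table. The F1 comes out at about 0.67 against the reported 0.66, a gap that may be down to rounding or truncation. The SVM row is internally consistent: precision 0.73 and recall 0.68 give an F1 of about 0.70.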

B) On separate feature groups

            Feature group   Acc     F1
            Traditional     0.59    0.55
            Syntactic       0.59    0.54
            Lexical         0.70    0.70
            Semantic        0.61    0.55
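
To make the evaluation setup in item 6 concrete, here is a rough sketch of how folds can be built so that each one keeps the label proportions of the whole set. It is plain Python and not the tooling used in the paper, which this summary does not name. The `stratified_folds` helper and the label strings are hypothetical, and only the class sizes come from item 4.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each item a fold index in [0, k) so that every fold has
    roughly the same label proportions as the full list."""
    rng = random.Random(seed)
    indices_by_label = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_label[label].append(index)

    fold_of = [None] * len(labels)
    for indices in indices_by_label.values():
        rng.shuffle(indices)                  # randomise order within a class
        for position, index in enumerate(indices):
            fold_of[index] = position % k     # deal the class out round-robin
    return fold_of


# Labels with the class sizes from the dataset description (item 4).
labels = ["within_B1"] * 2358 + ["above_B1"] * (795 + 1528)
fold_of = stratified_folds(labels, k=10)

overall_b1_share = labels.count("within_B1") / len(labels)
b1_share_per_fold = []
for fold in range(10):
    members = [i for i, f in enumerate(fold_of) if f == fold]
    b1_count = sum(1 for i in members if labels[i] == "within_B1")
    b1_share_per_fold.append(b1_count / len(members))

print(f"overall B1 share: {overall_b1_share:.3f}")
print("per-fold B1 share:", [f"{share:.3f}" for share in b1_share_per_fold])
```

In each of the ten iterations, one fold is held out for testing and the other nine are used for training. Because every class is dealt out separately, each fold's B1 share stays within a fraction of a percent of the overall share.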