malteos / finetune-evaluation-harness

MIT License

Initial Results (Classification and NER tasks) #3

Open akash418 opened 1 year ago

akash418 commented 1 year ago

Model Fine-tuned: https://huggingface.co/bert-base-german-cased

Task: GERMEVAL_2018_OFFENSIVE_LANGUAGE : Type: Classification (Full Model Fine Tuning)

Results:

By class:
              precision    recall  f1-score   support

       OTHER     0.7896    0.8502    0.8188      2330
     OFFENSE     0.6588    0.5607    0.6058      1202

    accuracy                         0.7517      3532
   macro avg     0.7242    0.7055    0.7123      3532
weighted avg     0.7451    0.7517    0.7463      3532
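As a sanity check, the `macro avg` and `weighted avg` rows can be reproduced from the per-class F1 scores and supports in the table above (the macro average is an unweighted mean over classes; the weighted average weights each class by its support):

```python
# Reproduce the summary rows from the per-class rows of the
# full-fine-tuning GermEval 2018 table above.
per_class = {
    "OTHER":   {"f1": 0.8188, "support": 2330},
    "OFFENSE": {"f1": 0.6058, "support": 1202},
}

total = sum(c["support"] for c in per_class.values())

# macro avg: unweighted mean over classes
macro_f1 = sum(c["f1"] for c in per_class.values()) / len(per_class)

# weighted avg: mean weighted by class support
weighted_f1 = sum(c["f1"] * c["support"] for c in per_class.values()) / total

print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.7123 0.7463
```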

Task: GERMEVAL_2018_OFFENSIVE_LANGUAGE : Type: Classification (Classifier Only Tuning)

Results:


By class:
              precision    recall  f1-score   support

       OTHER     0.7919    0.8266    0.8089      2330
     OFFENSE     0.6327    0.5790    0.6047      1202

    accuracy                         0.7424      3532
   macro avg     0.7123    0.7028    0.7068      3532
weighted avg     0.7378    0.7424    0.7394      3532

Settings:

Task: NER_GERMAN_LEGAL : Type: NER (Full Model Fine Tuning)

Results:

By class:
              precision    recall  f1-score   support

          GS     0.9779    0.9852    0.9815      1886
          RS     0.9760    0.9760    0.9760      1249
         GRT     0.9939    0.9879    0.9909       331
         LIT     0.9391    0.9544    0.9467       307
          VT     0.9386    0.9549    0.9466       288
         INN     0.9315    0.9231    0.9273       221
         PER     0.9579    0.9333    0.9455       195
          LD     1.0000    0.9869    0.9934       153
         EUN     0.9110    0.9433    0.9268       141
          RR     0.9922    1.0000    0.9961       127
         ORG     0.8000    0.7805    0.7901       123
          UN     0.9478    0.9646    0.9561       113
          VO     0.8696    0.9412    0.9040        85
          ST     0.9383    0.9870    0.9620        77
          VS     0.8209    0.8594    0.8397        64
         MRK     0.7895    0.8571    0.8219        35
         STR     0.7895    0.7500    0.7692        20
         LDS     0.7778    1.0000    0.8750        14
          AN     0.9167    1.0000    0.9565        11

   micro avg     0.9591    0.9660    0.9625      5440
   macro avg     0.9088    0.9360    0.9213      5440
weighted avg     0.9595    0.9660    0.9626      5440
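The gap between the micro avg (0.9625) and macro avg (0.9213) in this table comes from rare classes like LDS and STR pulling the unweighted mean down, while the micro average pools counts and is dominated by frequent classes like GS. A toy illustration with hypothetical TP/FP/FN counts (not derived from the table):

```python
# Hypothetical counts for one frequent and one rare class, to show why
# micro and macro averages diverge on an imbalanced NER tag set.
counts = {
    "GS":  {"tp": 1858, "fp": 42, "fn": 28},  # frequent class
    "LDS": {"tp": 14,   "fp": 4,  "fn": 0},   # rare class
}

# micro: pool counts across classes, then compute precision/recall once
tp = sum(c["tp"] for c in counts.values())
fp = sum(c["fp"] for c in counts.values())
fn = sum(c["fn"] for c in counts.values())
micro_p = tp / (tp + fp)
micro_r = tp / (tp + fn)

# macro: compute precision per class, then average unweighted
macro_p = sum(c["tp"] / (c["tp"] + c["fp"]) for c in counts.values()) / len(counts)

# the rare class's weaker precision drags the macro average well below micro
print(round(micro_p, 4), round(macro_p, 4))
```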

Task: NER_GERMAN_LEGAL : Type: NER (Classifier Only Tuning)

Results:

By class:
              precision    recall  f1-score   support

          GS     0.9847    0.9857    0.9852      1823
          RS     0.9609    0.9723    0.9666      1338
         LIT     0.9011    0.9335    0.9170       361
         GRT     0.9691    0.9874    0.9782       318
         INN     0.9071    0.9421    0.9242       259
          VT     0.9424    0.9622    0.9522       238
         EUN     0.8659    0.9221    0.8931       154
         PER     0.9419    0.9359    0.9389       156
          LD     0.9342    0.9530    0.9435       149
          RR     1.0000    1.0000    1.0000       136
         ORG     0.8702    0.8636    0.8669       132
          UN     0.9018    0.9806    0.9395       103
          VO     0.9028    0.8667    0.8844        75
          ST     0.9028    0.9420    0.9220        69
          VS     0.6400    0.7111    0.6737        45
         MRK     0.8710    0.7714    0.8182        35
         STR     0.7586    0.9167    0.8302        24
         LDS     0.6250    0.6250    0.6250        16
          AN     0.8571    1.0000    0.9231         6

   micro avg     0.9482    0.9619    0.9550      5437
   macro avg     0.8809    0.9090    0.8938      5437
weighted avg     0.9489    0.9619    0.9552      5437

Settings:

Model Fine-tuned: https://huggingface.co/malteos/gpt2-wechsel-german-ds-meg

Task: GERMEVAL_2018_OFFENSIVE_LANGUAGE : Type: Classification (Full Model Fine Tuning)

Results:

Settings:

Model Fine-tuned: https://huggingface.co/malteos/gpt2-xl-wechsel-german

Task: GERMEVAL_2018_OFFENSIVE_LANGUAGE : Type: Classification (Full Model Fine Tuning)

Results:

Task: GERMEVAL_2018_OFFENSIVE_LANGUAGE : Type: Classification (Classifier Only Tuning)

Results:



Settings:
- Transformer Document Embeddings
- Pooling: mean
- epochs: 100
- learning rate: 3e-5
- default batch size and hidden size
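A minimal sketch of what a flair training setup with these settings could look like. The dataset loader, output path, and keyword names are assumptions and may differ from the repo's actual scripts; the pooling argument in particular has changed names across flair versions:

```python
# Illustrative sketch only -- not the exact script from this repo.
from flair.datasets import GERMEVAL_2018_OFFENSIVE_LANGUAGE  # assumed loader
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

corpus = GERMEVAL_2018_OFFENSIVE_LANGUAGE()
label_dict = corpus.make_label_dictionary(label_type="class")

# "Classifier Only Tuning" corresponds to fine_tune=False (frozen encoder);
# full model fine-tuning sets fine_tune=True. Mean pooling per the settings
# above; the kwarg may be `pooling` or `cls_pooling` depending on version.
embeddings = TransformerDocumentEmbeddings(
    "bert-base-german-cased",
    fine_tune=True,
)

classifier = TextClassifier(
    embeddings, label_dictionary=label_dict, label_type="class"
)

trainer = ModelTrainer(classifier, corpus)
trainer.train(
    "resources/germeval2018",  # output path (illustrative)
    learning_rate=3e-5,        # settings from this comment
    max_epochs=100,
    mini_batch_size=32,        # default-ish batch size
)
```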

> The results are comparable to, if not higher than, those reported here: https://github.com/stefan-it/flair-experiments/tree/master/germeval2018 and here: https://www.dfki.de/fileadmin/user_upload/import/10977_LREC-2020-Leitner-et-al-final.pdf
malteos commented 1 year ago

Great job!

A few comments:

akash418 commented 1 year ago


  1. For the classification task, yes, I fine-tuned for 100 epochs. I wanted to see whether there was an epoch beyond which the loss stopped decreasing. Running for 100 epochs showed that after about 60 epochs the loss no longer decreased, so the reported performance is the best for this set of parameters.
  2. Yes, I will do that and document the results here for comparison.
  3. I used 1 RTX A6000 GPU with a maximum of 89 GB of memory allocated to the job, batch size 32, and hidden size 32. The best option is to decrease the batch size to 8 and see if that works; in the worst case I will fall back to the smaller model.
malteos commented 1 year ago

> I used 1 RTXA6000 GPU with a maximum of 89 GB of memory allocated to GPU, batch size 32, and hidden size 32. The best option is to try and decrease the batch size to 8 and try and see if it works. In worst case I will work with the smaller model.

You can even decrease the batch size to 1. Generally, please try not to always use the big GPUs on the cluster; for example, the RTX6000 should be totally sufficient.
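A per-step batch that small can keep the original effective batch size via gradient accumulation: averaging gradients over several small forward/backward passes before each optimizer step is mathematically equivalent to one large batch, trading memory for time. (Recent flair versions expose a similar idea via the `mini_batch_chunk_size` argument of `ModelTrainer.train`, if the installed version supports it.) A minimal numeric sketch with hypothetical sizes:

```python
# Hypothetical numbers: a per-step batch of 8 with 4 accumulation steps
# reproduces the averaged loss (and hence gradients) of a batch of 32.
target_batch, gpu_batch = 32, 8
accum_steps = target_batch // gpu_batch  # 4 optimizer-free micro-steps

# pretend per-example losses; the average is identical either way
losses = [float(i) for i in range(target_batch)]
big_batch_loss = sum(losses) / target_batch
accum_loss = sum(
    sum(losses[s * gpu_batch:(s + 1) * gpu_batch]) / gpu_batch
    for s in range(accum_steps)
) / accum_steps

print(big_batch_loss, accum_loss)  # 15.5 15.5
```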