How to use this framework?

Here, I am documenting the steps required to run this framework based on flair for additional parameters

optional arguments:
  -h, --help            show this help message and exit
  --model_name_or_path MODEL_NAME_OR_PATH
  --output_path OUTPUT_PATH
                        path for saving training logs and the final model
  --pooling_type POOLING_TYPE
                        pooling type for the created embedding
  --dataset_name [DATASET_NAME [DATASET_NAME ...]]
                        please specify atleast one dataset
  --mini_batch_size MINI_BATCH_SIZE
  --max_epochs MAX_EPOCHS
  --learning_rate LEARNING_RATE
  --hidden_size HIDDEN_SIZE
  --label_type LABEL_TYPE
                        type of label eg ner (default)
  --downsample DOWNSAMPLE
                        Do you want to downsample the overall corpus (specify a float value eg 0.1)
  --do_full_train DO_FULL_TRAIN
                        Allow for changes in transformer weights including classifcation layer. If set to False it would allow only classification layer to be adjusted
  -c CUDA, --cuda CUDA  CUDA device
  --save_results SAVE_RESULTS
                        Do you want to save the results

Whenever the ner task is supposed to be run, make sure to run the file fine_tune_eval.py with param --label_type ner
Similarly, for the classification task, please run fine_tune_class.py with param --label_type class. Note: label_type is something you must confirm before running the script. Certain datasets eg, IMDB even though are text classification tasks, have label_type as sentiment. If you want to know how access label_type for a flair dataset before running the scriot, checkout this documentation https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_6_CORPUS.md

How to run multiple datasets for the evaluation

You can pass multiple datasets together as well eg NER_GERMAN_LEGAL, GERMEVAL_2018_OFFENSIVE_LANGUAGE along with the parameter --dataset_name followed by space-separated dataset names.

Since text classification can have multiple label types apart from just the standard class, to account for this behaviour whenever passing multiple datasets, be sure to pass the sequence of label types in the same order with --label_type param as well, eg

python fine_tune_eval_class.py --dataset_name GERMEVAL_2018_OFFENSIVE_LANGUAGE IMDB --label_type class sentiment

should be the order followed for multiple dataset and label type combination

Want to downsample the corpus?

Please pass in the parameter --downsample with a float value of how much how you want to downsample the corpus, eg, 0.2, it will downsample that percentage of value from each of the datasets passed in as a parameter.

How to save the results as JSON objects?

Pass in the parameter --save_results with the boolean value True. This will save the detailed results as json object in the same directory which you will specify as the output path for storing the weights and the final model.

Want to run only in Classifier Only Mode?

Incase you want to only fine-tune the linear layer at the top of the transformer and not the weights associated with the transformer pass in the parameter do_full_train as False. In some documentation, it is recommended to use crf and rnn in sequence tagger on top of the model is weight modification is disabled, however, for the mentioned tasks there was no difference in the final results.

Importance of Pooling Type

For some models eg BERT based models its fine to run the pooling with cls (default) pooling type. However, for other models such as gpt2, it is recommended to try out other pooling techniques such as mean or first_last to see considerable improvements in the fine-tuning task. For gpt2 based models code might break as well if you pass in an inappropriate pooling type.

malteos / finetune-evaluation-harness