tuhinjubcse / AMPERSAND-EMNLP2019

Code and Data for EMNLP 2019 paper titled AMPERSAND: Argument Mining for PERSuAsive oNline Discussions

How does the Argument Component Classifier work? #2

Closed abaheti95 closed 3 years ago

abaheti95 commented 3 years ago

Hi, I came across your EMNLP paper. I was hoping to test your pretrained BERT Argument Component classifier on some other subreddits. Could you explain in a little more detail how the component classification works (I am asking about the results in Table 1 of the paper)? Specifically:

  1. What is the format of the output from the BERT Argument Component Classifier? Does it predict a label for each token, as in a BIO tagging format?
  2. What files/commands do I need to run for a new text file (assume abc.txt is a text file that contains one Reddit comment per line)?

Eagerly waiting for your response.

tuhinjubcse commented 3 years ago

We used the same code for component prediction and relation classification:

https://github.com/tuhinjubcse/AMPERSAND-EMNLP2019/blob/master/argmining/examples/run_classifier.py#L105
https://github.com/tuhinjubcse/AMPERSAND-EMNLP2019/blob/master/argmining/examples/run_classifier.py#L443
https://github.com/tuhinjubcse/AMPERSAND-EMNLP2019/blob/master/argmining/examples/run_classifier.py#L130

Change line 443 to arg: 3 (the argument component labels are claim/premise/none); change line 130 to text_b = None and line 131 to label = line[1].
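
To make those edits concrete, here is a minimal sketch of what the processor ends up doing. It assumes the standard DataProcessor/InputExample layout from pytorch-pretrained-BERT's run_classifier.py; the class name ArgProcessor is illustrative and this is not the exact code at the cited line numbers.

# Illustrative sketch only; DataProcessor and InputExample are the classes
# already defined in run_classifier.py.
class ArgProcessor(DataProcessor):
    """Reads Sentence<TAB>Label rows for argument component classification."""

    def get_labels(self):
        # matches arg: 3 -- the three component labels
        return ["claim", "premise", "none"]

    def _create_examples(self, lines, set_type):
        examples = []
        for (i, line) in enumerate(lines):
            guid = "%s-%s" % (set_type, i)
            text_a = line[0]   # the sentence
            text_b = None      # single-sentence task, no second segment
            label = line[1]    # the component label from the TSV
            examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
        return examples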

@abaheti95 This used a slightly older version of the Hugging Face library, from when it was called pytorch-pretrained-bert rather than transformers. The classifier expects the same input format as the current transformers package.

It's a sentence-level classifier, not BIO tagging. The file format is TSV:

Sentence1\tLabel1
Sentence2\tLabel2
Sentence3\tLabel3
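
For example, a dev.tsv could look like the lines below (the sentences and labels here are made up, just to show the expected claim/premise/none labels):

The minimum wage should be raised to $15.\tclaim
Prices rose only slightly after previous increases.\tpremise
Thanks for sharing this thread.\tnone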

Ideally, you just need a dev.tsv; then follow these steps from https://github.com/yuzcccc/pytorch-pretrained-BERT:

export GLUE_DIR=/path/to/data

python run_classifier.py \
  --task_name ARG \
  --do_eval \
  --data_dir $GLUE_DIR/ \
  --bert_model bert-base-uncased \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --output_dir /tmp/output/
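
To build that dev.tsv from a new file like abc.txt (one Reddit comment per line), the comments first have to be split into sentences and given a label column. A minimal sketch, assuming NLTK sentence splitting and a placeholder label (the eval script still parses the label column, so accuracy reported against placeholders is meaningless):

# Sketch: convert abc.txt (one Reddit comment per line) into Sentence<TAB>Label rows.
import nltk

nltk.download("punkt", quiet=True)

with open("abc.txt") as fin, open("dev.tsv", "w") as fout:
    for comment in fin:
        comment = comment.strip()
        if not comment:
            continue
        for sentence in nltk.sent_tokenize(comment):
            # Placeholder label so the file matches the expected format.
            fout.write("%s\t%s\n" % (sentence.replace("\t", " "), "none"))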
tuhinjubcse commented 3 years ago

This is the newer transformers version of the same model class: https://huggingface.co/transformers/model_doc/bert.html#bertforsequenceclassification

If you don't run into any type conflicts, I suspect that simply loading the fine-tuned model's state dictionary in the latest transformers code works as well :) That may be easier if you are more comfortable with the latest transformers API.

# Load the fine-tuned checkpoint and initialize the classification model from it.
model_state_dict = torch.load('./models/pytorch_model.bin')
cache_dir = args.cache_dir if args.cache_dir else os.path.join(PYTORCH_PRETRAINED_BERT_CACHE, 'distributed_{}'.format(args.local_rank))
model = BertForSequenceClassification.from_pretrained(args.bert_model, state_dict=model_state_dict, num_labels=num_labels)
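
For reference, the same idea in the newer transformers API would look roughly like the sketch below. It is untested against this repo's checkpoint; in particular, the label order and the bert-base-uncased base model are assumptions, so verify them against the training setup.

# Rough sketch with the newer transformers package (assumptions noted above).
import torch
from transformers import BertForSequenceClassification, BertTokenizer

state_dict = torch.load('./models/pytorch_model.bin', map_location='cpu')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', state_dict=state_dict, num_labels=3)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model.eval()

labels = ['claim', 'premise', 'none']  # assumed order; check against the training label list
inputs = tokenizer('We should ban plastic bags.', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs)[0]
print(labels[int(logits.argmax(dim=-1))])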
tuhinjubcse commented 3 years ago

I am closing this, but please feel free to reopen.