Closed: abaheti95 closed this issue 4 years ago
We used the same code for component prediction and relation classification:
https://github.com/tuhinjubcse/AMPERSAND-EMNLP2019/blob/master/argmining/examples/run_classifier.py#L105
https://github.com/tuhinjubcse/AMPERSAND-EMNLP2019/blob/master/argmining/examples/run_classifier.py#L443
https://github.com/tuhinjubcse/AMPERSAND-EMNLP2019/blob/master/argmining/examples/run_classifier.py#L130
Change line 443 to "arg": 3, since the argument component label is claim/premise/none. Change line 130 to text_b = None and line 131 to label = line[1].
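For context, in the stock pytorch-pretrained-BERT run_classifier.py those lines sit inside a DataProcessor's _create_examples method. A minimal sketch of what the loop looks like after the edits above, using the standard script's names (an illustration, not the exact repo code):

def _create_examples(self, lines, set_type):
    # Each TSV row is [sentence, label].
    examples = []
    for (i, line) in enumerate(lines):
        guid = "%s-%s" % (set_type, i)
        text_a = line[0]   # the sentence to classify
        text_b = None      # line 130: single-sentence task, no second segment
        label = line[1]    # line 131: claim / premise / none
        examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples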
@abaheti95 This was a slightly older version of the Hugging Face library, from when it was called pytorch-pretrained-BERT rather than transformers. The classifier expects the same input format as the current transformers package.
It's a sentence-level classifier, not BIO tagging. The file format is TSV, one example per line:
Sentence1\tLabel1
Sentence2\tLabel2
Sentence3\tLabel3
Ideally, you just need a dev.tsv; then follow these steps from https://github.com/yuzcccc/pytorch-pretrained-BERT:
export GLUE_DIR=/path/to/data
python run_classifier.py \
--task_name ARG \
--do_eval \
--data_dir $GLUE_DIR/ \
--bert_model bert-base-uncased \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \
--output_dir /tmp/output/
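If your input is a plain text file with one Reddit comment per line (like the attached abc.txt), here is a minimal sketch for producing the dev.tsv the script expects. The second column is a placeholder, because run_classifier.py reads a label column even when you only care about predictions; this assumes the label set is literally claim/premise/none, so eval accuracy will be meaningless but the predictions are usable:

import csv

with open('abc.txt') as fin, open('dev.tsv', 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter='\t')
    for comment in fin:
        comment = ' '.join(comment.split())  # collapse tabs/newlines inside the comment
        if comment:
            writer.writerow([comment, 'none'])  # dummy label in the required second column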
This is the newer transformers documentation for the model class: https://huggingface.co/transformers/model_doc/bert.html#bertforsequenceclassification
If you don't run into any type conflict, I wonder if just loading the fine-tuned model's state dictionary in the latest transformers code works as well :) Try that if you are more comfortable with the latest transformers code.
import os
import torch
from pytorch_pretrained_bert import BertForSequenceClassification
from pytorch_pretrained_bert.file_utils import PYTORCH_PRETRAINED_BERT_CACHE

model_state_dict = torch.load('./models/pytorch_model.bin')
cache_dir = args.cache_dir if args.cache_dir else os.path.join(PYTORCH_PRETRAINED_BERT_CACHE, 'distributed_{}'.format(args.local_rank))
model = BertForSequenceClassification.from_pretrained(args.bert_model, state_dict=model_state_dict, num_labels=num_labels)
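And an untested sketch of the same load in the current transformers API, assuming the checkpoint's weights are compatible (num_labels=3 matches the claim/premise/none scheme; the checkpoint path is illustrative):

import torch
from transformers import BertForSequenceClassification, BertTokenizer

state_dict = torch.load('./models/pytorch_model.bin', map_location='cpu')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', state_dict=state_dict, num_labels=3)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model.eval()

inputs = tokenizer('Vaccines are safe because trials showed no side effects.', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs)[0]  # indexing works for both tuple and ModelOutput returns
prediction = logits.argmax(dim=-1).item()  # index into the claim/premise/none label map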
I am closing this, but please feel free to reopen.
Hi, I came across your EMNLP paper. I was hoping to test your pretrained argument component BERT classifier on some other subreddits. Could you explain in a little more detail how the component classification works (asking about the results in Table 1 of the paper)? Specifically, how can I run the classifier on the attached abc.txt (a text file that contains one Reddit comment per line)? Eagerly waiting for your response.