plkmo / BERT-Relation-Extraction

PyTorch implementation for "Matching the Blanks: Distributional Similarity for Relation Learning" paper
Apache License 2.0

Not able to run main_task.py on SemEval2010 Task 8 dataset #26

Closed by sayanb-7c6 3 years ago

sayanb-7c6 commented 3 years ago

Hi,

I was trying to fine-tune the model on the SemEval2010 Task 8 dataset on Google Colab. I tried the following code:

!rm -rf ./data
!mkdir ./data
!unzip <path1>/SemEval2010_task8_all_data.zip -d ./data/

!python <path2>/BERT-Relation-Extraction/main_task.py  \
--train_data ./data/SemEval2010_task8_all_data/SemEval2010_task8_training/TRAIN_FILE.TXT \
--test_data ./data/SemEval2010_task8_all_data/SemEval2010_task8_testing/TEST_FILE.txt \
--num_classes 9  \
--batch_size 100 \
--num_epochs 3 \
--lr 0.0001 \
--model_no 0 \
--model_size bert-base-uncased \
--train 1

But this throws an error, saying:

Traceback (most recent call last):
  File "<whatever>/BERT-Relation-Extraction/main_task.py", line 50, in <module>
    net = train_and_fit(args)
  File "<whatever>/BERT-Relation-Extraction/src/tasks/trainer.py", line 33, in train_and_fit
    train_loader, test_loader, train_len, test_len = load_dataloaders(args)
  File "<whatever>/BERT-Relation-Extraction/src/tasks/preprocessing_funcs.py", line 336, in load_dataloaders
    df_train, df_test, rm = preprocess_semeval2010_8(args)
  File "<whatever>/BERT-Relation-Extraction/src/tasks/preprocessing_funcs.py", line 67, in preprocess_semeval2010_8
    sents, relations, comments, blanks = process_text(text, 'test')
  File "<whatever>/BERT-Relation-Extraction/src/tasks/preprocessing_funcs.py", line 39, in process_text
    assert re.match("^Comment", comment)
AssertionError

What am I missing here?

plkmo commented 3 years ago

I presume you have changed something in the SemEval dataset? The assertion error is raised if the dataset format doesn't exactly match the one given in the original link.
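To narrow down which record trips that assertion, a quick standalone check along these lines may help. It is only a sketch: the four-line record layout (sentence, relation, comment, blank) is an assumption inferred from the traceback, `find_malformed_records` is a hypothetical helper that is not part of the repository, and the sample lines are made up for illustration rather than taken from the dataset.

```python
import re
from typing import Iterable, List, Tuple


def find_malformed_records(lines: Iterable[str], record_size: int = 4) -> List[Tuple[int, str]]:
    """Return (record_index, offending_line) for each record whose third line
    does not start with "Comment".

    Assumption: records are laid out as sentence, relation, comment, blank.
    This mirrors the check seen in the traceback, not the repository's full logic.
    """
    rows = [line.rstrip("\n") for line in lines]
    problems = []
    # Walk the file in complete record_size-line chunks; a trailing partial chunk is ignored.
    for start in range(0, len(rows) - record_size + 1, record_size):
        comment = rows[start + 2]
        if not re.match("^Comment", comment):
            problems.append((start // record_size, comment))
    return problems


# Made-up sample with labels: sentence, relation, comment, blank line.
labeled = [
    '1\t"The <e1>cat</e1> sat on the <e2>mat</e2>."\n',
    "Other\n",
    "Comment:\n",
    "\n",
]

# Made-up sample without labels: one sentence per line.
unlabeled = [
    '1\t"The <e1>cat</e1> sat on the <e2>mat</e2>."\n',
    '2\t"A <e1>dog</e1> chased the <e2>ball</e2>."\n',
    '3\t"The <e1>key</e1> is in the <e2>drawer</e2>."\n',
    '4\t"A <e1>bird</e1> left its <e2>nest</e2>."\n',
]

print(find_malformed_records(labeled))         # prints []
print(len(find_malformed_records(unlabeled)))  # prints 1
```

Passing an open file handle for the path given to `--test_data` instead of the in-memory lists would report the first lines that break the expected layout. A non-empty result would suggest the file is not in the labeled format the preprocessing expects.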

sayanb-7c6 commented 3 years ago

Actually, I have not changed anything. It's the vanilla dataset downloaded from here: https://github.com/sahitya0000/Relation-Classification/blob/master/corpus/SemEval2010_task8_all_data.zip

Are the arguments I'm providing for the train and test data correct?

plkmo commented 3 years ago

I see the error now. You are using the wrong test data folder. Please see the default arguments provided in main_task.py.

sayanb-7c6 commented 3 years ago

Thank you, that resolved it. Sorry for not seeing that before.