stavkl / morphbert

BERT experiments for modelling Hebrew morphology

POS2 minor bugs #1

Closed shaked571 closed 4 years ago

shaked571 commented 4 years ago

Under the line with the comment "Add sorting on the dataframes", the code

    train_df['sent_len'] = train_df.groupby('sent_id').id.transform('size')
    s = train_df.sort_values(by=['sent_len', 'sent_id', 'id']).index

raises an exception. I changed the lines to:

    train_df['sent_len'] = train_df.groupby('sent_id').sent_id.transform('size')
    s = train_df.sort_values(by=['sent_len', 'sent_id']).index
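For reference, here is a minimal runnable sketch of the changed lines on a toy dataframe (the column values are made up purely for illustration):

    import pandas as pd

    # Toy tokens: sent_id marks the sentence, id is the token index within it.
    train_df = pd.DataFrame({
        'sent_id': [1, 1, 2, 2, 2, 3],
        'id':      [1, 2, 1, 2, 3, 1],
    })

    # Broadcast each sentence's token count back to every one of its rows.
    train_df['sent_len'] = train_df.groupby('sent_id').sent_id.transform('size')

    # Sort so that shorter sentences come first and each sentence stays contiguous.
    s = train_df.sort_values(by=['sent_len', 'sent_id']).index
    print(train_df.loc[s])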

shaked571 commented 4 years ago

So as not to open too many threads, I will continue here.

I believe there is a bug in the current version. When you calculate the F1 score, on the one hand you add "^" at the beginning of every POS tag, but on the other hand you don't strip it in the evaluation.

On my machine I changed the functions calculate_accuracy and existence_accuracy to:

def calculate_accuracy(df):
    # Exact-match accuracy: strip the '^' prefix from both the gold and
    # the predicted tag before comparing.
    numOfCorrectPredictions = 0
    for index in df.index:
        orig_pos = df.at[index, 'orig_label'].strip('^')
        pred_pos = df.at[index, 'predicted_tag'].strip('^')
        if orig_pos == pred_pos:
            numOfCorrectPredictions += 1
    return numOfCorrectPredictions / len(df)
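For what it's worth, the same comparison can also be written without the explicit loop (assuming both columns hold plain strings):

    def calculate_accuracy_vectorized(df):
        # Same logic as above: strip the '^' prefix and compare elementwise.
        orig = df['orig_label'].str.strip('^')
        pred = df['predicted_tag'].str.strip('^')
        return (orig == pred).mean()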

and the second function, existence_accuracy, to:

def existence_accuracy(df):
    # correct tag = appeared in predicted and in gold
    total_orig_num_of_labels = 0
    total_predicted_num_of_labels = 0
    total_num_of_correct_tags = 0

    for index in df.index:
        orig_list = df.at[index, 'orig_label']
        predicted_list = df.at[index, 'predicted_tag'].strip('^')  # changed: strip the '^' prefix
        total_orig_num_of_labels += len(orig_list)
        total_predicted_num_of_labels += len(predicted_list)
        total_num_of_correct_tags += len(set(orig_list).intersection(set(predicted_list)))

    precision = total_num_of_correct_tags / total_predicted_num_of_labels * 100
    recall = total_num_of_correct_tags / total_orig_num_of_labels * 100
    f1 = 2*precision*recall/(precision+recall)

    print("Precision: {0:.2f}%".format(precision))
    print("Recall: {0:.2f}%".format(recall))
    print("F1: {0:.2f}%".format(f1))

I would like to verify with you that it is valid. Before, my F1 score was 0; now it is better, but still far from good:

DEV: Precision: 55.47%, Recall: 71.82%, F1: 62.59%
TEST: Precision: 53.74%, Recall: 72.04%, F1: 61.56%

stavkl commented 4 years ago

The nbs were uploaded per Reut's request - I'm not maintaining this project. If you think there's a bug, the best way to prove it is to provide evidence (i.e. print the relevant dfs and see if the ^ is causing problems - don't just inspect the code). To the best of my knowledge there is no such ^-related bug. Also, the results should match the results in the paper.

shaked571 commented 4 years ago

Hey Stav,

I know that you are not maintaining the project. I just ran into some exceptions I thought might interest you.

Regarding the '^': you can also see in the notebook in your repo that the F1 is 0. I wouldn't have written to you unless I had checked it first... I was just worried it might also affect the training phase.
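For example, a quick check along these lines (column names as in the evaluation functions above) is enough to see whether the prefix is present:

    # Print a few rows side by side, plus the fraction of '^'-prefixed predictions.
    print(df[['orig_label', 'predicted_tag']].head(10))
    print(df['predicted_tag'].str.startswith('^').mean())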

In our last talk, as far as I recall (please correct me if I'm wrong), you told me that the POS2 notebook is the one I should run.

First, I ran POS2 (with multilingual BERT base) as is; the results didn't align with the paper's results - not sure why (without my fix, .strip('^'), the result was 0). For exact match accuracy I got:

DEV - Exact Match Accuracy = 46.17%
TEST - Exact Match Accuracy = 45.18%

For existence accuracy:

DEV: Precision: 29.69%, Recall: 71.44%, F1: 41.95%
TEST: Precision: 29.24%, Recall: 71.56%, F1: 41.52%

Afterwards, I tried to run Amit's model, which uses his WP tokenizer, and I got the following results:

For exact match accuracy:

DEV - Exact Match Accuracy = 81.10%
TEST - Exact Match Accuracy = 79.14%

For existence accuracy:

DEV: Precision: 55.47%, Recall: 71.82%, F1: 62.59%
TEST: Precision: 53.74%, Recall: 72.04%, F1: 61.56%

I guess I am doing something wrong here. I would be happy to go over it together, if you have the time, to see what I am missing.

The differences are too significant...

Thanks,

Shaked
