natalymr / gcm

This repo contains all scripts that are related to "Generate Commit Message" task

[baseline] naive bayes #7

Closed natalymr closed 5 years ago

natalymr commented 5 years ago

https://github.com/natalymr/gcm/blob/master/naive_bayes/naive_bayes.ipynb

all tokens without separators

| train data | test data | classification acc | bleu |
|---|---|---|---|
| intellij | intellij | 0.97 | 0.03794679 |
| intellij | aurora | 0.91 | 0.01468671 |
| aurora | aurora | 0.95 | 0.01288976 |
| intellij + aurora | aurora | 0.95 | 0.01288976 |

only identifiers

| train data | test data | classification acc | bleu |
|---|---|---|---|
| intellij | intellij | 0.98 | 0.0381872 |
| intellij | aurora | 0.94 | 0.0145752 |
| aurora | aurora | 0.96 | 0.01311591 |
| intellij + aurora | aurora | 0.96 | 0.01311591 |

natalymr commented 5 years ago

Amount of data in each of the datasets:

stacymiller commented 5 years ago
  1. "smth was changed" does not seem to be the best comment w.r.t. BLEU score. Maybe we should leave the message empty or construct a phrase that would be similar (in terms of BLEU) to as many commits as possible. Same idea applies to other labels.
  2. Naive Bayes was supposed to be a simple baseline, but setting min_df and max_df to non-default values is still a good idea.
  3. What are bleu score values in brackets?
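
To make point 2 concrete: min_df and max_df are document-frequency cutoffs (in scikit-learn they are parameters of `CountVectorizer` and `TfidfVectorizer`, with defaults `min_df=1` and `max_df=1.0`). The sketch below re-implements the same pruning rule in plain Python so the effect is visible without any dependencies. The helper name `prune_vocabulary` and the toy diffs are hypothetical and not taken from the notebook; it is an illustration of the idea, not the baseline's actual preprocessing.

```python
from collections import Counter


def prune_vocabulary(docs, min_df=1, max_df=1.0):
    """Keep tokens whose document frequency lies within [min_df, max_df].

    An int is an absolute document count, a float is a fraction of the corpus.
    (Hypothetical helper that mimics the min_df / max_df cutoffs.)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents a token occurs at least once.
    df = Counter(token for doc in docs for token in set(doc.split()))
    low = min_df if isinstance(min_df, int) else min_df * n_docs
    high = max_df if isinstance(max_df, int) else max_df * n_docs
    return sorted(token for token, count in df.items() if low <= count <= high)


# Hypothetical toy diffs, not taken from the intellij or aurora datasets.
diffs = [
    "public void run",
    "public int size",
    "public void stop",
    "public int tmpHelper42",
]

# Drop tokens seen in fewer than 2 diffs (rare identifiers) and tokens seen
# in more than 75% of diffs (near-universal keywords such as "public").
print(prune_vocabulary(diffs, min_df=2, max_df=0.75))
```

With the defaults nothing is pruned, so one-off identifiers and tokens present in nearly every diff all stay in the vocabulary. Which cutoffs work best here is an open question and would need to be checked on the actual data.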
natalymr commented 5 years ago
  1. Yes, understood, I'll redo it.
  2. To be honest, I didn't understand which values exactly you mean.
  3. Those are the BLEU scores when only unigrams are taken into account.
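
For reference, a unigram-only BLEU can be sketched as clipped unigram precision multiplied by the brevity penalty. This is a minimal illustration under the assumption of whitespace tokenization and a single reference. The function name `unigram_bleu` and the example messages are hypothetical, and the notebook may compute the score differently.

```python
import math
from collections import Counter


def unigram_bleu(reference, candidate):
    """Sentence-level BLEU restricted to unigrams (hypothetical helper)."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)
    # Clipped matches: a candidate token is credited at most as many times
    # as it occurs in the reference.
    overlap = sum(min(count, ref_counts[token])
                  for token, count in cand_counts.items())
    precision = overlap / len(cand_tokens)
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(cand_tokens))
    return brevity_penalty * precision


# Hypothetical commit messages, for illustration only.
reference = "fix null pointer in parser"
print(unigram_bleu(reference, "smth was changed"))  # no shared tokens
print(unigram_bleu(reference, "fix parser"))        # all tokens match, but short
```

A constant message that shares no tokens with the reference scores zero, which is the concern raised in point 1 about "smth was changed".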