Hi,
I found a bug in the current evaluation script, which doubles the duplicates in the dataset. The respective line is located here and looks like:
result_df = generations_df.merge(similarities_df, on='id').merge(likelihoods_df, on='id')
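To see the inflation concretely, here is a minimal sketch with hypothetical toy data (one duplicated id, mirroring the duplicates in the dataset):

```python
import pandas as pd

# Hypothetical toy data: id 1 is duplicated in generations_df and
# likelihoods_df (as for duplicated samples in the dataset), but
# similarities_df has one row per unique id.
generations_df = pd.DataFrame({'id': [1, 1, 2], 'generation': ['a', 'a', 'b']})
similarities_df = pd.DataFrame({'id': [1, 2], 'similarity': [0.7, 0.3]})
likelihoods_df = pd.DataFrame({'id': [1, 1, 2], 'likelihood': [0.5, 0.5, 0.9]})

result_df = generations_df.merge(similarities_df, on='id').merge(likelihoods_df, on='id')
# In the second merge, each of the two rows with id 1 matches both rows
# with id 1 on the other side, yielding 2 * 2 + 1 = 5 rows instead of 3.
print(len(result_df))  # 5
```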
The bug happens because merge creates additional rows whenever id has duplicates in both dataframes. Since only generations_df has duplicates but similarities_df does not, the first merge is still fine and does not introduce new rows. The bug happens in the second merge, since likelihoods_df also has duplicates. As a consequence, all the evaluation metrics are more heavily influenced by the already existing duplicates. I am not sure how large the impact on the results in your paper is, since I did not manage to reproduce your original results in the first place. If you can provide me with the exact numbers behind the figures in your paper, I can compare our evaluations better.

A potential fix would be to replace the line above with
result_df = pd.concat([generations_df, likelihoods_df.drop('id', axis=1)], axis=1).merge(similarities_df, on='id')
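A quick sanity check of this variant on hypothetical toy data (duplicated id 1, identical row order in the two frames):

```python
import pandas as pd

# Hypothetical toy data: generations_df and likelihoods_df share the same
# ids in the same row order; similarities_df has one row per unique id.
generations_df = pd.DataFrame({'id': [1, 1, 2], 'generation': ['a', 'a', 'b']})
likelihoods_df = pd.DataFrame({'id': [1, 1, 2], 'likelihood': [0.5, 0.5, 0.9]})
similarities_df = pd.DataFrame({'id': [1, 2], 'similarity': [0.7, 0.3]})

# Guard the row-alignment assumption before concatenating side by side.
assert (generations_df['id'] == likelihoods_df['id']).all()

result_df = pd.concat(
    [generations_df, likelihoods_df.drop('id', axis=1)], axis=1
).merge(similarities_df, on='id')
print(len(result_df))  # 3 rows, one per generation
```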
This works since the rows in generations_df and likelihoods_df have the same order, so we do not need any matching logic (confirm via (generations_df['id'] == likelihoods_df['id']).all()).

Greetings,
Sebastian