naver / posescript


Have additional question about evaluation metric #23

Closed hunblingbling closed 1 month ago

hunblingbling commented 1 month ago

Hello! I am the person who inquired about the evaluation metrics before. Thank you very much for your kind response; it was very helpful.

I have one more question. I can find BERTScore in the code, but it is not mentioned in the paper. Should I pay attention to this metric, or is it not significant?

Please take your time to respond. Thank you!

g-delmas commented 1 month ago

Hello! Model-based metrics such as BERTScore can assess semantic content, provided that the topic falls within the domain of the data used to train the auxiliary model. Unfortunately, we found that BERT lacks the granularity needed to distinguish between two detailed pose descriptions or instructions. This is why we rely on dedicated in-domain retrieval models (trained to match instructions to pairs of poses, or descriptions to single poses) to assess generated texts (see the R-precision metrics).
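
For readers unfamiliar with retrieval-based evaluation, here is a minimal sketch of how an R-precision score can be computed in general. It is not the code from this repository: the function names `cosine` and `r_precision`, the use of cosine similarity, and the convention that text `i` matches pose `i` are all assumptions made for illustration. In practice the embeddings would come from the in-domain retrieval model mentioned above, and the candidate pool may be built differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def r_precision(text_embs, pose_embs, top_k=1):
    """Fraction of texts whose matching pose (same index) ranks within the top_k.

    Assumption for this sketch: text_embs[i] and pose_embs[i] form the
    ground-truth pair, and every other pose acts as a distractor.
    """
    if len(text_embs) != len(pose_embs):
        raise ValueError("expected one pose embedding per text embedding")
    if not text_embs:
        raise ValueError("need at least one text/pose pair")

    hits = 0
    for i, text in enumerate(text_embs):
        sims = [cosine(text, pose) for pose in pose_embs]
        # Rank of the true match = number of poses scoring strictly higher.
        rank = sum(s > sims[i] for s in sims)
        if rank < top_k:
            hits += 1
    return hits / len(text_embs)


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative.
    texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    poses = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]
    for k in (1, 2, 3):
        print(f"R-precision@{k}: {r_precision(texts, poses, top_k=k):.3f}")
```

The point of the sketch is that the score depends entirely on the embedding model: a generated text only counts as a hit if the model places it closer to its own pose than to the distractors, which is why a model trained on pose data can separate fine-grained descriptions that a general-purpose one may not.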