Recently, while reproducing the performance of KB-BINDER on GrailQA, I used gpt_3.5_turbo_0613_16k with the temperature set to 0.7. The EM results were 38.4 for compositional types, 71.2 for i.i.d. types, and 46.0 for zero-shot types.
The particularly low performance on compositional types raises questions. Does anyone have any thoughts on this? Perhaps such gaps are normal when reproducing the results.
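For reference, EM (exact match) on GrailQA-style evaluation typically counts a prediction as correct only when it matches the gold answer exactly. The sketch below is illustrative only; the function name and inputs are my own, not from the KB-BINDER codebase, which may compute EM over logical forms rather than raw answers.

```python
def exact_match(predictions, golds):
    """Return the EM score (in %) over paired predictions and gold answers.

    A prediction scores only if it is exactly equal to its gold answer.
    """
    assert len(predictions) == len(golds), "lists must be aligned"
    if not golds:
        return 0.0
    hits = sum(1 for p, g in zip(predictions, golds) if p == g)
    return 100.0 * hits / len(golds)

# Example usage: 2 of 3 predictions match the gold answers.
score = exact_match(["a", "b", "c"], ["a", "b", "x"])
```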
Maybe it is because the original paper uses code-davinci, which is no longer available. Using gpt-3.5 to reproduce the experiments will lead to different results.