math-eval / aaai2024comp

AAAI2024 Global Competition on Math Problem Solving and Reasoning

About Evaluation #2

Open Ljyustc opened 1 year ago

Ljyustc commented 1 year ago

Hi, I have two questions:

  1. During the evaluation phase, is the ACC calculated automatically by code or manually by annotators?
  2. If the model output is a reasoning process containing a lot of natural language, will the ACC count the response as wrong regardless of whether the final result is correct?

math-eval commented 1 year ago

Hi Ljyustc, here are the answers to your questions :)

  1. The ACC is calculated automatically by code.
  2. The final prediction JSON file should contain only the queID of each question as keys and pure-number answers in string format as values. For details about submission, please check the submission rules on the evaluation page: [image: Screenshot 2023-10-30 20.21.22.png]
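
As a rough illustration of the format described above, the sketch below turns free-text model outputs into a `queID -> "number"` mapping and checks that every value is a pure number string. The helper names (`extract_final_number`, `build_submission`, `is_valid_submission`), the "take the last number in the output" heuristic, and the sample queIDs are assumptions made for this example; they are not the organizers' evaluation code, and the submission rules on the evaluation page remain the authority on the exact format.

```python
import json
import re

# Matches an optionally signed integer or decimal, e.g. "-3" or "12.5".
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")
# Matches a comma used as a thousands separator, e.g. the comma in "1,250".
THOUSANDS_RE = re.compile(r"(?<=\d),(?=\d{3}(?!\d))")


def extract_final_number(model_output: str) -> str:
    """Heuristic (an assumption): treat the last number in the text as the answer."""
    cleaned = THOUSANDS_RE.sub("", model_output)
    matches = NUMBER_RE.findall(cleaned)
    return matches[-1] if matches else ""


def build_submission(raw_outputs: dict) -> dict:
    """Map each queID to a pure-number answer in string format."""
    return {str(que_id): extract_final_number(text)
            for que_id, text in raw_outputs.items()}


def is_valid_submission(submission: dict) -> bool:
    """Check that all keys are strings and all values are pure number strings."""
    return all(
        isinstance(key, str)
        and isinstance(value, str)
        and NUMBER_RE.fullmatch(value) is not None
        for key, value in submission.items()
    )


# Hypothetical model outputs keyed by hypothetical queIDs.
raw_outputs = {
    "q1": "So 3 + 4 = 7. The answer is 7.",
    "q2": "In total there are 1,250 apples",
}
submission = build_submission(raw_outputs)
submission_json = json.dumps(submission, ensure_ascii=False, indent=2)
```

Stripping the reasoning text before submission matters because, per the answer above, scoring is automatic and expects only the number; a value such as `"the answer is 7"` would fail the `is_valid_submission` check in this sketch.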

Ljyustc commented 1 year ago

Many thanks for your reply. I have one more question: I submitted two results for Track 1 last week, but I still have not received any response, and the submission status is still "scoring". Could you please check whether there is a problem?

math-eval commented 1 year ago

Hi Ljyustc, this is actually a common problem, because the competition platform CodaBench uses queue-based scheduling of the compute resources that run evaluations. From my own experience, I suggest submitting again immediately whenever a submission gets stuck (this usually happens during the 'preparing' and 'scoring' phases), which will probably speed up one of your two consecutive submissions :)
