ulab-uiuc / research-town

A platform for developers to simulate research communities
http://docs.auto-research.dev
Apache License 2.0

[EXP]: Review Evaluation #201

Closed: Monstertail closed this issue 4 months ago

Monstertail commented 5 months ago

Description

No response

Additional Information

No response

Monstertail commented 5 months ago
  1. Score: The prompt is here. Output: see here. Only the results ending in p20_r3 are relevant: p20_r3 means 20 papers per domain, each reviewed by 3 reviewers. The other results are for the 50-paper and 10-paper settings and can be ignored.
  2. Content: The results are mainly evaluated by llama3-70b. Results: see here.
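The p20_r3 suffix encodes the experimental setting (papers per domain, reviewers per paper). A minimal sketch of keeping only those results, assuming result file names end in `p<papers>_r<reviewers>`, optionally followed by a file extension. The helpers `parse_setting` and `keep_p20_r3` are hypothetical and are not part of the research-town repository.

```python
import re
from pathlib import Path

# Assumed naming convention (hypothetical): "..._p<papers>_r<reviewers>[.ext]".
SETTING_SUFFIX = re.compile(r"p(\d+)_r(\d+)(?:\.\w+)?$")


def parse_setting(filename):
    """Return (papers_per_domain, reviewers_per_paper), or None if no suffix."""
    match = SETTING_SUFFIX.search(Path(filename).name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def keep_p20_r3(filenames):
    """Keep only results from the 20-papers / 3-reviewers setting."""
    return [f for f in filenames if parse_setting(f) == (20, 3)]


# Toy file names, not the actual output files from the branch.
print(keep_p20_r3(["cv_p20_r3.json", "cv_p50_r3.json", "nlp_p10_r3.json"]))
```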

This branch also includes a README explaining how to run the scripts for the evaluation above. @lwaekfjlk @ft2023 @chengzr01

Monstertail commented 5 months ago
| Domain | Review Score: Spearman | Review Score: Kendall | Review Content (llama3-70b): Overall | Review Content (llama3-70b): Dimensions | Review Content (GPT-4): Overall | Review Content (GPT-4): Dimensions |
|---|---|---|---|---|---|---|
| CV | -0.009 | 0.032 | 94.4 | [9.4,9.4,9.0,9.4,9.4,9.4,9.4,9.4,9.4,9.45] | 86.5 | [8.8,8.5,7.4,8.65,8.55,8.45,7.65,8.55,8.4,8.6] |
| NLP | -0.042 | 0.084 | 94.25 | [9.35,9.4,9.0,9.35,9.35,9.3,9.3,9.35,9.35,9.55] | 85.45 | [8.6,8.5,7.25,8.55,8.55,8.1,7.45,8.4,8.4,8.4] |
| RL | 0.003 | 0.042 | 94.3 | [9.3,9.25,9.0,9.25,9.25,9.25,9.25,9.25,9.25,9.7] | 86.45 | [8.45,8.65,7.55,8.6,8.35,8.05,7.4,8.6,8.3,8.45] |
| FL | 0.616 | 0.484 | 94.2 | [9.2,9.2,9.0,9.2,9.2,9.2,9.2,9.2,9.2,9.7] | 85.75 | [8.5,8.35,7.35,8.45,8.3,8.15,7.35,8.4,8.2,8.25] |
| GNN | 0.440 | 0.336 | 94.25 | [9.25,9.25,9.0,9.25,9.25,9.25,9.25,9.25,9.25,9.8] | 87.7 | [8.9,8.9,8.1,8.7,9.0,8.6,8.6,8.5,9.1,8.5] |
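The Spearman and Kendall columns measure rank agreement between two lists of review scores. The thread does not say which lists are compared; a plausible reading, stated here only as an assumption, is simulated reviewer scores versus the real review scores for the same papers. Below is a minimal pure-Python sketch of both coefficients (Spearman as the Pearson correlation of tie-averaged ranks, Kendall in its tau-b form, which corrects for ties). It is not the project's evaluation script; if SciPy is available, `scipy.stats.spearmanr` and `scipy.stats.kendalltau` compute the same quantities.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values in sorted order.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) over a tie-corrected pair count."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    total = n * (n - 1) // 2
    denom = sqrt((total - ties_x) * (total - ties_y))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant input")
    return (concordant - discordant) / denom


# Toy scores for illustration only, not data from this evaluation.
simulated = [1, 2, 3, 4, 5]
real = [5, 6, 7, 8, 7]
print(round(spearman(simulated, real), 3))       # 0.821
print(round(kendall_tau_b(simulated, real), 3))  # 0.738
```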