zhao-zilong / ssc-cot

Code for "Stepwise Self-Consistent Mathematical Reasoning with Large Language Models"
MIT License

Details of evaluation on MATH dataset? #4

Open ViperVille007 opened 3 months ago

ViperVille007 commented 3 months ago

For the results reported for SSC-CoT on the MATH Level 5 dataset, were the experiments conducted with the generalised version of SSC-CoT, i.e. without any knowledge graph (KG)? If a KG was used, how was it created, given the vast range of topics covered by the dataset, which is quite different from the trigonometry-specialised KG?

In case I want to create a KG of my own for some other specific topic (say, probability), or for math in general, would I still need a two-category segmentation similar to the one employed (1. Trigonometric Functions and 2. Associated Angles)?

Kindly elaborate in this regard.

Thanks

zhao-zilong commented 3 months ago

Hi @ViperVille007, in the paper on arXiv, no KG is used for MATH Level 5. But in the new version of the paper, which is not on arXiv yet, we do use one. As for a KG for, say, algebra: you do not necessarily have trigonometric functions and associated angles there, so you need to construct it differently.

ViperVille007 commented 3 months ago

Thank you @zhao-zilong for the reply. I'm just curious to know whether the addition of the KG helped improve the results on MATH even further, as was the case with the trimaster100 dataset!

Also, the final part of the SSC-CoT algorithm workflow states: "To conclude the final result from SSC-CoT, we first find out which intermediate result contains the conclusion statement. The final result is then derived from majority vote among all these conclusion statements." How is this part done? Are you using an LLM to infer whether a statement is a concluding one, and then extracting the answer from that statement? Is there any code that can help me extract the final generated answer for the trimaster100 questions? Right now, I'm actually a bit confused by the response CSV files 😅

Thanks again.