Open lisaalaz opened 1 month ago
Hi @lisaalaz, Thanks for trying MathSensei. Can you provide the configuration (dataset, model, and task_name) that you are trying to run?
Hi, the setting is --dataset 'GSM' --model 'pg_walpha_sg'. It seems --task_name can be anything, and it only matters for the naming of the output folder? (I am not trying to read from the outputs already provided, because I have replaced the llama model repo I am using with a newer llama version.) My setup does not throw any error, but it is not loading the data, presumably because the solver does not have a self.cache attribute? I assume a cache flag needs to be added to the config; what should it be set to? Also, could you confirm whether you can run this successfully with GSM8K as the dataset? Thanks!
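For anyone hitting the same missing-attribute error: a minimal sketch of the kind of fix involved is below. The class name `Solver`, the attribute `self.cache`, and the method names are assumptions for illustration, not MathSensei's actual API; the point is simply that initializing the cache in `__init__` avoids an `AttributeError` (or a silent stall) when it is read before being set.

```python
# Hypothetical sketch -- class and method names are illustrative,
# not taken from the MathSensei codebase.
class Solver:
    def __init__(self, examples):
        self.examples = examples
        # Initialize the cache up front instead of relying on it
        # being created somewhere else later.
        self.cache = {}

    def get_sample(self, index):
        # Serve from the cache when possible; otherwise fall back
        # to the raw examples list and remember the result.
        if index in self.cache:
            return self.cache[index]
        sample = self.examples[index]
        self.cache[index] = sample
        return sample
```

With this guard in place, the first call populates the cache and subsequent calls hit it, so the solver never depends on a cache that was never defined.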
@lisaalaz I have made some changes to the model.py, run.py, and utilities.py files. Can you pull the changes and check whether you are able to run the code? I am able to load both MATH and GSM8K. Which llama version are you using?
@lisaalaz
On executing :
python run.py --dataset 'GSM' --data_root 'data/GSM-8K' --task_name gsm --model 'sg' --label 'sg_results'
I am getting the below log
====Input Arguments====
{
"data_root": "data/GSM-8K",
"data_file": "no",
"dataset": "GSM",
"output_root": "output_MATHSENSEI",
"model": "sg",
"label": "sg_results",
"task_name": "gsm",
"test_split": "test",
"test_number": null,
"seed": 0,
"python_model": "no",
"extra_python_libraries": "no",
"knowledge_model": "no",
"bing_model": "no",
"sg_model": "no",
"wolfram_model": "no",
"modules": null,
"policy_engine": "gpt-3.5-turbo",
"policy_temperature": 0,
"policy_max_tokens": 128,
"pg_engine": "gpt-3.5-turbo",
"pg_temperature": 0.5,
"pg_max_tokens": 256,
"kr_engine": "gpt-3.5-turbo",
"kr_temperature": 0.5,
"kr_max_tokens": 512,
"qg_engine": "gpt-3.5-turbo",
"qg_temperature": 0.0,
"qg_max_tokens": 64,
"qg_patience": 5,
"sg_engine": "gpt-3.5-turbo",
"sg_temperature": 0.5,
"sg_max_tokens": 600,
"sg_patience": 4,
"current_index": 0,
"refine": "no",
"error_mode": "no",
"bing_count": 5,
"debug": false
}
09/29/2024 03:47:32 PM Dataset: GSM
09/29/2024 03:47:32 PM First Sample: {'question': "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?", 'answer': 'Janet sells 16 - 3 - 4 = <<16-3-4=9>>9 duck eggs a day.\nShe makes 9 * 2 = $<<9*2=18>>18 every day at the farmer’s market.\n#### 18'}
09/29/2024 03:47:32 PM <class 'list'>
# Number of test examples: 1319
Result file : output_MATHSENSEI/gsm/sg_results_test.json
0%| | 0/1319 [00:00<?, ?it/s]{'question': "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?", 'answer': 'Janet sells 16 - 3 - 4 = <<16-3-4=9>>9 duck eggs a day.\nShe makes 9 * 2 = $<<9*2=18>>18 every day at the farmer’s market.\n#### 18'}
.
.
.
@lisaalaz Can you share your ====Input Arguments==== dictionary?
Hi, I am unable to run your code. After setting the task and dataset in the flags, everything seems to be in order, but it just hangs indefinitely after showing the tqdm bar for 5000 data samples, without making any progress. I see it is supposed to take the samples from self.cache, but I do not see this being defined anywhere? Many thanks.
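One way to narrow down a hang like this is to check that the dataset actually loads outside of the solver loop. The sketch below loads a GSM8K-style JSONL file (one `{"question": ..., "answer": ...}` object per line, which matches the sample printed in the log above); the function name and file layout are assumptions, not MathSensei's loader.

```python
import json

def load_gsm8k(path):
    """Load a GSM8K-style JSONL file: one JSON object per line,
    each with 'question' and 'answer' keys."""
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                samples.append(json.loads(line))
    return samples
```

If printing `len(samples)` and `samples[0]` works here but the tqdm loop still makes no progress, the problem is downstream of data loading (e.g. in how the solver reads from its cache), which helps localize the bug.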