snap-stanford / MLAgentBench


No such file found when trying to run eval() #10

Closed. KevinH48264 closed this issue 8 months ago.

KevinH48264 commented 1 year ago

I'm trying to evaluate the output of the train.py created by my agent. The --log-dir flag I passed in the original command was "house-price-testing-v1-gpt4", which I assume should match the "--log-folder" flag.

I'm actually not sure what to expect from the eval command presented in the README. I assume that a score (% accuracy?) will be found in a results.json file (which will be in my root directory?).

But before I can even get a results.json file produced, I'm running into this error with the eval command. Could you provide more information about what to expect from the eval.py script and how to address this error?

[screenshot: error]

KevinH48264 commented 1 year ago

One workaround was 1) changing the code for this, because the absolute path pointed to MLAgentBench//env_log/../log when the root directory should be used instead of MLAgentBench, and 2) copying main_log into the root directory as "log". No log file is being generated (I only get a main_log file), which is why it's throwing the error. I don't think this is the best workaround, so I'd be interested in hearing thoughts on what might be happening and whether there's a better fix.

[screenshot]
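For concreteness, step 2 amounts to something like the following. This is only a sketch; the main_log source path is an assumption based on my --log-dir layout, not a path taken from the repo.

```python
# Rough sketch of the copy in step 2; the source path is assumed from my
# own --log-dir ("house-price-testing-v1-gpt4") and may differ for you.
import shutil

shutil.copy("house-price-testing-v1-gpt4/env_log/main_log", "log")
```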

The question of where answer.csv is located still persists, though, and it prevents an actual evaluation.

KevinH48264 commented 12 months ago

I've made a workaround to get eval() working.

It seems like the code expects answer.csv to be either in the root directory (which doesn't make sense because it's never copied there) or under the step folder. However, in my testing answer.csv is downloaded into the /scripts folder instead of the /env folder, so it never gets copied into the workspace or the step folders.

My current workaround was to add an argument to the get_score() function for the answer file path, pointing to f"MLAgentBench/benchmarks/{benchmark_foldername}/scripts/answer.csv", but I'm curious whether this is an expected issue or if I'm missing something.
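For reference, the change is roughly the following. The real get_score() in MLAgentBench has a different signature; the extra answer_file parameter and the default path here are my own additions, so treat this as a hypothetical sketch rather than the upstream code.

```python
# Hypothetical sketch of the workaround; not the actual get_score() from
# MLAgentBench -- the answer_file argument is my own addition.
def get_score(log_folder, benchmark_foldername, answer_file=None):
    if answer_file is None:
        # Assumption: answer.csv sits next to the benchmark's eval script,
        # since that is where it was downloaded in my runs.
        answer_file = (
            f"MLAgentBench/benchmarks/{benchmark_foldername}/scripts/answer.csv"
        )
    # The original scoring logic would then read answer_file instead of a
    # hard-coded "answer.csv".
    ...
```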

q-hwang commented 12 months ago

> One workaround was 1) changing the code for this, because the absolute path pointed to MLAgentBench//env_log/../log when the root directory should be used instead of MLAgentBench, and 2) copying main_log into the root directory as "log". No log file is being generated (I only get a main_log file), which is why it's throwing the error. I don't think this is the best workaround, so I'd be interested in hearing thoughts on what might be happening and whether there's a better fix.
>
> The question of where answer.csv is located still persists, though, and it prevents an actual evaluation.

We generate the MLAgentBench//env_log/../log file to capture all of the script's output, as shown here: https://github.com/snap-stanford/MLAgentBench/blob/main/multi_run_experiment.sh#L44

q-hwang commented 12 months ago

> I've made a workaround to get eval() working.
>
> It seems like the code expects answer.csv to be either in the root directory (which doesn't make sense because it's never copied there) or under the step folder. However, in my testing answer.csv is downloaded into the /scripts folder instead of the /env folder, so it never gets copied into the workspace or the step folders.
>
> My current workaround was to add an argument to the get_score() function for the answer file path, pointing to f"MLAgentBench/benchmarks/{benchmark_foldername}/scripts/answer.csv", but I'm curious whether this is an expected issue or if I'm missing something.

Our script expects answer.csv to be generated in the same folder as ../scripts/eval.py and is supposed to work out of the box. Will check on this on my end.

KevinH48264 commented 12 months ago

This could be a Docker problem, but it seems unlikely. When I go to eval.py in benchmarks/house-price/scripts, os.path.abspath("answer.csv") prints as MLAgentBench/answer.csv. That path doesn't exist; I assume the expected path was really MLAgentBench/benchmarks/house-price/scripts/answer.csv.

Maybe Docker sets the working directory to MLAgentBench, but I'm still not sure how the line test_data = pd.read_csv("answer.csv") in eval.py could resolve to the right path (MLAgentBench/benchmarks/house-price/scripts/answer.csv) unless the user ran the eval command from MLAgentBench/benchmarks/house-price/scripts on the command line.
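To illustrate what I mean, here is a standalone example (not MLAgentBench code): a bare relative path resolves against whatever directory the command is launched from, whereas resolving it against __file__ would make eval.py independent of the working directory.

```python
import os
import pandas as pd

# A bare relative path resolves against the current working directory,
# i.e. wherever the eval command was launched from (e.g. MLAgentBench/).
print(os.path.abspath("answer.csv"))

# Resolving against the script's own location removes that dependence;
# this is a sketch of one possible fix, not the repo's actual code.
script_dir = os.path.dirname(os.path.abspath(__file__))
test_data = pd.read_csv(os.path.join(script_dir, "answer.csv"))
```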

q-hwang commented 12 months ago

Yeah, I think we were calling it directly from the command line. Could you push your workaround? Thanks!