Output error of Example: Agent-Environment Evaluator for SciTech Challenge

henrybb0826 commented 2 weeks ago

Hello, I am running Example: Agent-Environment Evaluator for SciTech Challenge. I have finished running `python evaluate.py configs/example_eval_cfg.yaml`, but the screen gets stuck at ~~~Closing KSPDG environment~~~ and the txt file is never written. Is there something wrong with my settings?

rallen10 commented 2 weeks ago

@henrybb0826 Thank you for creating this issue. Can you start by telling me what operating system you are using (Windows, Mac, Linux) and what version of python (3.9, 3.12)?

rallen10 commented 2 weeks ago

@henrybb0826 Two other questions

1. How long do you wait after that last message appears? Some of the threads/processes running for the evaluation have ~30 sec timeouts that you may need to wait for at the end.
2. There may also have been a thread that lost communication and timed out before the episode timeout, which wouldn't necessarily have caused an error. Such a timeout might be buried further back in the terminal output. Would you be able to run the evaluation again, saving the entire terminal output to a text file, and then share that output? You can achieve this with the tee command, for example:
```
python evaluate.py configs/example_eval_cfg.yaml 2>&1 | tee ~/Desktop/output.txt
```
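If `tee` is not available (for example in a plain Windows shell), the same capture can be approximated from Python. This is only a minimal sketch using the standard library; `run_and_tee` is a hypothetical helper name and is not part of kspdg.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, echoing its combined stdout/stderr to the terminal and to log_path.

    Hypothetical helper (not part of kspdg). Returns the command's exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
            text=True,
            errors="replace",  # do not crash on undecodable bytes
            bufsize=1,  # line-buffered reads
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # show the line on screen
            log.write(line)  # and keep a copy in the log file
            log.flush()  # so the log survives a Ctrl-C on a hung run
        return proc.wait()
```

Calling `run_and_tee([sys.executable, "evaluate.py", "configs/example_eval_cfg.yaml"], "output.txt")` would then stand in for the shell pipeline. Note that if the evaluation hangs, this call hangs as well; the log is flushed line by line so it can still be shared after interrupting.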

rallen10 commented 1 week ago

I have a suspicion that this is caused by some underlying problem with the julia dependencies used in the LG3 environments. Here are a few debugging steps that may help (or may help me help you)

1. Did you successfully run the install_julia_deps.sh script without any errors?
2. Can you run the “serverless tests” without error? In particular, does test_jl_solvers.py execute without error or failure?
3. For better insight, can you run the [scripts/example_private_src_env_runner.py](https://github.com/mit-ll/spacegym-kspdg?#example-agent-environment-runner) example, logging the output to a txt file which has more verbose debugging statements, and share the output txt with me?

```
python scripts/example_private_src_env_runner.py 2>&1 | tee ~/Desktop/output.txt
```

strsix commented 2 days ago

We faced a similar problem: the episode never ends during the evaluation, the julia dependencies fail to install, and the serverless tests cause a segfault.

Here are some workarounds that worked for us. Hopefully they help others who are facing the same errors.

1. `conda remove --name kspdg --all`
2. Reset "environment.yml" to its default (if you added or changed anything there). Then change the python version from 3 to 3.9 in "environment.yml".
3. `pip install poliastro` (if you are using it within your algorithm)
4. `python install_julia_deps.py`
5. `pytest tests/serverless_tests/`

Steps 1 and 2 are what made things work.
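To confirm that the rebuilt environment is actually the one in use, a quick check from inside the activated env can help. This is a minimal sketch, not something from the kspdg repo: `env_report` is a hypothetical helper, and it assumes the package is importable as `kspdg`.

```python
import importlib.util
import sys


def env_report(required=(3, 9), modules=("kspdg", "poliastro")):
    """Summarize the active interpreter version and which modules are importable.

    Hypothetical helper; the module names are assumptions about import names.
    """
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # True only if the interpreter's major.minor matches the required pair
        "python_ok": tuple(sys.version_info[:2]) == tuple(required),
    }
    for name in modules:
        # find_spec returns None for a top-level module that is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Running `print(env_report())` before `pytest tests/serverless_tests/` shows whether the interpreter is 3.9 and whether the expected packages are visible to it.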

rallen10 commented 1 day ago

@strsix: I don't have a good explanation for why you were getting the segfault or why your workaround fixed it, but my best guess is that it has something to do with the installation order of dependencies. That is to say, additional dependencies for agent development (e.g. poliastro in your case) need to be installed after kspdg's own dependencies (found in pyproject.toml and install_julia_deps.py).

Also, it does not seem to be strictly necessary, but I strongly recommend using juliaup to install julia on your machine before installing kspdg's environment. This helps manage different julia versions, similar to how conda helps manage different python versions on one computer.

Therefore, the installation process for developing new agents to solve kspdg's challenge problems might look roughly like this:

```
# install juliaup on MacOS, for further instructions: https://github.com/JuliaLang/juliaup#installation
curl -fsSL https://install.julialang.org | sh

# create the kspdg conda environment and then clone it to repurpose it for agent development
conda env create -f environment.yml   # this creates the kspdg environment
conda create --name kspdg_agents --clone kspdg   # this creates a new env called kspdg_agents which starts as a copy of kspdg env
conda remove --name kspdg --all    # optionally delete the original kspdg env if you don't plan to use it

# install additional julia dependencies for the "adv_bots" kspdg environments (e.g. LBG1_LG3)
conda activate kspdg_agents
python install_julia_deps.py

# NOW install any additional dependencies you want to use for your kspdg agents within the kspdg_agents conda env
pip install <other-dependencies-you-want>
```

I haven't fully tested this process, but I will update the README install instructions once I have a chance to vet it.
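Before starting the steps above, it may be worth checking which of the required tools are already on your PATH. The snippet below is a minimal sketch using only the standard library; `find_tools` is a hypothetical helper and the tool names are simply the ones mentioned in this thread.

```python
import shutil


def find_tools(names=("juliaup", "julia", "conda")):
    """Map each tool name to its resolved path on PATH, or None if not found.

    Hypothetical helper; it only inspects PATH and installs nothing.
    """
    return {name: shutil.which(name) for name in names}
```

For example, `print(find_tools())` lists where `juliaup`, `julia`, and `conda` resolve. A `None` entry for `juliaup` suggests doing the juliaup install step first.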

mit-ll / spacegym-kspdg

Output error of Example: Agent-Environment Evaluator for SciTech Challenge #18