sisl / ngsim_env

Learning human driver models from NGSIM data with imitation learning.
https://arxiv.org/abs/1803.01044
MIT License

validate.py causes segmentation fault #27

Open raks097 opened 4 years ago

raks097 commented 4 years ago

@raunakbh92 @wulfebw Hi, wonderful paper, and thank you for providing the implementations. I was able to train the GAIL agent, but when I run validate.py I get a segmentation fault.

I am currently running Julia v1.1.0 on Ubuntu 18.04.

"""
Traceback (most recent call last):
  File "/home/asyin/anaconda3/envs/rllab3/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "validate.py", line 135, in collect_trajectories
    env_kwargs=dict(egoid=egoid, start=starts[egoid])
  File "validate.py", line 31, in simulate
    a, a_info = policy.get_action(x)
  File "/home/asyin/R/rllab/hgail/hgail/policies/gaussian_latent_var_gru_policy.py", line 193, in get_action
    return actions[0], {k: v[0] for k, v in agent_infos.items()}
  File "/home/asyin/R/rllab/hgail/hgail/policies/gaussian_latent_var_gru_policy.py", line 193, in <dictcomp>
    return actions[0], {k: v[0] for k, v in agent_infos.items()}
KeyError: 0
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "validate.py", line 381, in <module>
    random_seed=run_args.random_seed
  File "validate.py", line 266, in collect
    random_seed=random_seed
  File "validate.py", line 188, in parallel_collect_trajectories
    [res.get() for res in results]
  File "validate.py", line 188, in <listcomp>
    [res.get() for res in results]
  File "/home/asyin/anaconda3/envs/rllab3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
KeyError: 0

signal (15): Terminated
in expression starting at no file:0
read at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)

signal (15): Terminated
in expression starting at no file:0

signal (15): Terminated
in expression starting at no file:0

signal (15): Terminated
in expression starting at no file:0
_Py_read at /home/ilan/minonda/conda-bld/work/Python-3.5.2/Python/fileutils.c:1205
_PyObject_GenericGetAttrWithDict at /home/ilan/minonda/conda-bld/work/Python-3.5.2/Objects/object.c:1053

signal (15): Terminated
in expression starting at no file:0

signal (11): Segmentation fault
in expression starting at no file:0

raks097 commented 4 years ago

@raunakbh92 I am not sure what is causing the segfault. Do any changes need to be made in gaussian_latent_var_gru_policy.py?

UPDATE:

  File "/home/asyin/R/rllab/hgail/hgail/policies/gaussian_latent_var_gru_policy.py", line 193, in get_action
    return actions[0], {k: v[0] for k, v in agent_infos.items()}
KeyError: 0

AGENT_INFO
[('mean', array([[ 0.01187703, -0.02830665]], dtype=float32)), ('prev_action', array([[ 0., 0.]])), ('latent_info', {'latent': array([[1, 0, 0, 0]])}), ('log_std', array([[-0.67997968, -0.72506523]], dtype=float32)), ('latent', array([[1, 0, 0, 0]]))]

I fixed the initial KeyError: 0 by changing how the dictionary is built so that it no longer includes the latent_info key. That key's value is itself a dict, so v[0] was looking up the key 0 inside it.
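The KeyError can be reproduced outside rllab with the AGENT_INFO values printed above. This is a minimal sketch, assuming the values are plain NumPy arrays as in the printout; first_step, flat and step_info are illustrative names, not hgail code. first_step repeats the expression from line 193 of get_action, which fails on latent_info, and dropping that key lets it run. Since latent is also present at the top level, nothing appears to be lost, though that is an assumption based only on this printout.

```python
import numpy as np

# agent_infos as printed in the AGENT_INFO dump (values copied from the thread)
agent_infos = {
    'mean': np.array([[0.01187703, -0.02830665]], dtype=np.float32),
    'prev_action': np.array([[0., 0.]]),
    'latent_info': {'latent': np.array([[1, 0, 0, 0]])},
    'log_std': np.array([[-0.67997968, -0.72506523]], dtype=np.float32),
    'latent': np.array([[1, 0, 0, 0]]),
}


def first_step(infos):
    # Same expression as line 193 of get_action: take row 0 of every value.
    return {k: v[0] for k, v in infos.items()}


try:
    first_step(agent_infos)
except KeyError as err:
    # v[0] on the nested dict {'latent': ...} looks up the key 0, which does not exist.
    print("KeyError:", err)

# Workaround described in the thread: drop the nested 'latent_info' entry first.
flat = {k: v for k, v in agent_infos.items() if k != 'latent_info'}
step_info = first_step(flat)
print(sorted(step_info))
```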

However, I am still getting this error:

signal (11): Segmentation fault
in expression starting at no file:0

This is similar to the error described in https://github.com/sisl/ngsim_env/blob/c34f2c4bd6bf2e089b69bddefb4283ef6829c042/docs/usingTrainedPolicy.md

However, deleting the PyCall cache from ~/.julia/compiled/v1.1 didn't work. Should I try reverting to Julia 0.6 (and if so, how?), or are there other solutions?

Thanks

DarrenRuan commented 4 years ago

@raks097 Hi man, have you solved this issue? So currently you are only able to train a policy?

DarrenRuan commented 4 years ago

agent_infos: {'mean': array([[-0.36561757, -0.4166246 ]], dtype=float32), 'log_std': array([[0.01288844, 0.06403346]], dtype=float32), 'prev_action': array([[0., 0.]]), 'latent': array([[0, 1, 0, 0]]), 'latent_info': {'latent': array([[0, 1, 0, 0]])}}

Change 'latent_info': {'latent': array([[0, 1, 0, 0]])} to 'latent_info': array([[0, 1, 0, 0]]), so that v[0] works for every value in agent_infos.
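The change above unwraps the nested dict instead of deleting the key. A minimal sketch using the agent_infos values printed above; the thread does not say where in hgail the unwrapping should happen, so this only shows the effect on the dictionary. Keeping the latent_info key may matter if other code reads it, though that is an assumption.

```python
import numpy as np

# agent_infos as printed above (values copied from the thread)
agent_infos = {
    'mean': np.array([[-0.36561757, -0.4166246]], dtype=np.float32),
    'log_std': np.array([[0.01288844, 0.06403346]], dtype=np.float32),
    'prev_action': np.array([[0., 0.]]),
    'latent': np.array([[0, 1, 0, 0]]),
    'latent_info': {'latent': np.array([[0, 1, 0, 0]])},
}

# Replace the nested dict with the array it wraps, so every value is indexable by 0.
agent_infos['latent_info'] = agent_infos['latent_info']['latent']

# The expression from get_action now succeeds for every key.
step_info = {k: v[0] for k, v in agent_infos.items()}
print(step_info['latent_info'])  # [0 1 0 0]
```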

DarrenRuan commented 4 years ago

python validate.py --n_proc 4 --exp_dir ../../data/experiments/NGSIM-gail/ --params_filename itr_1000.npz --random_seed 42

For example, with n_proc set to 4 as above, only some of the worker processes finish; one of them (say pid 2) may fail. Progress lines such as "pid: 0 traj: 515 / 516" appear for pids 0, 1 and 3, but pid 2 never shows up again.
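In validate.py the results are gathered with [res.get() for res in results] (line 188 in the traceback), and AsyncResult.get() re-raises a worker's exception in the parent. The first failing worker therefore aborts the whole collection. If the pool is then torn down, the other workers are terminated, which may explain the signal (15): Terminated lines. A minimal sketch, assuming the failure is a Python exception rather than the Julia segfault itself: wrap each worker so it returns its traceback instead of raising, then report which pid failed. worker, safe_worker and parallel_collect are hypothetical stand-ins, not functions from validate.py.

```python
import multiprocessing as mp
import traceback


def worker(pid):
    """Hypothetical stand-in for validate.py's per-process trajectory collection."""
    if pid == 2:
        raise KeyError(0)  # simulate the failure seen for pid 2 in the thread
    return pid * 10


def safe_worker(pid):
    """Run worker(pid) and return (pid, result, error_text) instead of raising."""
    try:
        return pid, worker(pid), None
    except Exception:
        return pid, None, traceback.format_exc()


def parallel_collect(n_proc):
    """Start one task per pid and report failures without aborting the others."""
    with mp.Pool(processes=n_proc) as pool:
        results = [pool.apply_async(safe_worker, (pid,)) for pid in range(n_proc)]
        outcomes = [res.get() for res in results]  # never raises: errors are returned
    for pid, value, err in outcomes:
        if err is not None:
            print("pid {} failed:\n{}".format(pid, err))
    return outcomes
```

Calling parallel_collect(4) under the usual if __name__ == "__main__": guard should print the KeyError traceback for pid 2 while still returning the results of pids 0, 1 and 3. A hard crash such as a segfault kills the worker process outright, so this wrapper cannot catch it. In that case, Python's faulthandler module is a common way to get a Python-level stack dump.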