rll / rllab

rllab is a framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym.

Running "scripts/submit_gym.py" fails #11

Closed: Alex-zhai closed this issue 8 years ago

Alex-zhai commented 8 years ago

When I ran examples/trpo_gym.py, I got results. Then I ran "scripts/submit_gym.py", but got the following error:

raise error.Error("[%s] You didn't have any recorded training data in {}. Once you've used 'env.monitor.start(training_dir)' to start recording, you need to actually run some rollouts. Please join the community chat on https://gym.openai.com if you have any issues.".format(env_id, training_dir))
gym.error.Error: [%s] You didn't have any recorded training data in Pendulum-v0. Once you've used 'env.monitor.start(training_dir)' to start recording, you need to actually run some rollouts. Please join the community chat on https://gym.openai.com if you have any issues.

Why does this happen?
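
The wording of the error is a little misleading. In the quoted raise line, .format(env_id, training_dir) fills only the {} placeholder, so the environment ID lands where the directory path should be and the printf-style [%s] prefix is printed literally. A minimal reproduction of that behavior, using the values from this report (the message is shortened and the path is a placeholder):

```python
# Reproduce how the error text above was assembled.
# str.format() substitutes positional arguments only into "{}" fields;
# the printf-style "%s" is left untouched and unused extra arguments are ignored.
env_id = "Pendulum-v0"
training_dir = "/path/to/gym_log"  # placeholder path for illustration

template = "[%s] You didn't have any recorded training data in {}."
message = template.format(env_id, training_dir)
print(message)
# [%s] You didn't have any recorded training data in Pendulum-v0.
```

This is why the message names Pendulum-v0 instead of the gym_log path that was checked.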

dementrock commented 8 years ago

Hi @Alex-zhai, can you paste the full command you ran? You should pass the gym_log directory to the submit_gym.py command. For example, this is the full command on my computer:

python scripts/submit_gym.py /Users/dementrock/research/rllab/data/local/experiment/experiment_2016_05_31_10_50_06_0001/gym_log
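
If several experiments have been run, a small helper can find the most recent gym_log directory to pass to submit_gym.py. This is a sketch, not part of rllab: latest_gym_log is a hypothetical name, and it assumes experiments live under data/local/experiment/ with folder names following the experiment_YYYY_MM_DD_HH_MM_SS_NNNN pattern seen in this thread, so that sorting names lexically also sorts them chronologically.

```python
import os


def latest_gym_log(experiment_root):
    """Hypothetical helper: path of the newest <experiment>/gym_log folder, or None.

    Assumes folders are named experiment_YYYY_MM_DD_HH_MM_SS_NNNN, so the
    lexically largest name is also the most recent experiment.
    """
    candidates = [
        name
        for name in os.listdir(experiment_root)
        if name.startswith("experiment_")
        and os.path.isdir(os.path.join(experiment_root, name, "gym_log"))
    ]
    if not candidates:
        return None
    return os.path.join(experiment_root, max(candidates), "gym_log")


# Usage, from the rllab checkout root:
#   gym_log = latest_gym_log(os.path.join("data", "local", "experiment"))
#   then run: python scripts/submit_gym.py <gym_log>
```
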
Alex-zhai commented 8 years ago

Yes, I passed the gym_log directory to the submit_gym.py command, and the problem still exists.

python scripts/submit_gym.py /home/alex-zhai/Downloads/rllab-master1/data/local/experiment/experiment_2016_05_31_23_34_50_0001/gym_log

dementrock commented 8 years ago

What files do you have under the gym_log folder?

Alex-zhai commented 8 years ago

The results produced by running examples/trpo_gym.py.

dementrock commented 8 years ago

Can you paste the results of running ls?

Alex-zhai commented 8 years ago

openaigym.episode_batch.None.10971.10971.stats.json
openaigym.manifest.None.10971.manifest.json
openaigym.video.None.10971.video000000.meta.json
openaigym.video.None.10971.video000000.mp4
openaigym.video.None.10971.video000001.meta.json
openaigym.video.None.10971.video000001.mp4
openaigym.video.None.10971.video000008.meta.json
openaigym.video.None.10971.video000008.mp4
openaigym.video.None.10971.video000027.meta.json
openaigym.video.None.10971.video000027.mp4
openaigym.video.None.10971.video000064.meta.json
openaigym.video.None.10971.video000064.mp4
openaigym.video.None.10971.video000125.meta.json
openaigym.video.None.10971.video000125.mp4
openaigym.video.None.10971.video000216.meta.json
openaigym.video.None.10971.video000216.mp4
openaigym.video.None.10971.video000343.meta.json
openaigym.video.None.10971.video000343.mp4
openaigym.video.None.10971.video000512.meta.json
openaigym.video.None.10971.video000512.mp4
openaigym.video.None.10971.video000729.meta.json
openaigym.video.None.10971.video000729.mp4
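
The video indices in this listing (0, 1, 8, 27, ..., 729) are all perfect cubes, which is consistent with the default video schedule of gym's monitor: record episodes whose index is a perfect cube below 1000, then every 1000th episode. A sketch of that rule, modeled on gym's capped cubic schedule rather than copied from it:

```python
def capped_cubic_video_schedule(episode_id):
    """Return True if the episode with this 0-based index should be recorded.

    Sketch of gym's default schedule: perfect cubes below 1000,
    then every 1000th episode.
    """
    if episode_id < 1000:
        # Round the cube root to the nearest integer and check it cubes back exactly.
        return int(round(episode_id ** (1.0 / 3))) ** 3 == episode_id
    return episode_id % 1000 == 0


recorded = [i for i in range(1000) if capped_cubic_video_schedule(i)]
print(recorded)  # [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
```

If that schedule applies here, a file numbered video000729 suggests that at least 730 episodes were started during the run.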

dementrock commented 8 years ago

I'm sorry, but I couldn't reproduce the issue locally. Can you check which commit of the rllab repo you are using, and which version of OpenAI Gym? You can get the gym version by running python -c "import gym; print gym.version.VERSION".

dementrock commented 8 years ago

Also, what is the operating system you are using?
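
The three details requested here (rllab commit, gym version, operating system) can be gathered in one go. A sketch that runs under Python 2 and 3, assuming it is executed from inside the rllab git checkout; describe_environment is a hypothetical name, it reads gym's version from gym.__version__ or gym.version.VERSION (whichever exists), and it reports None for anything it cannot determine.

```python
import platform
import subprocess


def describe_environment():
    """Collect the details needed for a bug report; None marks unknown values."""
    info = {"python": platform.python_version(), "os": platform.platform()}

    try:
        # Commit of the current git checkout (run this from the rllab directory).
        out = subprocess.check_output(["git", "rev-parse", "HEAD"],
                                      stderr=subprocess.STDOUT)
        info["rllab_commit"] = out.decode("ascii").strip()
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or this is not a git checkout.
        info["rllab_commit"] = None

    try:
        import gym
    except Exception:  # gym missing or broken in this interpreter
        info["gym"] = None
    else:
        info["gym"] = getattr(gym, "__version__", None) or getattr(
            getattr(gym, "version", None), "VERSION", None)
    return info


for key, value in sorted(describe_environment().items()):
    print("%-13s %s" % (key, value))
```

Run it with the same interpreter that runs rllab (here, the Anaconda environment), since different interpreters can have different gym versions installed.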

Alex-zhai commented 8 years ago

The gym version is 0.1.1 in Anaconda2, but in my system Python it is 0.1.2. The operating system is Ubuntu 14.04. How do I update gym in Anaconda2?

dementrock commented 8 years ago

You can run pip install --upgrade git+https://github.com/openai/gym.git, but I don't think the gym version is the issue. What is the content of the file openaigym.manifest.None.10971.manifest.json?

Alex-zhai commented 8 years ago

{"env_info": {"env_id": "Pendulum-v0", "gym_version": "0.1.1"}, "stats": "openaigym.episode_batch.None.10971.10971.stats.json", "videos": []}
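
The manifest is plain JSON, so its fields can be checked directly. A minimal sketch using only the keys visible above (env_info, stats, videos); summarize_manifest is a hypothetical name. Note that on this manifest videos is an empty list, even though the earlier ls output contains .mp4 files.

```python
import json
import os


def summarize_manifest(manifest_path):
    """Read a gym monitor manifest and report what it points to."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    # The "stats" entry is a file name relative to the gym_log folder.
    gym_log = os.path.dirname(manifest_path)
    stats_path = os.path.join(gym_log, manifest["stats"])
    return {
        "env_id": manifest["env_info"]["env_id"],
        "gym_version": manifest["env_info"]["gym_version"],
        "stats_exists": os.path.isfile(stats_path),
        "num_videos": len(manifest["videos"]),
    }
```
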

dementrock commented 8 years ago

OK, I'm out of ideas. Can you zip the content of the gym_log folder and send me an email? My email address is dementrock@gmail.com.

Alex-zhai commented 8 years ago

OK, my email is 20144227023@stu.suda.edu.cn. Thank you!!!

dementrock commented 8 years ago

Hi @Alex-zhai, I inspected the content you sent. It seems that the openaigym.episode_batch.None.10971.10971.stats.json file failed to record any episode data, which caused the error you saw. Another abnormal thing is the monitor ID: it should be an integer following openaigym.episode_batch in the file names, but here it is None. Unfortunately, I could not reproduce this behavior on a Linux machine.
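
Both symptoms can be checked from the gym_log folder. In the listing earlier in the thread, the monitor ID is the third dot-separated field of each file name (None in the failing run; 1 in names like openaigym.video.1.2808.video000000.mp4 later in this thread). The episode check rests on an assumption not confirmed here: that the stats file is JSON with an episode_lengths list. Both function names are hypothetical.

```python
import glob
import json
import os


def monitor_id(filename):
    """Third dot-separated field of a gym monitor file name, e.g. '1' or 'None'."""
    parts = os.path.basename(filename).split(".")
    return parts[2] if len(parts) > 2 and parts[0] == "openaigym" else None


def check_gym_log(gym_log):
    """Return a list of human-readable problems found in a gym_log folder."""
    problems = []
    for name in sorted(os.listdir(gym_log)):
        mid = monitor_id(name)
        if mid is not None and not mid.isdigit():
            problems.append("non-integer monitor id %r in %s" % (mid, name))
    for stats_path in glob.glob(os.path.join(gym_log, "*.stats.json")):
        with open(stats_path) as f:
            stats = json.load(f)
        # ASSUMPTION: per-episode data is stored under "episode_lengths".
        if not stats.get("episode_lengths"):
            problems.append("no episodes recorded in %s" % os.path.basename(stats_path))
    return problems
```
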

Did you notice anything wrong when running trpo_gym.py? Can you set n_itr to 1 (only run for 1 iteration), and paste the full log here?

Alex-zhai commented 8 years ago

(rllab)alex-zhai@Alex-zhai:~/Downloads/rllab-master1$ python examples/trpo_gym.py
Using gpu device 0: GeForce GTX 970 (CNMeM is disabled, CuDNN 4007)
/home/alex-zhai/anaconda2/envs/rllab/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")
python /home/alex-zhai/Downloads/rllab-master1/scripts/run_experiment_lite.py --n_parallel '1' --seed '1' --log_dir '/home/alex-zhai/Downloads/rllab-master1/data/local/experiment/experiment_2016_06_02_08_57_12_0001' --snapshot_mode 'last' --exp_name 'experiment_2016_06_02_08_57_12_0001' --args_data 'Y2NvcHlfcmVnCl9yZWNvbnN0cnVjdG9yCnAxCihjcmxsYWIubWlzYy5pbnN0cnVtZW50ClN0dWJNZXRob2RDYWxsCnAyCmNfX2J1aWx0aW5fXwpvYmplY3QKcDMKTnRScDQKKGRwNQpTJ19fYXJncycKcDYKKGcxCihjcmxsYWIubWlzYy5pbnN0cnVtZW50ClN0dWJPYmplY3QKcDcKZzMKTnRScDgKKGRwOQpTJ2FyZ3MnCnAxMAoodHNTJ3Byb3h5X2NsYXNzJwpwMTEKY3JsbGFiLmFsZ29zLnRycG8KVFJQTwpwMTIKc1Mna3dhcmdzJwpwMTMKKGRwMTQKUydiYXNlbGluZScKcDE1CmcxCihnNwpnMwpOdFJwMTYKKGRwMTcKZzEwCih0c2cxMQpjcmxsYWIuYmFzZWxpbmVzLmxpbmVhcl9mZWF0dXJlX2Jhc2VsaW5lCkxpbmVhckZlYXR1cmVCYXNlbGluZQpwMTgKc2cxMwooZHAxOQpTJ2Vudl9zcGVjJwpwMjAKZzEKKGNybGxhYi5taXNjLmluc3RydW1lbnQKU3R1YkF0dHIKcDIxCmczCk50UnAyMgooZHAyMwpTJ19vYmonCnAyNApnMQooZzcKZzMKTnRScDI1CihkcDI2CmcxMAoodHNnMTEKY3JsbGFiLmVudnMubm9ybWFsaXplZF9lbnYKTm9ybWFsaXplZEVudgpwMjcKc2cxMwooZHAyOApTJ2VudicKcDI5CmcxCihnNwpnMwpOdFJwMzAKKGRwMzEKZzEwCih0c2cxMQpjcmxsYWIuZW52cy5neW1fZW52Ckd5bUVudgpwMzIKc2cxMwooZHAzMwpTJ2Vudl9uYW1lJwpwMzQKUydQZW5kdWx1bS12MCcKcDM1CnNzYnNzYnNTJ19hdHRyX25hbWUnCnAzNgpTJ3NwZWMnCnAzNwpzYnNzYnNTJ2JhdGNoX3NpemUnCnAzOApJNDAwMApzUydkaXNjb3VudCcKcDM5CkYwLjk4OTk5OTk5OTk5OTk5OTk5CnNTJ3N0ZXBfc2l6ZScKcDQwCkYwLjAxCnNTJ25faXRyJwpwNDEKSTEKc2cyOQpnMjUKc1MncG9saWN5JwpwNDIKZzEKKGc3CmczCk50UnA0MwooZHA0NApnMTAKKHRzZzExCmNybGxhYi5wb2xpY2llcy5nYXVzc2lhbl9tbHBfcG9saWN5CkdhdXNzaWFuTUxQUG9saWN5CnA0NQpzZzEzCihkcDQ2CmcyMApnMQooZzIxCmczCk50UnA0NwooZHA0OApnMjQKZzI1CnNnMzYKZzM3CnNic1MnaGlkZGVuX3NpemVzJwpwNDkKKEk4Ckk4CnRwNTAKc3Nic1MnbWF4X3BhdGhfbGVuZ3RoJwpwNTEKZzEKKGcyMQpnMwpOdFJwNTIKKGRwNTMKZzI0CmcyNQpzZzM2ClMnaG9yaXpvbicKcDU0CnNic3NiUyd0cmFpbicKcDU1Cih0KGRwNTYKdHA1NwpzUydfX2t3YXJncycKcDU4CihkcDU5CnNiLg=='
/home/alex-zhai/anaconda2/envs/rllab/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")
using seed 1
using seed 1
[2016-06-02 08:57:13,261] Making new env: Pendulum-v0
2016-06-02 08:57:13.968611 CST | [experiment_2016_06_02_08_57_12_0001] Populating workers...
[2016-06-02 08:57:13,980] Making new env: Pendulum-v0
2016-06-02 08:57:14.099682 CST | [experiment_2016_06_02_08_57_12_0001] Populated
0% 100%
[ ]
[2016-06-02 08:57:14,296] Starting new video recorder writing to /home/alex-zhai/Downloads/rllab-master1/data/local/experiment/experiment_2016_06_02_08_57_12_0001/gym_log/openaigym.video.1.2808.video000000.mp4
avconv version 9.18-6:9.18-0ubuntu0.14.04.1, Copyright (c) 2000-2014 the Libav developers
  built on Mar 16 2015 13:19:10 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
[# ] | ETA: 00:04:37
[2016-06-02 08:57:28,966] Starting new video recorder writing to /home/alex-zhai/Downloads/rllab-master1/data/local/experiment/experiment_2016_06_02_08_57_12_0001/gym_log/openaigym.video.1.2808.video000001.mp4
avconv version 9.18-6:9.18-0ubuntu0.14.04.1, Copyright (c) 2000-2014 the Libav developers
  built on Mar 16 2015 13:19:10 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
[############ ] | ETA: 00:00:42
[2016-06-02 08:57:42,714] Starting new video recorder writing to /home/alex-zhai/Downloads/rllab-master1/data/local/experiment/experiment_2016_06_02_08_57_12_0001/gym_log/openaigym.video.1.2808.video000008.mp4
avconv version 9.18-6:9.18-0ubuntu0.14.04.1, Copyright (c) 2000-2014 the Libav developers
  built on Mar 16 2015 13:19:10 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
[##############################] | ETA: 00:00:00
Total time elapsed: 00:00:42
2016-06-02 08:57:56.653851 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | fitting baseline...
2016-06-02 08:57:56.719935 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | fitted
=: Compiling function f_loss done in 1.068 seconds
/home/alex-zhai/Downloads/rllab-master1/rllab/optimizers/conjugate_gradient_optimizer.py:146: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  n_samples, n_samples * self._subsample_factor, replace=False)
2016-06-02 08:57:57.794056 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | computing loss before
2016-06-02 08:57:57.798731 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | performing update
2016-06-02 08:57:57.798881 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | computing descent direction
=: Compiling function f_grad done in 1.266 seconds
=: Compiling function f_Hx_plain done in 3.820 seconds
2016-06-02 08:58:02.926421 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | descent direction computed
=: Compiling function f_loss_constraint done in 0.297 seconds
2016-06-02 08:58:03.230280 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | backtrack iters: 2
2016-06-02 08:58:03.230412 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | computing loss after
2016-06-02 08:58:03.230478 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | optimization finished
=: Compiling function constraint done in 0.174 seconds
2016-06-02 08:58:03.409038 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | saving snapshot...
2016-06-02 08:58:03.410641 CST | [experiment_2016_06_02_08_57_12_0001] itr #0 | saved
2016-06-02 08:58:03.411401 CST | -----------------------  ---------------
2016-06-02 08:58:03.411506 CST | Iteration                0
2016-06-02 08:58:03.411598 CST | AverageDiscountedReturn  -736.944
2016-06-02 08:58:03.411682 CST | AverageReturn            -1763.87
2016-06-02 08:58:03.411740 CST | ExplainedVariance        2.16271e-13
2016-06-02 08:58:03.411821 CST | NumTrajs                 20
2016-06-02 08:58:03.411900 CST | Entropy                  1.41894
2016-06-02 08:58:03.411956 CST | Perplexity               4.13273
2016-06-02 08:58:03.412011 CST | StdReturn                67.0666
2016-06-02 08:58:03.412067 CST | MaxReturn                -1613.78
2016-06-02 08:58:03.412123 CST | MinReturn                -1855.37
2016-06-02 08:58:03.412179 CST | AveragePolicyStd         1
2016-06-02 08:58:03.412233 CST | LossBefore               -6.67484e-10
2016-06-02 08:58:03.412289 CST | LossAfter                -0.00232857
2016-06-02 08:58:03.412344 CST | MeanKL                   0.00684979
2016-06-02 08:58:03.412398 CST | dLoss                    0.00232857
2016-06-02 08:58:03.412467 CST | -----------------------  ---------------

***************************

Training finished! You can upload results to OpenAI Gym by running the following command:

python scripts/submit_gym.py /home/alex-zhai/Downloads/rllab-master1/data/local/experiment/experiment_2016_06_02_08_57_12_0001/gym_log

***************************

Alex-zhai commented 8 years ago

Yes, after I set n_itr to 1, I submitted the results successfully. So strange!!

dementrock commented 8 years ago

What if you now try a larger n_itr?

Alex-zhai commented 8 years ago

Yes, I tried n_itr=10 and n_itr=50, and everything is OK now! Thank you very much!!!

dementrock commented 8 years ago

OK, awesome! It might just be a very rare error, then. Please let us know if it occurs again.