Closed · MatPoliquin closed this issue 1 year ago
This seems to be a pure PyTorch Categorical issue, where round-off error trips the constraint check and raises the exception: https://discuss.pytorch.org/t/distributions-categorical-fails-with-constraint-simplex-but-manual-check-passes/163209/3 I do not see this issue on V100 or A100 GPUs, so it may come down to slight numerical differences between GPUs.
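For context, here is a minimal sketch (not WarpDrive code; the tensor shapes are made up) of how the Simplex() check in torch.distributions.Categorical can fail on probabilities that are only off by round-off, along with two common ways to avoid it:

```python
import torch
from torch.distributions import Categorical

logits = torch.randn(5, 11)
probs = torch.softmax(logits, dim=-1)

# Perturb the rows by a round-off-sized amount so they no longer sum to 1
# within the tight (~1e-6) tolerance used by the Simplex() constraint.
probs_bad = probs + 1e-5

try:
    Categorical(probs=probs_bad)  # argument validation is on by default
except ValueError as err:
    print("Simplex check failed:", err)

# Two common ways to avoid the failure:
Categorical(probs=probs_bad / probs_bad.sum(dim=-1, keepdim=True))  # renormalize
Categorical(logits=logits)  # or construct from logits and skip probs entirely
```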
When running the example here: https://github.com/salesforce/warp-drive/blob/master/tutorials/simple-end-to-end-example.ipynb
Note:
- repo commit: b5d46d4
- These tests passed successfully:
  - python warp_drive/utils/unittests/run_unittests_pycuda.py
  - python warp_drive/utils/unittests/run_trainer_tests.py
- GPU: NVIDIA P104-100 8GB
I get this output:
Device: 0
Iterations Completed : 1 / 50
Speed performance stats
Mean policy eval time per iter (ms) : 196.94
Mean action sample time per iter (ms) : 37.12
Mean env. step time per iter (ms) : 85.96
Mean training time per iter (ms) : 123.15
Mean total time per iter (ms) : 453.86
Mean steps per sec (policy eval) : 50775.87
Mean steps per sec (action sample) : 269373.56
Mean steps per sec (env. step) : 116335.91
Mean steps per sec (training time) : 81202.92
Mean steps per sec (total) : 22033.34
Metrics for policy 'runner'
VF loss coefficient : 0.01000
Entropy coefficient : 0.05000
Total loss : 0.09430
Policy loss : 0.33186
Value function loss : 0.20734
Mean rewards : 0.00085
Max. rewards : 1.00000
Min. rewards : -1.00000
Mean value function : 0.04290
Mean advantages : 0.06929
Mean (norm.) advantages : 0.06929
Mean (discounted) returns : 0.11219
Mean normalized returns : 0.11219
Mean entropy : 4.79267
Variance explained by the value function : 0.01151
Std. of action_0 over agents : 3.13083
Std. of action_0 over envs : 3.14615
Std. of action_0 over time : 3.14577
Std. of action_1 over agents : 3.17047
Std. of action_1 over envs : 3.18386
Std. of action_1 over time : 3.18446
Current timestep : 10000.00000
Gradient norm : 0.00000
Learning rate : 0.00500
Mean episodic reward : 1.71000
Mean episodic steps : 100.00000
Metrics for policy 'tagger'
VF loss coefficient : 0.01000
Entropy coefficient : 0.05000
Total loss : 1.78037
Policy loss : 2.01399
Value function loss : 0.59261
Mean rewards : 0.01810
Max. rewards : 1.00000
Min. rewards : 0.00000
Mean value function : 0.06817
Mean advantages : 0.42039
Mean (norm.) advantages : 0.42039
Mean (discounted) returns : 0.48856
Mean normalized returns : 0.48856
Mean entropy : 4.79084
Variance explained by the value function : -0.00882
Std. of action_0 over agents : 3.06860
Std. of action_0 over envs : 3.17762
Std. of action_0 over time : 3.17566
Std. of action_1 over agents : 3.05678
Std. of action_1 over envs : 3.16503
Std. of action_1 over time : 3.16620
Current timestep : 10000.00000
Gradient norm : 0.00000
Learning rate : 0.00200
Mean episodic reward : 9.05000
Mean episodic steps : 100.00000
[Device 0]: Saving the results to the file '/tmp/continuous_tag/example/1679065351/results.json'
[Device 0]: Saving the 'runner' torch model to the file: '/tmp/continuous_tag/example/1679065351/runner_10000.state_dict'.
[Device 0]: Saving the 'tagger' torch model to the file: '/tmp/continuous_tag/example/1679065351/tagger_10000.state_dict'.
Traceback (most recent call last):
  File "wd_test.py", line 84, in
    trainer.train()
  File "/home/warp/github/warp-drive/warp_drive/training/trainer.py", line 415, in train
    metrics = self._update_model_params(iteration)
  File "/home/warp/github/warp-drive/warp_drive/training/trainer.py", line 710, in _update_model_params
    perform_logging=logging_flag,
  File "/home/warp/github/warp-drive/warp_drive/training/algorithms/policygradient/a2c.py", line 102, in compute_loss_and_metrics
    m = Categorical(action_probabilities_batch[idx])
  File "/home/warp/anaconda3/envs/warp_drive/lib/python3.7/site-packages/torch/distributions/categorical.py", line 64, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/warp/anaconda3/envs/warp_drive/lib/python3.7/site-packages/torch/distributions/distribution.py", line 56, in __init__
    f"Expected parameter {param} "
ValueError: Expected parameter probs (Tensor of shape (100, 100, 5, 11)) of distribution Categorical(probs: torch.Size([100, 100, 5, 11])) to satisfy the constraint Simplex(), but found invalid values:
tensor([[[[ 1.2426e+00, -1.2945e+00, 4.1014e-01, ..., 5.5622e-01,
-6.7214e-01, -1.2349e+00],
[-1.7248e-01, 6.4287e-02, -7.4881e-01, ..., 4.6214e-01,
7.5912e-01, 1.8682e-01],
[ 6.3147e-01, 4.5790e-01, -3.2810e-01, ..., 3.1173e-01,
2.7938e-01, 3.7275e-01],
[ 1.9841e+00, 7.4553e-01, -6.1727e-01, ..., -8.2579e-01,
-1.8078e+00, -5.4283e-01],
[ 4.3695e-01, 1.6643e-02, -1.7423e-01, ..., 6.6712e-01,
-5.9217e-01, -7.6138e-01]],
Exception ignored in: <function PyCUDASampler.__del__ at 0x7f66e6914830>
Traceback (most recent call last):
  File "/home/warp/github/warp-drive/warp_drive/managers/pycuda_managers/pycuda_function_manager.py", line 510, in __del__
  File "/home/warp/anaconda3/envs/warp_drive/lib/python3.7/site-packages/pycuda/driver.py", line 480, in function_call
pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle
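The trailing PyCUDASampler.__del__ / cuFuncSetBlockShape error looks like fallout from the CUDA context being torn down after the first exception rather than a separate bug. For anyone hitting the same crash, one possible workaround at the call site shown in the traceback (a2c.py, line 102) is to clamp and renormalize the probabilities before building the distribution. This is only a sketch against the variable names visible above (action_probabilities_batch, idx); I have not verified it inside warp-drive itself:

```python
import torch
from torch.distributions import Categorical


def safe_categorical(action_probabilities_batch, idx):
    """Build a Categorical that tolerates small round-off errors in the probs."""
    probs = action_probabilities_batch[idx]
    # Clamp away tiny negative entries, then renormalize each row to sum to 1
    # so the Simplex() constraint check passes.
    probs = torch.clamp(probs, min=0.0)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return Categorical(probs=probs)
```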