Open jkrude opened 1 week ago
Thanks for reporting! I can't reproduce this error on the main branch with tensordict and PyTorch nightly; the script runs perfectly fine. Does this occur sporadically?
No, you're right, the nightlies are broken. I will fix that. In the meantime, you can install everything like this:
    # Adapt this if you need CUDA, e.g. nightly/cu124
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu -U
    pip3 install git+https://github.com/pytorch/tensordict -U
    pip3 install git+https://github.com/pytorch/rl -U
LMK if you can reprod after that!
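Before re-running the script, it can help to confirm which builds actually ended up in the environment. The following is a minimal sketch using only the standard library; the list of distribution names is an assumption (git installs usually register as `tensordict` and `torchrl`, while the PyPI nightlies carry a `-nightly` suffix), and `installed_versions` is a hypothetical helper, not part of any of these packages.

```python
from importlib import metadata

# Assumed distribution names; adjust to whatever `pip list` shows on your machine.
CANDIDATES = ["torch", "tensordict", "tensordict-nightly", "torchrl", "torchrl-nightly"]


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is simply absent from this environment.
            found[name] = None
    return found


versions = installed_versions(CANDIDATES)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into the thread alongside the traceback makes it easier to tell whether two people are really running the same builds.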
The nightly release should be available now (not for Windows, though): https://pypi.org/project/torchrl-nightly/#history
Thanks for the quick fix regarding the nightly builds 👍 I encounter the same error with the 2024.7.3 versions using the above scripts. Here is my full pip list:

| Package | Version |
|---|---|
| cloudpickle | 3.0.0 |
| Farama-Notifications | 0.0.4 |
| filelock | 3.13.1 |
| fsspec | 2024.6.1 |
| gymnasium | 0.29.1 |
| Jinja2 | 3.1.4 |
| MarkupSafe | 2.1.5 |
| mpmath | 1.3.0 |
| networkx | 3.3 |
| numpy | 2.0.0 |
| orjson | 3.10.6 |
| packaging | 24.1 |
| pip | 23.2.1 |
| setuptools | 68.2.0 |
| sympy | 1.12.1 |
| tensordict-nightly | 2024.7.3 |
| torch | 2.5.0.dev20240703+cpu |
| torchrl-nightly | 2024.7.3 |
| typing_extensions | 4.12.2 |
| wheel | 0.41.2 |

I am a bit surprised that it works on your side, as the relevant code snippets are still the same on the main branch.
Here in `a2c.py` the value estimator is called with both `params` and `target_params`, where the `params` are not the same as `target_params` as they are detached?
    self.value_estimator(
        tensordict,
        params=self._cached_detach_critic_network_params,
        target_params=self.target_critic_network_params,
    )
This ultimately fails in `advantages.py`, which is also still the same on the main branch:
    if next_params is not None and next_params is not params:
        raise ValueError(
            "the value at t and t+1 cannot be retrieved in a single call without recurring to vmap when both params and next params are passed."
        )
Note that I am running completely on CPU, without GPU support on the machine. I don't know if that makes any difference 🤷.
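To illustrate why an identity check like the one quoted above would trip on detached parameters, here is a minimal sketch in plain Python. `Params` and `check_single_call` are hypothetical stand-ins written for this example; they are not the TensorDict or TorchRL API, and the sketch only assumes that detaching returns a new object rather than the original one.

```python
class Params:
    """Hypothetical stand-in for a parameter container (not the TensorDict API)."""

    def __init__(self, values):
        self.values = list(values)

    def detach(self):
        # Assumption: like detaching tensors, this returns a *new* object
        # that holds the same values.
        return Params(self.values)


def check_single_call(params, next_params):
    # Mirrors the shape of the identity check quoted from advantages.py.
    if next_params is not None and next_params is not params:
        raise ValueError("params and next_params are different objects")
    return "ok"


critic = Params([0.1, 0.2])
detached = critic.detach()

same_values = detached.values == critic.values  # equal contents
same_object = detached is critic                # but not the same object

result_same = check_single_call(critic, critic)
try:
    check_single_call(detached, critic)
    raised = False
except ValueError:
    raised = True

print(same_values, same_object, result_same, raised)
```

The check compares object identity, not values, so a detached copy fails it even when it holds exactly the same numbers.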
Describe the bug
Not quite sure if this is supported behavior, but if I set `functional=True` for the A2C loss and `shifted=True` for `TD0Estimator`, I get an internal error.
To Reproduce
Expected behavior
The losses are calculated correctly and the value_network is only called once in the computation of the advantage.
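As a rough illustration of what "called once" means here, the sketch below computes a TD(0) advantage from a single value-function call over T+1 observations and then shifts the result to get the values at t and t+1. It is a toy in plain Python under stated assumptions: `td0_advantage_shifted` and `toy_value_fn` are hypothetical names, episode termination is ignored, and this is not TorchRL's implementation.

```python
def td0_advantage_shifted(value_fn, observations, rewards, gamma=0.99):
    """TD(0) advantage from one value call.

    `observations` holds T+1 entries (s_0 ... s_T), `rewards` holds T entries.
    Termination flags are ignored to keep the sketch short.
    """
    values = value_fn(observations)        # single call over all T+1 states
    v_t, v_next = values[:-1], values[1:]  # shift by one step
    return [r + gamma * vn - v for r, v, vn in zip(rewards, v_t, v_next)]


calls = 0


def toy_value_fn(obs_batch):
    """Toy value function that counts how often it is invoked."""
    global calls
    calls += 1
    return [0.5 * o for o in obs_batch]


advantages = td0_advantage_shifted(
    toy_value_fn, [0.0, 2.0, 4.0], [1.0, 1.0], gamma=0.5
)
print(advantages, calls)
```

Because both the value at t and the value at t+1 come from the same call, that call can only use one set of parameters.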
System info
Reason and Possible fixes
The problem seems to be in this snippet, where detached parameters are used for `params`, which makes them unequal.