Closed starexplorer21 closed 1 year ago
Hi, apparently there was a breaking change in a recent version of pandas or torch (most likely pandas). Can you provide your versions of pandas and torch so that I can reproduce the issue, please?
Also, do you still have this error if you delete the checkpoint?
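For anyone answering the version question above, the following is a minimal sketch that prints the installed versions using only the standard library. The helper names `installed_version` and `report` are hypothetical and not part of tmrl; the package names are the ones mentioned in this thread.

```python
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a package, or a placeholder if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def report(packages=("pandas", "torch", "tmrl")) -> dict:
    """Collect versions for the packages discussed in this thread (hypothetical helper)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version}")
```

Running `pip show pandas torch` in the same environment should give the same information.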
I am currently experimenting with the command line interface for this library, but I keep getting a TypeError.
I have tried tweaking the parameters, restarting openplanet, etc., but to no avail.
Here are the command line logs from the trainer. The other 2 scripts appear to be functioning properly.
I am running this in PowerShell with Python 3.10, with no wandb connection.
INFO:root:Namespace(server=False, trainer=True, worker=False, test=False, benchmark=False, record_reward=False, check_env=False, no_wandb=True, config={})
INFO:root:10/08/23 23:58:30 server IP: 127.0.0.1
INFO:root:--- NOW RUNNING SAC on TrackMania ---
INFO:root:Loading checkpoint...
INFO:root: Loaded checkpoint in 0.0033698081970214844 seconds.
INFO:root:Updating checkpoint...
INFO:root:Target entropy: -0.5.
INFO:root:Max epochs changed to 100 (old: 10000).
INFO:root:Rounds per epoch changed to 5 (old: 10).
INFO:root:Checkpoint updated in 0.0 seconds.
INFO:root:=== epoch 0/100 ==== round 0/5 =======================================
INFO:root: Waiting for new samples
INFO:root: Resuming training
INFO:root:starting training
C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\tmrl\custom\utils\nn.py:44: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  assert b.storage().data_ptr() == a.storage().data_ptr()
Traceback (most recent call last):
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\tmrl\__main__.py", line 82, in <module>
    main(arguments)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\tmrl\__main__.py", line 56, in main
    trainer.run()
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\tmrl\networking.py", line 393, in run
    run(interface=self.interface,
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\tmrl\networking.py", line 326, in run
    for stats in iterate_epochs_tm(run_cls, interface, checkpoint_path, dump_run_instance_fn, load_run_instance_fn, 1, updater_fn):
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\tmrl\networking.py", line 269, in iterate_epochs_tm
    yield run_instance.run_epoch(interface=interface) # yield stats data frame (this makes this function a generator)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\tmrl\training_offline.py", line 153, in run_epoch
    stats += pandas_dict(memory_len=len(self.memory), round_time=round_time, idle_time=idle_time, **DataFrame(stats_training).mean(skipna=True)),
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 11338, in mean
    result = super().mean(axis, skipna, numeric_only, **kwargs)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py", line 11978, in mean
    return self._stat_function(
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py", line 11935, in _stat_function
    return self._reduce(
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 11207, in _reduce
    res = df._mgr.reduce(blk_func)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py", line 1459, in reduce
    nbs = blk.reduce(func)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\blocks.py", line 377, in reduce
    result = func(self.values)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 11139, in blk_func
    return op(values, axis=axis, skipna=skipna, **kwds)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\nanops.py", line 147, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\nanops.py", line 404, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\nanops.py", line 720, in nanmean
    the_sum = _ensure_numeric(the_sum)
  File "C:\Users\Yile0\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\nanops.py", line 1678, in _ensure_numeric
    raise TypeError(f"Could not convert {x} to numeric")
TypeError: Could not convert [tensor(1851.6046) tensor(326.9721)] to numeric
I am receiving the exact same TypeError, and yes, tmrl --server and --worker are working properly. It's just the trainer that is currently broken.
Hi, this is probably torch or pandas, as they released new versions recently. Can you try to downgrade torch to 2.0.1 and, if this doesn't work, pandas to, say, 1.5.3? Hopefully that should work until we publish a hotfix.
(PS: don't forget to delete the checkpoint saved by the trainer in TmrlData/checkpoints, as it would otherwise be corrupted.)
Okay, I uninstalled the current versions of torch and pandas and installed the versions you recommended, and it worked. The issue is with the pandas update: when I downgraded only torch to 2.0.1, the TypeError was still thrown, but after downgrading pandas to 1.5.3 it worked. Thanks so much for the help!
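The traceback above suggests that the statistics passed to DataFrame(...).mean() contain torch tensors rather than plain numbers, which pandas 2.x refuses to average. As a rough illustration only (this is not necessarily how tmrl fixed it), the values could be converted to Python floats before averaging. Everything below is a hypothetical sketch: `to_float`, `mean_stats`, the `FakeTensor` stand-in (used so the example runs without torch), and the `loss_actor` key are all assumptions, and unlike `mean(skipna=True)` the sketch does not skip NaN values.

```python
from statistics import fmean


def to_float(value):
    """Convert a scalar-like object to a Python float.

    Zero-dimensional torch tensors and NumPy scalars expose .item();
    plain ints and floats fall through to float().
    """
    item = getattr(value, "item", None)
    return float(item()) if callable(item) else float(value)


def mean_stats(rows):
    """Average each key over a list of per-step stat dicts (hypothetical helper)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(to_float(value))
    return {key: fmean(values) for key, values in columns.items()}


class FakeTensor:
    """Stand-in for a zero-dimensional tensor so the sketch runs without torch."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


if __name__ == "__main__":
    # "loss_actor" is an illustrative key; the values are those shown in the error message.
    rows = [{"loss_actor": FakeTensor(1851.6046)}, {"loss_actor": FakeTensor(326.9721)}]
    print(mean_stats(rows))
```

With real tensors, the same conversion would apply to whatever values end up in the statistics dictionaries before they reach pandas.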
Thanks for testing, I'll make tmrl compatible with the latest version of pandas in the upcoming release.
For others who have this bug in the meantime, this should fix it:
pip install pandas==1.5.3
Thanks so much for the help! Since there is now a solution, I'll be closing the issue. If it's any help, I was using torch 2.1.0+cu118 and pandas 2.1.1.
After downgrading just pandas to 1.5.3, the issue seems to be completely resolved for me as well.
Well, that was a bit early to close, but the issue should now be resolved in version 0.5.3 :)