Dear author,
Thanks for providing this excellent package!
When I try to run this package on my Linux server, I encounter the following error:
tensorflow.python.framework.errors_impl.NotFoundError: /home/.local/lib/python3.8/site-packages/tensorflow/core/kernels/libtfkernel_sobol_op.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefEN4absl12lts_2021032411string_viewESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteIS9_EE
...
wandb: ERROR Internal wandb error: file data was not synced
...
Exception: The wandb backend process has shutdown
Error in atexit._run_exitfuncs:
...
Exception: The wandb backend process has shutdown
The command I used is python -m SimpleSAC.conservative_sac_main --env 'halfcheetah-medium-v0' --logging.output_dir './experiment_output' --device "cuda:0", as per your recommendation.
May I ask how I can fix this error and run this project on my server?
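In case it helps with diagnosis, this is how I am checking the TensorFlow-related packages installed on the server (assuming the environment is managed with pip; the undefined-symbol message makes me suspect a binary mismatch between installed packages, but I am not sure):
# list installed packages whose names mention tensorflow
pip list | grep -i tensorflow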
Besides, I have the following two general questions:
I notice that this CQL implementation uses n_epochs=2000, which IMHO is more training than typical offline RL methods use. Can we reduce the number of training epochs to, say, 1000 as in BCQ?
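For concreteness, I would plan to shorten training with a command like the one below (assuming n_epochs is exposed as a command-line flag in the same way as the other options shown above; please correct me if the flag name differs):
# same run as above, but with the epoch count reduced to 1000
python -m SimpleSAC.conservative_sac_main --env 'halfcheetah-medium-v0' --n_epochs 1000 --logging.output_dir './experiment_output' --device "cuda:0"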
Do you have recommendations for the hyperparameter settings for the Maze2D and Adroit task domains in the D4RL dataset? I have tried to replicate the CQL results from the D4RL whitepaper using the original CQL repo, but was unsuccessful.
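For context, I have been launching those domains with the same invocation pattern and the default hyperparameters, for example (the environment names below are standard D4RL ids; I am assuming the same command-line pattern applies to them):
# Maze2D example
python -m SimpleSAC.conservative_sac_main --env 'maze2d-umaze-v1' --logging.output_dir './experiment_output' --device "cuda:0"
# Adroit example
python -m SimpleSAC.conservative_sac_main --env 'pen-human-v0' --logging.output_dir './experiment_output' --device "cuda:0"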