Closed: mkolyaei closed this issue 3 months ago
@alanwguo do you know if it's a known issue given it's related to Pydantic?
@alanwguo can you triage?
@mkolyaei, as a workaround, can you install a pydantic version that is >= 1.9 and < 2.5?
In the latest master, we've updated this pydantic compatibility logic, so starting in Ray 2.9.0 you should not run into this.
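To check whether an environment already falls inside the suggested range, something like the following may help. It is a minimal sketch, not part of Ray or pydantic: the helper names `parse_major_minor` and `in_workaround_range` are hypothetical, and the comparison only looks at the major and minor components of the version string.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.8.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def in_workaround_range(version: str) -> bool:
    """Return True if the version is >= 1.9 and < 2.5 (the range suggested above)."""
    return (1, 9) <= parse_major_minor(version) < (2, 5)
```

In the affected environment, the installed version can be read with `importlib.metadata.version("pydantic")` and passed to `in_workaround_range`. The pydantic 1.8.2 listed in this report falls below the lower bound.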
What happened + What you expected to happen
Dear ray team,
When I initialize Ray with ray.init(local_mode=True), the Ray dashboard fails to start with return code 1. I expected Ray to initialize without any 'pydantic' errors and without crashing the kernel.
I appreciate your help on this matter. Regards, Mary
Versions / Dependencies
Package Version
absl-py 2.0.0 aiocache 0.12.2 aiofiles 23.2.1 aiohttp 3.9.1 aiohttp-cors 0.7.0 aiopubsub 3.0.0 aioredis 2.0.1 aiosignal 1.3.1 aiosmtplib 3.0.1 alembic 1.12.1 aniso8601 7.0.0 annotated-types 0.6.0 anyio 4.1.0 appnope 0.1.3 APScheduler 3.10.4 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 asttokens 2.4.1 astunparse 1.6.3 async-lru 2.0.4 async-timeout 4.0.3 attrs 23.1.0 Babel 2.13.1 backcall 0.2.0 beautifulsoup4 4.12.2 bleach 6.1.0 blessed 1.20.0 Bottleneck 1.3.7 cachetools 5.3.2 certifi 2023.11.17 cffi 1.16.0 chardet 3.0.4 charset-normalizer 3.3.2 click 8.1.7 colorama 0.4.6 colorful 0.5.5 colorlog 6.7.0 comm 0.2.0 contourpy 1.2.0 cssutils 2.9.0 cycler 0.12.1 dataframe-image 0.2.2 debugpy 1.8.0 decorator 5.1.1 defusedxml 0.7.1 distlib 0.3.7 dm-tree 0.1.8 docopt 0.6.2 dotmap 1.3.30 entrypoints 0.4 exceptiongroup 1.2.0 executing 2.0.1 Farama-Notifications 0.0.4 fastjsonschema 2.19.0 filelock 3.13.1 flatbuffers 23.5.26 fonttools 4.45.1 fqdn 1.5.1 frozenlist 1.4.0 fsspec 2023.10.0 gast 0.5.4 google-api-core 2.14.0 google-auth 2.23.4 google-auth-oauthlib 1.1.0 google-pasta 0.2.0 googleapis-common-protos 1.61.0 gpustat 1.1.1 GPUtil 1.4.0 graphene 2.1.9 graphene-sqlalchemy 2.3.0 graphql-core 2.3.2 graphql-relay 2.0.1 graphql-server-core 2.0.0 graphql-ws 0.4.4 greenlet 3.0.1 grpcio 1.59.3 gunicorn 21.2.0 gym 0.26.2 gym-notices 0.0.8 gymnasium 0.28.1 gymnasium-notices 0.0.1 h11 0.8.1 h2 3.2.0 h5py 3.10.0 hiredis 2.2.3 hpack 3.0.0 html2image 2.0.4.3 html5tagger 1.3.0 httpcore 0.3.0 httptools 0.6.1 hyperframe 5.2.0 idna 2.10 imageio 2.33.0 importlib-metadata 6.8.0 importlib-resources 6.1.1 install 1.3.5 ipykernel 6.26.0 ipython 8.18.0 ipython-genutils 0.2.0 ipywidgets 8.1.1 isoduration 20.11.0 jax-jumpy 1.0.0 jaxtyping 0.2.23 jedi 0.19.1 Jinja2 3.1.2 joblib 1.3.2 json5 0.9.14 jsonpointer 2.4 jsonschema 4.20.0 jsonschema-specifications 2023.11.1 jupyter 1.0.0 jupyter_client 8.6.0 jupyter-console 6.6.3 jupyter_core 5.5.0 jupyter-events 0.9.0 jupyter-lsp 2.2.1 
jupyter_server 2.10.1 jupyter_server_terminals 0.4.4 jupyterlab 4.0.9 jupyterlab_pygments 0.3.0 jupyterlab_server 2.25.2 jupyterlab-widgets 3.0.9 kaleido 0.2.1 keras 2.15.0 kiwisolver 1.4.5 lazy_loader 0.3 libclang 16.0.6 linear-operator 0.5.1 loguru 0.7.2 lxml 4.9.3 lz4 4.3.2 Mako 1.3.0 Markdown 3.5.1 markdown-it-py 3.0.0 markdown2 2.4.10 MarkupSafe 2.1.3 matplotlib 3.8.2 matplotlib-inline 0.1.6 mdurl 0.1.2 mistune 3.0.2 ml-dtypes 0.2.0 mpmath 1.3.0 msgpack 1.0.7 multidict 6.0.4 multipledispatch 1.0.0 mypy-extensions 1.0.0 nbclient 0.9.0 nbconvert 7.11.0 nbformat 5.9.2 nest-asyncio 1.5.8 networkx 3.2.1 notebook 7.0.6 notebook_shim 0.2.3 numexpr 2.8.7 numpy 1.26.2 nvidia-ml-py 12.535.133 oauthlib 3.2.2 opencensus 0.11.3 opencensus-context 0.1.3 opencv-python 4.8.1.78 opt-einsum 3.3.0 optuna 3.4.0 overrides 7.4.0 packaging 23.2 pandas 2.1.3 pandocfilters 1.5.0 parso 0.8.3 passlib 1.7.4 pexpect 4.9.0 pickleshare 0.7.5 Pillow 10.1.0 pip 23.3.1 pipdeptree 2.13.1 platformdirs 4.0.0 plotly 5.18.0 prometheus-client 0.19.0 promise 2.3 prompt-toolkit 3.0.41 protobuf 4.23.4 psutil 5.9.6 ptyprocess 0.7.0 pure-eval 0.2.2 py-postgresql 1.3.0 py-spy 0.3.14 pyaml 23.9.7 pyarrow 14.0.1 pyasn1 0.5.1 pyasn1-modules 0.3.0 pycparser 2.21 pydantic 1.8.2 pydantic_core 2.14.5 Pygments 2.17.2 PyJWT 2.8.0 pyparsing 3.1.1 PyQt5 5.15.10 PyQt5-Qt5 5.15.11 PyQt5-sip 12.13.0 pyre-extensions 0.0.30 pyro-api 0.1.2 pyro-ppl 1.8.6 pytest-runner 6.0.0 python-dateutil 2.8.2 python-editor 1.0.4 python-json-logger 2.0.7 pytz 2023.3.post1 PyWavelets 1.5.0 PyYAML 6.0.1 pyzmq 25.1.1 qtconsole 5.5.1 QtPy 2.4.1 ray 2.8.0 redis 5.0.1 referencing 0.31.0 requests 2.31.0 requests-async 0.5.0 requests-oauthlib 1.3.1 rfc3339-validator 0.1.4 rfc3986 1.5.0 rfc3986-validator 0.1.1 rich 13.7.0 rpds-py 0.13.1 rsa 4.9 ruamel.yaml 0.18.5 ruamel.yaml.clib 0.2.8 Rx 1.6.3 sanic 23.6.0 sanic-compress 0.1.1 Sanic-Cors 2.2.0 Sanic-GraphQL 1.1.0 sanic-jwt 1.8.0 Sanic-Plugins-Framework 0.9.4.post1 sanic-routing 23.6.0 
scikit-image 0.22.0 scipy 1.11.4 seaborn 0.13.0 Send2Trash 1.8.2 setuptools 69.0.2 Shimmy 1.3.0 singledispatch 3.7.0 six 1.16.0 smart-open 6.4.0 sniffio 1.3.0 soupsieve 2.5 SQLAlchemy 1.4.50 SQLAlchemy-Utils 0.41.1 stable-baselines3 2.2.1 stack-data 0.6.3 stripe 7.6.0 sympy 1.12 tabulate 0.9.0 tenacity 8.2.3 tensorboard 2.15.1 tensorboard-data-server 0.7.2 tensorboardX 2.6.2.2 tensorflow 2.15.0 tensorflow-estimator 2.15.0 tensorflow-io-gcs-filesystem 0.34.0 termcolor 2.3.0 terminado 0.18.0 threadpoolctl 3.2.0 tifffile 2023.9.26 tinycss2 1.2.1 tomli 2.0.1 torch 2.1.1 tornado 6.3.3 tqdm 4.66.1 tracerite 1.1.1 traitlets 5.13.0 typeguard 2.13.3 typer 0.9.0 types-python-dateutil 2.8.19.14 typing_extensions 4.8.0 typing-inspect 0.9.0 tzdata 2023.3 tzlocal 5.2 ujson 5.8.0 uri-template 1.3.0 urllib3 2.1.0 uvloop 0.19.0 virtualenv 20.24.7 wcwidth 0.2.12 webcolors 1.13 webencodings 0.5.1 websocket-client 1.6.4 websockets 12.0 Werkzeug 3.0.1 wheel 0.42.0 widgetsnbextension 4.0.9 wrapt 1.14.1 yarl 1.9.3 zipp 3.17.0
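The package list above is flattened into alternating name/version tokens, which makes the pydantic entries easy to miss. The sketch below parses a short excerpt copied from that list and flags one combination worth checking. The function name `parse_flat_package_list` is hypothetical. The interpretation is also an assumption rather than something confirmed in this thread: pydantic_core is normally installed as a dependency of pydantic 2.x, so finding it next to pydantic 1.8.2 may point to a leftover from an earlier 2.x install.

```python
def parse_flat_package_list(text: str) -> dict:
    """Turn 'name version name version ...' into a {name: version} mapping."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating name/version tokens")
    return dict(zip(tokens[0::2], tokens[1::2]))


# Excerpt copied from the package list in this report.
excerpt = "pycparser 2.21 pydantic 1.8.2 pydantic_core 2.14.5 Pygments 2.17.2"
packages = parse_flat_package_list(excerpt)

# Assumption: pydantic_core alongside pydantic 1.x suggests a mixed install.
pydantic_major = int(packages["pydantic"].split(".")[0])
mixed_install = pydantic_major < 2 and "pydantic_core" in packages
```

If `mixed_install` is true, reinstalling pydantic at a single consistent version is one thing to try before digging further.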
Reproduction script
# number of episodes for RLib agents
num_episodes_ray = 50000
# stop trials at least from this number of episodes
grace_period_ray = num_episodes_ray / 10
# dir for saving Ray results
ray_dir = 'ray_results'
# creating necessary dir
if not os.path.exists(f"{local_dir+'/'+ray_dir}"):
    os.makedirs(f"{local_dir+'/'+ray_dir}")
from ray.rllib.algorithms.ppo.ppo import PPOConfig
from ray.rllib.algorithms.ppo.ppo import PPO as ppo
from ray.rllib.algorithms.ppo.ppo_learner import PPOLearnerHyperparameters
algorithms = { 'PPO': PPOLearnerHyperparameters }
from ray import air
config_PPO = PPOConfig()
config_PPO.framework("torch")
config_PPO.environment(env="SupplyChain")
config_PPO.log_level = "WARN"
config_PPO.rollouts(
    rollout_fragment_length=tune.grid_search([20, 200]),
    num_rollout_workers=num_cpus - 1,
    sample_async=False
)
config_PPO.resources(num_gpus=num_gpus)
# Set training parameters
config_PPO.training(
    gamma=0.99,
    grad_clip=tune.grid_search([None, 20.0]),
    train_batch_size=tune.grid_search([400, 4000]),
    lr=tune.grid_search([5e-3, 5e-4]),
    sgd_minibatch_size=tune.grid_search([64, 128])
)
config_PPO['model']['fcnet_hiddens'] = tune.grid_search([[64, 64], [128, 128]])
config_PPO['seed'] = 2023
# Set additional training parameters
config_PPO["num_sgd_iter"] = tune.grid_search([15, 30])
config_PPO["horizon"] = env.T - 1
config_PPO['evaluation_num_episodes'] = 1000
config_PPO["sgd_minibatch_size"] = tune.grid_search([64, 128])
print(config_PPO.to_dict())
trainer = config_PPO.build()
def train(algorithm, config, verbose,
          num_episodes_ray=num_episodes_ray,
          grace_period_ray=grace_period_ray,
          local_dir=local_dir,
          ray_dir=ray_dir):
    """ Train a RLib Agent. """
    # initializing Ray

def result_df_as_image(result_df, algorithm, local_dir=local_dir, plots_dir=plots_dir):
    """ Visualize the (DataFrame) RLib Agent's result as an image. """
    # creating necessary subdir and saving plot

def calculate_training_time(result_df):
    """ Calculate a RLib Agent training time (minutes). """
    return int(result_df.time_total_s[0]//60)

def calculate_training_episodes(result_df):
    """ Calculate a RLib Agent training episodes (number). """
    return round(result_df.episodes_total[0], -3)

def load_policy(algorithm, config, checkpoint):
    """ Load a RLib Agent policy. """
    # initializing Ray

def fix_best_checkpoint(checkpoint):
    """ Fix a RLib Agent best checkpoint path. """
    # searching all checkpoints related to the best agent's result
ray.init(local_mode=True)
(results_PPO, best_result_PPO, best_config_PPO, checkpoint_PPO) = train(algorithms['PPO'], config_PPO, verbose)
2023-11-30 13:43:54,571 ERROR services.py:1329 -- Failed to start the dashboard , return code 1
2023-11-30 13:43:54,577 ERROR services.py:1354 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-11-30 13:43:54,602 ERROR services.py:1398 -- The last 20 lines of /tmp/ray/session_2023-11-30_13-43-49_981892_16350/logs/dashboard.log (it contains the error message from the dashboard):
  File "/Users/marri/opt/anaconda3/envs/RL/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/Users/marri/opt/anaconda3/envs/RL/lib/python3.9/site-packages/ray/dashboard/modules/job/cli.py", line 16, in <module>
    from ray.job_submission import JobStatus, JobSubmissionClient
  File "/Users/marri/opt/anaconda3/envs/RL/lib/python3.9/site-packages/ray/job_submission/__init__.py", line 2, in <module>
    from ray.dashboard.modules.job.pydantic_models import DriverInfo, JobDetails, JobType
  File "/Users/marri/opt/anaconda3/envs/RL/lib/python3.9/site-packages/ray/dashboard/modules/job/pydantic_models.py", line 4, in <module>
    from ray._private.pydantic_compat import BaseModel, Field, PYDANTIC_INSTALLED
  File "/Users/marri/opt/anaconda3/envs/RL/lib/python3.9/site-packages/ray/_private/pydantic_compat.py", line 100, in <module>
    monkeypatch_pydantic_2_for_cloudpickle()
  File "/Users/marri/opt/anaconda3/envs/RL/lib/python3.9/site-packages/ray/_private/pydantic_compat.py", line 58, in monkeypatch_pydantic_2_for_cloudpickle
    pydantic._internal._model_construction.SchemaSerializer = (
AttributeError: module 'pydantic' has no attribute '_internal'
2023-11-30 13:43:54,891 INFO worker.py:1673 -- Started a local Ray instance.
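The traceback ends where a patch function reaches for `pydantic._internal`, which the installed pydantic module does not provide. The sketch below reproduces that failure mode with stand-in objects and shows one way such an access could be guarded. It is an illustration only, not Ray's code or its actual fix: `patch_serializer`, `fake_v1`, and `fake_v2` are hypothetical names, and the stand-ins are plain `types.SimpleNamespace` objects rather than real pydantic modules.

```python
from types import SimpleNamespace


def patch_serializer(module, replacement) -> bool:
    """Set module._internal._model_construction.SchemaSerializer if that path exists.

    Returns True when the attribute was replaced, False when the module
    does not have the expected layout (so nothing was touched).
    """
    internal = getattr(module, "_internal", None)
    construction = getattr(internal, "_model_construction", None)
    if construction is None:
        return False
    construction.SchemaSerializer = replacement
    return True


# Stand-in for a module without an `_internal` attribute (hypothetical).
fake_v1 = SimpleNamespace(VERSION="1.8.2")

# Stand-in for a module that has the nested attribute path (hypothetical).
fake_v2 = SimpleNamespace(
    _internal=SimpleNamespace(
        _model_construction=SimpleNamespace(SchemaSerializer=object)
    )
)

# Unguarded access fails the same way the traceback does.
try:
    fake_v1._internal._model_construction
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True
```

With the guard, `patch_serializer(fake_v1, ...)` returns False instead of raising, while `patch_serializer(fake_v2, ...)` applies the replacement.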
Issue Severity
High: It blocks me from completing my task.