GitEasonXu opened this issue 3 years ago
You can try init_orca_context(..., extra_python_lib='your/local/package'). @hkvision Am I right?
Adding the extra_python_lib config to init_orca_context does not work.
When I submit a task to the YARN manager, does zoo pack only the conda environment packages into the current Python env, and not the project files? If so, how do the YARN workers load the project files?
@GitEasonXu The argument passed to extra_python_lib will be passed to --py-files in the spark-submit command. The value should be comma-separated .py, .zip, or .egg files.
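For example, a minimal sketch (utils.py and my_project.zip are hypothetical placeholders for your own packaged code):

    from zoo.orca import init_orca_context

    # extra_python_lib takes comma-separated .py/.zip/.egg paths,
    # the same format spark-submit accepts for --py-files.
    init_orca_context(cluster_mode="yarn-client",
                      extra_python_lib="utils.py,my_project.zip")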
How do you use init_orca_context with extra_python_lib? Could you provide your code?
@GitEasonXu did you try the suggestions above?
I've just tried your suggestions, but they don't solve the problem. Here are some code snippets.
    import time
    from zoo.orca import init_orca_context  # import was missing from the snippet

    cluster_mode = "yarn"
    extra_python_package = 'my_project_root_path'

    if cluster_mode == "local":
        init_orca_context(cluster_mode="local", cores=4, init_ray_on_spark=True)
    elif cluster_mode == "yarn":
        init_orca_context(cluster_mode="yarn-client", num_nodes=4, cores=10,
                          init_ray_on_spark=True, memory="16g", driver_memory="16g",
                          hadoop_user_name='hdfs', hadoop_conf="/etc/hadoop/3.0.1.0-187/0/",
                          extra_python_lib=extra_python_package)
Hi @GitEasonXu, as I said, extra_python_lib should be a path to .py, .zip, or .egg files; a directory is not supported. extra_python_lib has the same semantics as Spark's --py-files, so you can try it with spark-submit first. If it works on Spark, it should work on Analytics Zoo.
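A minimal sketch of that workaround, assuming my_project_root_path is the directory whose contents (your packages/modules) should ship to the workers (the paths and archive name here are illustrative, not part of the Analytics Zoo API):

    import shutil
    from zoo.orca import init_orca_context

    # Zip the project directory first, since extra_python_lib does not
    # accept a bare directory. root_dir should be the directory whose
    # contents must sit at the top level of the zip to be importable.
    archive = shutil.make_archive("my_project", "zip",
                                  root_dir="my_project_root_path")
    init_orca_context(cluster_mode="yarn-client", num_nodes=4, cores=10,
                      hadoop_conf="/etc/hadoop/3.0.1.0-187/0/",
                      extra_python_lib=archive)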
@hkvision hi Kai, do you think it is possible to automatically pack Python packages in the current directory? Say, packing all .py files and all directories with __init__.py files in the current directory.
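As a rough illustration of what such auto-packing could look like (a hypothetical sketch, not an existing Analytics Zoo feature; pack_cwd and auto_packed.zip are made-up names):

    import os
    import zipfile

    def pack_cwd(archive_path="auto_packed.zip"):
        # Collect top-level .py files plus every directory that is a
        # Python package (i.e. contains __init__.py) and zip them so
        # the archive is importable when passed to --py-files.
        with zipfile.ZipFile(archive_path, "w") as zf:
            for entry in os.listdir("."):
                if entry.endswith(".py"):
                    zf.write(entry)
                elif (os.path.isdir(entry)
                        and os.path.exists(os.path.join(entry, "__init__.py"))):
                    for root, _, files in os.walk(entry):
                        for name in files:
                            if name.endswith(".py"):
                                zf.write(os.path.join(root, name))
        return archive_path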
with modulepickle?
Wouldn't it be a little dangerous to override all the cloudpickle usage in both Spark and Ray?
The worker cannot find my project modules when Ray starts serialization.