Open wangzeyu135798 opened 2 months ago
Could you provide the error from the other party?
The other party reported no error. Its logs are as follows:
[root@kuscia-autonomy-alice-56db7f7ffc-9gzsl logs]# kubectl get kt job-split-1 -oyaml -n cross-domain
apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaTask
metadata:
  annotations:
    kuscia.secretflow/initiator: alice
    kuscia.secretflow/interconn-bfia-parties: ""
    kuscia.secretflow/interconn-kuscia-parties: bob
    kuscia.secretflow/interconn-self-parties: alice
    kuscia.secretflow/job-id: job-best-effort-linear
    kuscia.secretflow/self-cluster-as-initiator: "true"
    kuscia.secretflow/task-alias: job-split-1
  creationTimestamp: "2024-07-10T07:26:37Z"
  generation: 1
  labels:
    kuscia.secretflow/controller: kuscia-job
    kuscia.secretflow/job-uid: bdf0116a-e7f5-4673-981a-8281857c059a
  name: job-split-1
  namespace: cross-domain
  ownerReferences:
WARNING:root:Since the GPL-licensed package unidecode
is not installed, using Python's unicodedata
package which yields worse results.
2024-07-10 15:26:43,016|bob|INFO|secretflow|entry.py:start_ray:59| ray_conf: RayConfig(ray_node_ip_address='job-split-1-partner-0-global.bob.svc', ray_node_manager_port=21797, ray_object_manager_port=21792, ray_client_server_port=21793, ray_worker_ports=[], ray_gcs_port=21796)
2024-07-10 15:26:43,016|bob|INFO|secretflow|entry.py:start_ray:63| Trying to start ray head node at job-split-1-partner-0-global.bob.svc, start command: RAY_BACKEND_LOG_LEVEL=debug RAY_grpc_enable_http_proxy=true OMP_NUM_THREADS=8 ray start --head --include-dashboard=false --disable-usage-stats --num-cpus=32 --node-ip-address=job-split-1-partner-0-global.bob.svc --port=21796 --node-manager-port=21797 --object-manager-port=21792 --ray-client-server-port=21793
2024-07-10 15:26:46,577|bob|INFO|secretflow|entry.py:start_ray:80| 2024-07-10 15:26:43,625 INFO usage_lib.py:423 -- Usage stats collection is disabled.
2024-07-10 15:26:43,626 INFO scripts.py:744 -- Local node IP: job-split-1-partner-0-global.bob.svc
2024-07-10 15:26:46,415 SUCC scripts.py:781 -- --------------------
2024-07-10 15:26:46,415 SUCC scripts.py:782 -- Ray runtime started.
2024-07-10 15:26:46,415 SUCC scripts.py:783 -- --------------------
2024-07-10 15:26:46,415 INFO scripts.py:785 -- Next steps
2024-07-10 15:26:46,415 INFO scripts.py:788 -- To add another node to this Ray cluster, run
2024-07-10 15:26:46,415 INFO scripts.py:791 -- ray start --address='job-split-1-partner-0-global.bob.svc:21796'
2024-07-10 15:26:46,416 INFO scripts.py:800 -- To connect to this Ray cluster:
2024-07-10 15:26:46,416 INFO scripts.py:802 -- import ray
2024-07-10 15:26:46,416 INFO scripts.py:803 -- ray.init(_node_ip_address='job-split-1-partner-0-global.bob.svc')
2024-07-10 15:26:46,416 INFO scripts.py:834 -- To terminate the Ray runtime, run
2024-07-10 15:26:46,416 INFO scripts.py:835 -- ray stop
2024-07-10 15:26:46,416 INFO scripts.py:838 -- To view the status of the cluster, use
2024-07-10 15:26:46,416 INFO scripts.py:839 -- ray status
2024-07-10 15:26:46,577|bob|INFO|secretflow|entry.py:start_ray:81| Succeeded to start ray head node at job-split-1-partner-0-global.bob.svc. 2024-07-10 15:26:46,578|bob|INFO|secretflow|entry.py:main:510| datasource.access_directly True sf_node_eval_param { "domain": "data_prep", "name": "train_test_split", "version": "0.0.1", "attrPaths": [ "train_size", "test_size", "random_state", "shuffle" ], "attrs": [ { "f": 0.75 }, { "f": 0.25 }, { "i64": "1234" }, { "b": true } ] } 2024-07-10 15:26:46,585|bob|INFO|secretflow|entry.py:domaindata_id_to_dist_data:160| domaindata_id psi-output-1 to ........... name: "psi-output-1.csv" type: "sf.table.vertical_table" system_info { } meta { type_url: "type.googleapis.com/secretflow.spec.v1.VerticalTable" value: "\n\335\003\n\003id1\022\003age\022\teducation\022\007default\022\007balance\022\007housing\022\004loan\022\003day\022\010duration\022\010campaign\022\005pdays\022\010previous\022\017job_blue-collar\022\020job_entrepreneur\022\rjob_housemaid\022\016job_management\022\013job_retired\022\021job_self-employed\022\014job_services\022\013job_student\022\016job_technician\022\016job_unemployed\022\020marital_divorced\022\017marital_married\022\016marital_single\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\n\227\003\n\003id2\022\020contact_cellular\022\021contact_telephone\022\017contact_unknown\022\tmonth_apr\022\tmonth_aug\022\tmonth_dec\022\tmonth_feb\022\tmonth_jan\022\tmonth_jul\022\tmonth_jun\022\tmonth_mar\022\tmonth_may\022\tmonth_nov\022\tmonth_oct\022\tmonth_sep\022\020poutcome_failure\022\016poutcome_other\022\020poutcome_success\022\020poutcome_unknown\022\001y\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\003in
t\020\244M" } data_refs { uri: "psi-output-1.csv" party: "alice" format: "csv" } data_refs { uri: "psi-output-1.csv" party: "bob" format: "csv" }
param
domain: "data_prep" name: "train_test_split" version: "0.0.1" attr_paths: "train_size" attr_paths: "test_size" attr_paths: "random_state" attr_paths: "shuffle" attrs { f: 0.75 } attrs { f: 0.25 } attrs { i64: 1234 } attrs { b: true } inputs { name: "psi-output-1.csv" type: "sf.table.vertical_table" system_info { } meta { type_url: "type.googleapis.com/secretflow.spec.v1.VerticalTable" value: "\n\335\003\n\003id1\022\003age\022\teducation\022\007default\022\007balance\022\007housing\022\004loan\022\003day\022\010duration\022\010campaign\022\005pdays\022\010previous\022\017job_blue-collar\022\020job_entrepreneur\022\rjob_housemaid\022\016job_management\022\013job_retired\022\021job_self-employed\022\014job_services\022\013job_student\022\016job_technician\022\016job_unemployed\022\020marital_divorced\022\017marital_married\022\016marital_single\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\n\227\003\n\003id2\022\020contact_cellular\022\021contact_telephone\022\017contact_unknown\022\tmonth_apr\022\tmonth_aug\022\tmonth_dec\022\tmonth_feb\022\tmonth_jan\022\tmonth_jul\022\tmonth_jun\022\tmonth_mar\022\tmonth_may\022\tmonth_nov\022\tmonth_oct\022\tmonth_sep\022\020poutcome_failure\022\016poutcome_other\022\020poutcome_success\022\020poutcome_unknown\022\001y\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\003int\020\244M" } data_refs { uri: "psi-output-1.csv" party: "alice" format: "csv" } data_refs { uri: "psi-output-1.csv" party: "bob" format: "csv" } } output_uris: "train-dataset-1.csv" output_uris: "test-dataset-1.csv"
--
storage_config
type: "local_fs" local_fs { wd: "/home/kuscia/var/storage/data" }
--
cluster_config
desc { parties: "alice" parties: "bob" devices { name: "spu" type: "spu" parties: "alice" parties: "bob" config: "{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}" } devices { name: "heu" type: "heu" parties: "alice" parties: "bob" config: "{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}" } ray_fed_config { cross_silo_comm_backend: "brpc_link" } } public_config { ray_fed_config { parties: "alice" parties: "bob" addresses: "job-split-1-partner-0-fed.alice.svc:80" addresses: "0.0.0.0:21795" } spu_configs { name: "spu" parties: "alice" parties: "bob" addresses: "http://job-split-1-partner-0-spu.alice.svc:80" addresses: "0.0.0.0:21794" } } private_config { self_party: "bob" ray_head_addr: "job-split-1-partner-0-global.bob.svc:21796" }
--
2024-07-10 15:26:46,586|bob|WARNING|secretflow|driver.py:init:442| When connecting to an existing cluster, num_cpus must not be provided. Num_cpus is neglected at this moment.
2024-07-10 15:26:46,586 INFO worker.py:1540 -- Connecting to existing Ray cluster at address: job-split-1-partner-0-global.bob.svc:21796...
2024-07-10 15:26:46,593|bob|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140181644377488 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/node_ip_address.json.lock
2024-07-10 15:26:46,593|bob|DEBUG|secretflow|_api.py:acquire:297| Lock 140181644377488 acquired on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/node_ip_address.json.lock
2024-07-10 15:26:46,593|bob|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140181644377488 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/node_ip_address.json.lock
2024-07-10 15:26:46,593|bob|DEBUG|secretflow|_api.py:release:330| Lock 140181644377488 released on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/node_ip_address.json.lock
2024-07-10 15:26:46,595|bob|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140181644377536 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:acquire:297| Lock 140181644377536 acquired on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140181644377536 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:release:330| Lock 140181644377536 released on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140181644377440 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:acquire:297| Lock 140181644377440 acquired on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140181644377440 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:release:330| Lock 140181644377440 released on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140181644377632 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:acquire:297| Lock 140181644377632 acquired on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140181644377632 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,596|bob|DEBUG|secretflow|_api.py:release:330| Lock 140181644377632 released on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,597|bob|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140181644377488 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,597|bob|DEBUG|secretflow|_api.py:acquire:297| Lock 140181644377488 acquired on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,597|bob|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140181644377488 on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,597|bob|DEBUG|secretflow|_api.py:release:330| Lock 140181644377488 released on /tmp/ray/session_2024-07-10_15-26-43_626562_22147/ports_by_node.json.lock
2024-07-10 15:26:46,597 INFO worker.py:1724 -- Connected to Ray cluster.
2024-07-10 15:26:47.306 INFO api.py:233 [bob] -- [Anonymous_job] Started rayfed with {'CLUSTER_ADDRESSES': {'alice': 'http://job-split-1-partner-0-fed.alice.svc:80', 'bob': '0.0.0.0:21795'}, 'CURRENT_PARTY_NAME': 'bob', 'TLS_CONFIG': {}}
(raylet) [2024-07-10 15:26:47,275 I 22355 22355] logging.cc:230: Set ray log level from environment variable RAY_BACKEND_LOG_LEVEL to -1
(SenderReceiverProxyActor pid=22634) 2024-07-10 15:26:48.319 INFO link.py:38 [bob] -- [Anonymous_job] brpc options: {'proxy_max_restarts': 3, 'timeout_in_ms': 300000, 'recv_timeout_ms': 604800000, 'connect_retry_times': 3600, 'connect_retry_interval_ms': 1000, 'brpc_channel_protocol': 'http', 'brpc_channel_connection_type': 'pooled', 'exit_on_sending_failure': True}
(SenderReceiverProxyActor pid=22634) I0710 15:26:48.328158 22634 external/com_github_brpc_brpc/src/brpc/server.cpp:1181] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=21795.
(SenderReceiverProxyActor pid=22634) W0710 15:26:48.328190 22634 external/com_github_brpc_brpc/src/brpc/server.cpp:1187] Builtin services are disabled according to ServerOptions.has_builtin_services
(SenderReceiverProxyActor pid=22634) I0710 15:26:49.969624 22690 external/com_github_brpc_brpc/src/brpc/span.cpp:506] Opened ./rpc_data/rpcz/20240710.152649.22634/id.db and ./rpc_data/rpcz/20240710.152649.22634/time.db
2024-07-10 15:26:52.351 INFO barriers.py:465 [bob] -- [Anonymous_job] Succeeded to create receiver proxy actor.
2024-07-10 15:26:52.351 INFO barriers.py:520 [bob] -- [Anonymous_job] Try ping ['alice'] at 0 attemp, up to 3600 attemps.
(_run pid=22355) WARNING:root:Since the GPL-licensed package unidecode
is not installed, using Python's unicodedata
package which yields worse results.
(raylet) [2024-07-10 15:26:47,745 I 22634 22634] logging.cc:230: Set ray log level from environment variable RAY_BACKEND_LOG_LEVEL to -1
2024-07-10 15:26:54.246 ERROR component.py:1129 [bob] -- [Anonymous_job] eval on domain: "data_prep"
name: "train_test_split"
version: "0.0.1"
attr_paths: "train_size"
attr_paths: "test_size"
attr_paths: "random_state"
attr_paths: "shuffle"
attrs {
f: 0.75
}
attrs {
f: 0.25
}
attrs {
i64: 1234
}
attrs {
b: true
}
inputs {
name: "psi-output-1.csv"
type: "sf.table.vertical_table"
system_info {
}
meta {
type_url: "type.googleapis.com/secretflow.spec.v1.VerticalTable"
value: "\n\335\003\n\003id1\022\003age\022\teducation\022\007default\022\007balance\022\007housing\022\004loan\022\003day\022\010duration\022\010campaign\022\005pdays\022\010previous\022\017job_blue-collar\022\020job_entrepreneur\022\rjob_housemaid\022\016job_management\022\013job_retired\022\021job_self-employed\022\014job_services\022\013job_student\022\016job_technician\022\016job_unemployed\022\020marital_divorced\022\017marital_married\022\016marital_single\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\n\227\003\n\003id2\022\020contact_cellular\022\021contact_telephone\022\017contact_unknown\022\tmonth_apr\022\tmonth_aug\022\tmonth_dec\022\tmonth_feb\022\tmonth_jan\022\tmonth_jul\022\tmonth_jun\022\tmonth_mar\022\tmonth_may\022\tmonth_nov\022\tmonth_oct\022\tmonth_sep\022\020poutcome_failure\022\016poutcome_other\022\020poutcome_success\022\020poutcome_unknown\022\001y\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\003int\020\244M"
}
data_refs {
uri: "psi-output-1.csv"
party: "alice"
format: "csv"
}
data_refs {
uri: "psi-output-1.csv"
party: "bob"
format: "csv"
}
}
output_uris: "train-dataset-1.csv"
output_uris: "test-dataset-1.csv"
failed, error <ray::_run() (pid=22355, ip=job-split-1-partner-0-global.bob.svc)
File "/usr/local/lib/python3.10/site-packages/secretflow/device/device/pyu.py", line 156, in _run
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/secretflow/component/data_utils.py", line 382, in
WARNING:root:Since the GPL-licensed package unidecode
is not installed, using Python's unicodedata
package which yields worse results.
2024-07-10 15:26:43,898|alice|INFO|secretflow|entry.py:start_ray:59| ray_conf: RayConfig(ray_node_ip_address='job-split-1-partner-0-global.alice.svc', ray_node_manager_port=25098, ray_object_manager_port=25099, ray_client_server_port=25100, ray_worker_ports=[], ray_gcs_port=25097)
2024-07-10 15:26:43,898|alice|INFO|secretflow|entry.py:start_ray:63| Trying to start ray head node at job-split-1-partner-0-global.alice.svc, start command: RAY_BACKEND_LOG_LEVEL=debug RAY_grpc_enable_http_proxy=true OMP_NUM_THREADS=4 ray start --head --include-dashboard=false --disable-usage-stats --num-cpus=32 --node-ip-address=job-split-1-partner-0-global.alice.svc --port=25097 --node-manager-port=25098 --object-manager-port=25099 --ray-client-server-port=25100
2024-07-10 15:26:47,658|alice|INFO|secretflow|entry.py:start_ray:80| 2024-07-10 15:26:44,592 INFO usage_lib.py:423 -- Usage stats collection is disabled.
2024-07-10 15:26:44,592 INFO scripts.py:744 -- Local node IP: job-split-1-partner-0-global.alice.svc
2024-07-10 15:26:47,518 SUCC scripts.py:781 -- --------------------
2024-07-10 15:26:47,518 SUCC scripts.py:782 -- Ray runtime started.
2024-07-10 15:26:47,518 SUCC scripts.py:783 -- --------------------
2024-07-10 15:26:47,518 INFO scripts.py:785 -- Next steps
2024-07-10 15:26:47,518 INFO scripts.py:788 -- To add another node to this Ray cluster, run
2024-07-10 15:26:47,518 INFO scripts.py:791 -- ray start --address='job-split-1-partner-0-global.alice.svc:25097'
2024-07-10 15:26:47,518 INFO scripts.py:800 -- To connect to this Ray cluster:
2024-07-10 15:26:47,518 INFO scripts.py:802 -- import ray
2024-07-10 15:26:47,519 INFO scripts.py:803 -- ray.init(_node_ip_address='job-split-1-partner-0-global.alice.svc')
2024-07-10 15:26:47,519 INFO scripts.py:834 -- To terminate the Ray runtime, run
2024-07-10 15:26:47,519 INFO scripts.py:835 -- ray stop
2024-07-10 15:26:47,519 INFO scripts.py:838 -- To view the status of the cluster, use
2024-07-10 15:26:47,519 INFO scripts.py:839 -- ray status
2024-07-10 15:26:47,658|alice|INFO|secretflow|entry.py:start_ray:81| Succeeded to start ray head node at job-split-1-partner-0-global.alice.svc. 2024-07-10 15:26:47,659|alice|INFO|secretflow|entry.py:main:510| datasource.access_directly True sf_node_eval_param { "domain": "data_prep", "name": "train_test_split", "version": "0.0.1", "attrPaths": [ "train_size", "test_size", "random_state", "shuffle" ], "attrs": [ { "f": 0.75 }, { "f": 0.25 }, { "i64": "1234" }, { "b": true } ] } 2024-07-10 15:26:47,667|alice|INFO|secretflow|entry.py:domaindata_id_to_dist_data:160| domaindata_id psi-output-1 to ........... name: "psi-output-1.csv" type: "sf.table.vertical_table" system_info { } meta { type_url: "type.googleapis.com/secretflow.spec.v1.VerticalTable" value: "\n\335\003\n\003id1\022\003age\022\teducation\022\007default\022\007balance\022\007housing\022\004loan\022\003day\022\010duration\022\010campaign\022\005pdays\022\010previous\022\017job_blue-collar\022\020job_entrepreneur\022\rjob_housemaid\022\016job_management\022\013job_retired\022\021job_self-employed\022\014job_services\022\013job_student\022\016job_technician\022\016job_unemployed\022\020marital_divorced\022\017marital_married\022\016marital_single\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\n\227\003\n\003id2\022\020contact_cellular\022\021contact_telephone\022\017contact_unknown\022\tmonth_apr\022\tmonth_aug\022\tmonth_dec\022\tmonth_feb\022\tmonth_jan\022\tmonth_jul\022\tmonth_jun\022\tmonth_mar\022\tmonth_may\022\tmonth_nov\022\tmonth_oct\022\tmonth_sep\022\020poutcome_failure\022\016poutcome_other\022\020poutcome_success\022\020poutcome_unknown\022\001y\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005flo
at\003int\020\244M" } data_refs { uri: "psi-output-1.csv" party: "alice" format: "csv" } data_refs { uri: "psi-output-1.csv" party: "bob" format: "csv" }
param
domain: "data_prep" name: "train_test_split" version: "0.0.1" attr_paths: "train_size" attr_paths: "test_size" attr_paths: "random_state" attr_paths: "shuffle" attrs { f: 0.75 } attrs { f: 0.25 } attrs { i64: 1234 } attrs { b: true } inputs { name: "psi-output-1.csv" type: "sf.table.vertical_table" system_info { } meta { type_url: "type.googleapis.com/secretflow.spec.v1.VerticalTable" value: "\n\335\003\n\003id1\022\003age\022\teducation\022\007default\022\007balance\022\007housing\022\004loan\022\003day\022\010duration\022\010campaign\022\005pdays\022\010previous\022\017job_blue-collar\022\020job_entrepreneur\022\rjob_housemaid\022\016job_management\022\013job_retired\022\021job_self-employed\022\014job_services\022\013job_student\022\016job_technician\022\016job_unemployed\022\020marital_divorced\022\017marital_married\022\016marital_single\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\n\227\003\n\003id2\022\020contact_cellular\022\021contact_telephone\022\017contact_unknown\022\tmonth_apr\022\tmonth_aug\022\tmonth_dec\022\tmonth_feb\022\tmonth_jan\022\tmonth_jul\022\tmonth_jun\022\tmonth_mar\022\tmonth_may\022\tmonth_nov\022\tmonth_oct\022\tmonth_sep\022\020poutcome_failure\022\016poutcome_other\022\020poutcome_success\022\020poutcome_unknown\022\001y\"\003str\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\005float\003int\020\244M" } data_refs { uri: "psi-output-1.csv" party: "alice" format: "csv" } data_refs { uri: "psi-output-1.csv" party: "bob" format: "csv" } } output_uris: "train-dataset-1.csv" output_uris: "test-dataset-1.csv"
--
storage_config
type: "local_fs" local_fs { wd: "/home/kuscia/var/storage/data" }
--
cluster_config
desc { parties: "alice" parties: "bob" devices { name: "spu" type: "spu" parties: "alice" parties: "bob" config: "{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}" } devices { name: "heu" type: "heu" parties: "alice" parties: "bob" config: "{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}" } ray_fed_config { cross_silo_comm_backend: "brpc_link" } } public_config { ray_fed_config { parties: "alice" parties: "bob" addresses: "0.0.0.0:25102" addresses: "job-split-1-partner-0-fed.bob.svc:80" } spu_configs { name: "spu" parties: "alice" parties: "bob" addresses: "0.0.0.0:25101" addresses: "http://job-split-1-partner-0-spu.bob.svc:80" } } private_config { self_party: "alice" ray_head_addr: "job-split-1-partner-0-global.alice.svc:25097" }
--
2024-07-10 15:26:47,668|alice|WARNING|secretflow|driver.py:init:442| When connecting to an existing cluster, num_cpus must not be provided. Num_cpus is neglected at this moment.
2024-07-10 15:26:47,669 INFO worker.py:1540 -- Connecting to existing Ray cluster at address: job-split-1-partner-0-global.alice.svc:25097...
2024-07-10 15:26:47,675|alice|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140343807753376 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/node_ip_address.json.lock
2024-07-10 15:26:47,676|alice|DEBUG|secretflow|_api.py:acquire:297| Lock 140343807753376 acquired on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/node_ip_address.json.lock
2024-07-10 15:26:47,676|alice|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140343807753376 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/node_ip_address.json.lock
2024-07-10 15:26:47,676|alice|DEBUG|secretflow|_api.py:release:330| Lock 140343807753376 released on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/node_ip_address.json.lock
2024-07-10 15:26:47,679|alice|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140343807753424 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,679|alice|DEBUG|secretflow|_api.py:acquire:297| Lock 140343807753424 acquired on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,679|alice|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140343807753424 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,680|alice|DEBUG|secretflow|_api.py:release:330| Lock 140343807753424 released on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,680|alice|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140343807753328 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,680|alice|DEBUG|secretflow|_api.py:acquire:297| Lock 140343807753328 acquired on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,681|alice|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140343807753328 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,681|alice|DEBUG|secretflow|_api.py:release:330| Lock 140343807753328 released on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,681|alice|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140343807753520 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,681|alice|DEBUG|secretflow|_api.py:acquire:297| Lock 140343807753520 acquired on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,681|alice|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140343807753520 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,681|alice|DEBUG|secretflow|_api.py:release:330| Lock 140343807753520 released on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,681|alice|DEBUG|secretflow|_api.py:acquire:294| Attempting to acquire lock 140343807753376 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,682|alice|DEBUG|secretflow|_api.py:acquire:297| Lock 140343807753376 acquired on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,682|alice|DEBUG|secretflow|_api.py:release:327| Attempting to release lock 140343807753376 on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,682|alice|DEBUG|secretflow|_api.py:release:330| Lock 140343807753376 released on /tmp/ray/session_2024-07-10_15-26-44_593387_9540/ports_by_node.json.lock
2024-07-10 15:26:47,682 INFO worker.py:1724 -- Connected to Ray cluster.
2024-07-10 15:26:48.575 INFO api.py:233 [alice] -- [Anonymous_job] Started rayfed with {'CLUSTER_ADDRESSES': {'alice': '0.0.0.0:25102', 'bob': 'http://job-split-1-partner-0-fed.bob.svc:80'}, 'CURRENT_PARTY_NAME': 'alice', 'TLS_CONFIG': {}}
(raylet) [2024-07-10 15:26:49,120 I 10119 10119] logging.cc:230: Set ray log level from environment variable RAY_BACKEND_LOG_LEVEL to -1
(SenderReceiverProxyActor pid=10119) 2024-07-10 15:26:49.883 INFO link.py:38 [alice] -- [Anonymous_job] brpc options: {'proxy_max_restarts': 3, 'timeout_in_ms': 300000, 'recv_timeout_ms': 604800000, 'connect_retry_times': 3600, 'connect_retry_interval_ms': 1000, 'brpc_channel_protocol': 'http', 'brpc_channel_connection_type': 'pooled', 'exit_on_sending_failure': True}
(SenderReceiverProxyActor pid=10119) I0710 15:26:49.891576 10119 external/com_github_brpc_brpc/src/brpc/server.cpp:1181] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=25102.
(SenderReceiverProxyActor pid=10119) W0710 15:26:49.891611 10119 external/com_github_brpc_brpc/src/brpc/server.cpp:1187] Builtin services are disabled according to ServerOptions.has_builtin_services
2024-07-10 15:26:52.346 INFO barriers.py:465 [alice] -- [Anonymous_job] Succeeded to create receiver proxy actor.
2024-07-10 15:26:52.347 INFO barriers.py:520 [alice] -- [Anonymous_job] Try ping ['bob'] at 0 attemp, up to 3600 attemps.
(SenderReceiverProxyActor pid=10119) I0710 15:26:52.410292 10214 external/com_github_brpc_brpc/src/brpc/span.cpp:506] Opened ./rpc_data/rpcz/20240710.152652.10119/id.db and ./rpc_data/rpcz/20240710.152652.10119/time.db
(_run pid=9847) WARNING:root:Since the GPL-licensed package unidecode
is not installed, using Python's unicodedata
package which yields worse results.
Go into bob's container and check whether the file psi-output-1.csv was actually generated.
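One way to run that check from inside bob's container is the small Python sketch below. It assumes the input file should sit directly under the `wd` shown in `storage_config` (`/home/kuscia/var/storage/data`), joined with the `uri` from `data_refs`. That path layout is an assumption rather than something the logs confirm, and `describe_input` is a hypothetical helper name, not part of Kuscia or SecretFlow.

```python
from pathlib import Path


def describe_input(wd: str, uri: str) -> dict:
    """Report whether `uri` exists as a regular file under the directory `wd`.

    Hypothetical helper: it only inspects the local filesystem and does not
    call any Kuscia or SecretFlow API.
    """
    path = Path(wd) / uri
    exists = path.is_file()
    return {
        "path": str(path),
        "exists": exists,
        # Size in bytes helps spot a file that was created but left empty.
        "size_bytes": path.stat().st_size if exists else None,
    }


if __name__ == "__main__":
    # Values copied from storage_config and data_refs in the logs above;
    # joining them this way is an assumption about the storage layout.
    print(describe_input("/home/kuscia/var/storage/data", "psi-output-1.csv"))
```

If `exists` comes back `False` on bob but `True` on alice, the upstream PSI task most likely did not write its output on bob's side. Either way, the result would be worth posting here.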
Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.
Issue Type
Running
Search for existing issues similar to yours
Yes
OS Platform and Distribution
centos7
Kuscia Version
kuscia0.8
Deployment
k8s
deployment Version
1.19
App Running type
secretflow
App Running version
1.7
Configuration file used to run kuscia.
What happened and what you expected to happen.
Kuscia log output.