secretflow / kuscia

Kuscia(Kubernetes-based Secure Collaborative InfrA) is a K8s-based privacy-preserving computing task orchestration framework.
https://www.secretflow.org.cn/docs/kuscia/latest/zh-Hans
Apache License 2.0

PSI result is incorrect #460

Open ruhengChen opened 4 days ago

ruhengChen commented 4 days ago

Issue Type

Install/Deploy

Search for existing issues similar to yours

Yes

OS Platform and Distribution

centos

Kuscia Version

0.12.0b0

Deployment

docker

deployment Version

24.0.4

App Running type

secretflow

App Running version

1.10.0b0

Configuration file used to run kuscia.

mode: autonomy
domainID: alice
domainKeyData: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBMkoyWWllbnQvKzZ0eG5uQThhMEJ1TE5DYjNCYU55SHV6OTdSQ0NHNGNJajllMFFICmc2Q2ZRWEF3eEkrWHMwa3RrZ3FwSWNtZ1hRRXVmUCs3RFN5MWNrMTVJSUNpUVA2M1o1N1p2bDErSk4wWkt2Y00KZmJYR2x6cDA3YjdVWWZaaUZXQTNYS0UxdVJpNm5QRFNIcUNzS2F3bjNsU003TTJJaUZqaWNqR1dZenpzbXRSKwp1TnlobnJsUGQraHJlSjdsZ3JUa0t1djVJRnhlbjk5ZUZjV3NXKytldEhVZkZMSnU0Wmt3WlcxNVdmdTc1Q1V5CjlYQ1B4bW1yb1lWU1p3Uk9NbVk4UFMwNk5GK0Y4TEIzeE4yMHlSSEQycGRlbkNhSlkzNmZ2RWh5ZkdsL2RyYWwKKytmUGgwS0tpWjRXYzdIZFdPTncyaWM1ZGt3WmZjeUg3ekd2cHdJREFRQUJBb0lCQVFDWlNsMXFLNXQ3WkVLaApsQVBRZ3lnV3R2U3F2QTE3dW8ydm1QVEFGbXpaWm5oTFJRYzBSUFN1Y0dCeW1WTTNVdS9teTVpSkNwQnJnS3lQClZNSGdQdVZnc0NhcHM1NGU5S0tCdDg1TGd3b0R1RnRaSmw4Mmp1NnNrbmV3enp5bzNwem1sNkpWOE5kOTExd28KTjl2YVJNWFE3NzErL3NLRHlhZDNKYitLSkVGU2s3dGVrYUZtd0plYXY1L1A0azJvMDFjRC9uT2tPd2NZanJGaQppMndxUlloTEZPV3ZpdXhvazc0TllDbytZT1kzVnRtUVFzK0lIOXNDdEtBS3dOdVVPQVREbm5SZUcxODBxNE1IClJNSlhhVkI5U2NFZnJOUUlqWkxrbU9KblVyaDBKdW42RTdaVWZlVEJPbWtyUUZzUnpZOFVyendOVUtadm1HUXcKM043elEycHhBb0dCQU85RXJ3THBxc0t4Wk4wQ2FsSTk4MlRGRDF3T0tWMTdOeWtZbVVYbG40VHNNL01DbzNxZAo2OEtlRG5NSk1KUlZJbFNPMTZtdUpZamh4L00yQ0xXVDgycW9lMVIzdURuSngydForVm1NcFUvaVV0WnJGQ0Z3CkdyZ002Y01NN0FiYVA5cXZyaS9OM0w1VGMzdHN5eHF4T3RSMGNiY1hLNmc5bGw3SVFWbzJaNkR0QW9HQkFPZkQKWTlNNFA3REk0WWdXcGxSdHo3bWZBTmZjQjlkK1l0Vkcvd0Y5K3hBYkxGNzBnYWp5NStOUUQwNkV2MUNGdTJkYwpOakE2R3A5ZTNaQUVsM1dZSzlwNkpXcDZmTjJlZEd6WFl4dWlwL1pLaFh4TndPTG5XU0hGeS8wbTA1NFVnL3NmCjZtQk1Tbk5MakgrV1Mvb2FVTW5NalhRTnlLL0VYbWdHbWpvTmJjUmpBb0dBYzB4NHZZR1diUDJJNmt1SXFwcjUKMTl4eUcxMGpwODJCZGtkSlRQcHJGV1d4WHZBdGtSL2FoVTBmRDJZbFI4V0NwcGF5N0N4a0lBTVZGR2s4Wnl1bAppQWNxYnpqRnlPc29NdDRIbjVSNzhQUFNFVXRHUnhxN2RXZWJtZ2QzRVpKSVpQeDFoc25BRHVNdDZoTXlDR25SClBLSGtUbU5YQVZxMFJIWmhyN1E1MmIwQ2dZQW9WOHE0cWNIc2RBdE9OU01lN3o2SmdUUVVYeWkrU2pIS0RtRzEKVU1pek5RZTBQN3VTUFRaQVMxOWV1NXpSMTNXWHVHVjJYNmJpdHhDNlVSSk1WZ0RNTnhic1FnWWFUY3JXWjJmSgpGN2RGR0JQRVg0U0QrdUY1RDRWQ3U4OTZaUGZVYnVuTmVYbzhONXB0V3l3K2pQWWpLb2cwKzNGRHAxc0hvSjZZCkhGNVBJUUtCZ1FDZ0NkbDhVN0ZpL04xVEVxQ
URtc2tKSFZkbEhaSWNTZ1ExNDFQdXB3SEQ0bWZDMlJ3TnJUY2IKOFNic3ExUy9BSkp1SkFaUWluWFhJOWhZWWtyOGZpSVUrQWt1ZVNILytYUTFjUmRzYkVGOVdrTnhLSTlLVmFYWgp4VWovc3JNd1U2RkVTNEJVN2dMRGxxODBIcFlxbGM2NDBtZHBZYm9ESS9wZVgybDhSeTFaYXc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
logLevel: INFO
runtime: runc
runk:
  namespace: ""
  dnsServers: []
  kubeconfigFile: ""
capacity:
  cpu: ""
  memory: ""
  pods: ""
  storage: ""
  ephemeralStorage: ""
reservedResources:
  cpu: ""
  memory: ""
image:
  pullPolicy: ""
  defaultRegistry: ""
  registries: []
datastoreEndpoint: ""

dataMesh:
  dataProxyList:
    - endpoint: "dataproxy-grpc:8023"
      dataSourceTypes:
        - "odps"

What happened and what you expected to happen.

The generated result file is garbled:

bash-5.2# cat var/storage/data/psi-output2.csv
ORC
P

@`P

P@F&�

GMT
"
P

@`P

x
0    ?("
           col2id1 ( ( (00P:
                         @`P:
                          P@�NHb1.9.m��"
                                        ($0��ORCbash-5.2#

The task details are as follows:
```yaml
bash-5.2# cat psi.yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaJob
metadata:
  name: job-best-psi-4
  namespace: cross-domain
spec:
  initiator: alice
  scheduleMode: BestEffort
  maxParallelism: 2
  tasks:
    - taskID: job-psi-4
      alias: job-psi-4
      priority: 100
      appImage: secretflow-image
      parties:
      - domainID: alice
      - domainID: bob
      taskInputConfig: '{
  "sf_datasource_config": {
    "alice": {
      "id": "default-data-source"
    },
    "bob": {
      "id": "default-data-source"
    }
  },
  "sf_cluster_desc": {
    "parties": [
      "alice",
      "bob"
    ],
    "devices": [
      {
        "name": "spu",
        "type": "spu",
        "parties": [
          "alice",
          "bob"
        ],
        "config": "{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"
      }
    ],
    "ray_fed_config": {
      "cross_silo_comm_backend": "brpc_link"
    }
  },
  "sf_node_eval_param": {
    "domain": "data_prep",
    "name": "psi",
    "version": "0.0.8",
    "attr_paths": ["input/input_table_1/key", "input/input_table_2/key", "protocol", "sort_result", "allow_empty_result", "allow_duplicate_keys", "allow_duplicate_keys/no/skip_duplicates_check", "allow_duplicate_keys/no/receiver_parties", "ecdh_curve"],
    "attrs": [{
            "is_na": false,
            "ss": ["id1"]
          }, {
            "is_na": false,
            "ss": ["id2"]
          }, {
            "is_na": false,
            "s": "PROTOCOL_RR22"
          }, {
            "b": true,
            "is_na": false
          }, {
            "is_na": true
          }, {
            "is_na": false,
            "s": "no"
          }, {
            "is_na": true
          }, {
            "is_na": false,
            "ss": ["alice"]
          }, {
            "is_na": false,
            "s": "CURVE_FOURQ"
          }]
  },
  "sf_input_ids": [
    "ncxlpgobbb", "yfonbphbea"
  ],
  "sf_output_ids": [
    "psi-output2"
  ],
  "sf_output_uris": [
    "psi-output2.csv"
  ]
}'
```

Kuscia log output.

```shell
None
```

ruhengChen commented 4 days ago

I upgraded the secretflow algorithm library, and the result looks like it is in ORC format. Does the task need additional configuration somewhere?
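
One quick way to check that guess without printing binary data to the terminal is to look at the first bytes of the file, since ORC files begin with the three-byte magic string `ORC` (which also matches the first line of the `cat` output above). A minimal Python sketch; the helper name `looks_like_orc` is made up for illustration and is not part of Kuscia or secretflow:

```python
import sys

ORC_MAGIC = b"ORC"  # ORC files start with these three bytes


def looks_like_orc(path):
    """Return True if the file at `path` starts with the ORC magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(ORC_MAGIC)) == ORC_MAGIC


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_orc.py var/storage/data/psi-output2.csv
    for name in sys.argv[1:]:
        kind = "ORC" if looks_like_orc(name) else "not ORC (possibly CSV or other text)"
        print(f"{name}: {kind}")
```

This only inspects the header; it does not validate the rest of the file or decode any rows.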

zimu-yuxi commented 4 days ago

How did you upgrade the algorithm library? Which version are you using?

ruhengChen commented 4 days ago

I redeployed the latest Kuscia, 0.12.0b0, and used secretflow 1.10.0b0.

zimu-yuxi commented 4 days ago

Could you share the details of the default-data-source data source?

ruhengChen commented 4 days ago
bash-5.2# kubectl get domaindatasource -A
NAMESPACE   NAME                     AGE
alice       default-data-source      5h58m
alice       default-dp-data-source   5h58m
bash-5.2# kubectl get domaindatasource -n alice -o yaml
apiVersion: v1
items:
- apiVersion: kuscia.secretflow/v1alpha1
  kind: DomainDataSource
  metadata:
    creationTimestamp: "2024-11-25T02:37:16Z"
    generation: 1
    labels:
      kuscia.secretflow/domaindatasource-type: localfs
    name: default-data-source
    namespace: alice
    resourceVersion: "318"
    uid: 70e2ed50-cdd4-49d6-a02f-a780777c92ea
  spec:
    accessDirectly: true
    data:
      encryptedInfo: wdg/8zlhgL6EmEZkXhGiqw80VxW7Larxr5EcwlkxzQiS52M+HbwhSLJiOOtvfAvPCp2QjrVp3q1Y/pKgoe4ujA2W8SvpvSHBSm/PV2bBGGOrFFz5GCQXIffN9OP6ysTNFtt/RA9lo4K6C+1/bz6pIf0LreFLcLaOnIBDLBihndbTVhnKtTct1q0I9I6uFpy19QJ4BpZNb59zZDxydkRzX5TBdRKOZiM98UertpF3fYvai9sNSsis4OX3vd+q6QrEtj2J66wVLIoB9RhzTb4beAqcs7R9M2mb16VR6Rtr9dswVr/mwmFkiZj/eGtXz1s9+mgsNOBDSCICNac/9H0+Yg==
    name: default-data-source
    type: localfs
    uri: /home/kuscia/var/storage/data
- apiVersion: kuscia.secretflow/v1alpha1
  kind: DomainDataSource
  metadata:
    creationTimestamp: "2024-11-25T02:37:16Z"
    generation: 1
    labels:
      kuscia.secretflow/domaindatasource-type: localfs
    name: default-dp-data-source
    namespace: alice
    resourceVersion: "319"
    uid: 04abf892-db1c-443d-95a7-bc2f2713d760
  spec:
    data:
      encryptedInfo: gJcy1oo0XlSMkeW9ykZzzEVktiFywAmAIIETgyQ3uBrmH35VlgBaJ8e2NHx5WVJ4z1ihb3NXEXTatI4+/hELAaqMzvthhN7EQ/JspFa7twdevSYQBJjB7do4QQkFt5kSWk6aFqmtHuCf5gDInCy48RrtfDVk0jMXVyIWvUWyWBu8/2dw1EmllJ4XmB60e7s+iw2m5H7wbGt+pYAZIqXQ+6rrR4DqetIkDsRgEgCOFlnPUn/keVg35MwrXIOFRlR13cgDMKk01+3Z7D8dvbXukLlrGJ7YxFBGAwRyCcOunNZwAm5M6CG1U57ac8q0rh3OZPo8YQ7jTN37aEo8a4krhQ==
    name: default-dp-data-source
    type: localfs
    uri: /home/kuscia/var/storage/data
kind: List
metadata:
  resourceVersion: ""

This was generated by default when Kuscia was deployed; I have not modified it.

ruhengChen commented 4 days ago
bash-5.2# kubectl get domaindata zlhtdqvmnuolppwp-xihrjmenfioinlvp-node-3-output-0 -n alice -o yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: DomainData
metadata:
  annotations:
    kuscia.secretflow/initiator: zlhtdqvmnuolppwp-xihrjmenfioinlvp-node-3-output-0
  creationTimestamp: "2024-11-25T08:38:04Z"
  generation: 1
  labels:
    kuscia.secretflow/domaindata-type: table
    kuscia.secretflow/domaindata-vendor: secretflow
    kuscia.secretflow/interconn-protocol-type: kuscia
  name: zlhtdqvmnuolppwp-xihrjmenfioinlvp-node-3-output-0
  namespace: alice
  resourceVersion: "35945"
  uid: 4dc3d62a-f1c6-4f75-a1f5-f76ce27cb549
spec:
  attributes:
    dist_data: |-
      {
      "name": "zlhtdqvmnuolppwp-xihrjmenfioinlvp-node-3-output-0",
      "type": "sf.table.vertical_table",
      "meta": {
      "@type": "type.googleapis.com/secretflow.spec.v1.VerticalTable",
      "schemas": [
      {
      "ids": [
      "id1"
      ],
      "features": [
      "col2"
      ],
      "idTypes": [
      "int"
      ],
      "featureTypes": [
      "int"
      ],
      "labels": [],
      "labelTypes": []
      },
      {
      "ids": [
      "id2"
      ],
      "idTypes": [
      "int"
      ],
      "features": [],
      "labels": [],
      "featureTypes": [],
      "labelTypes": []
      }
      ],
      "lineCount": "3"
      },
      "dataRefs": [
      {
      "uri": "zlhtdqvmnuolppwp-xihrjmenfioinlvp-node-3-output-0",
      "party": "alice",
      "format": "orc",
      "nullStrs": []
      },
      {
      "uri": "zlhtdqvmnuolppwp-xihrjmenfioinlvp-node-3-output-0",
      "party": "bob",
      "format": "orc",
      "nullStrs": []
      }
      ]
      }
  author: alice
  columns:
  - comment: id
    name: id1
    type: int
  - comment: feature
    name: col2
    type: int
  dataSource: default-data-source
  fileFormat: unknown
  name: zlhtdqvmnuolppwp-xihrjmenfioinlvp-node-3-output-0
  relativeURI: zlhtdqvmnuolppwp-xihrjmenfioinlvp-node-3-output-0
  type: table
  vendor: secretflow

When I run the task on my side, the generated domaindata is indeed in ORC format. Can I adjust the job YAML so that it produces CSV instead?
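
The dump above declares a format in two places: `spec.fileFormat` (here `unknown`) and the `format` of each entry in `dataRefs` inside `dist_data` (here `orc`). A small Python sketch that pulls both out side by side, assuming the object was exported with `kubectl get domaindata ... -o json` and loaded into a dict; the function name `declared_formats` is hypothetical:

```python
import json


def declared_formats(domaindata):
    """Collect the format declared at each level of a DomainData object."""
    spec = domaindata["spec"]
    # dist_data is stored as a JSON string inside spec.attributes
    dist_data = json.loads(spec.get("attributes", {}).get("dist_data", "{}"))
    ref_formats = {
        ref["party"]: ref.get("format") for ref in dist_data.get("dataRefs", [])
    }
    return {"fileFormat": spec.get("fileFormat"), "dataRefs": ref_formats}


# Trimmed copy of the values shown in this thread
example = {
    "spec": {
        "fileFormat": "unknown",
        "attributes": {
            "dist_data": json.dumps(
                {
                    "dataRefs": [
                        {"party": "alice", "format": "orc"},
                        {"party": "bob", "format": "orc"},
                    ]
                }
            )
        },
    }
}
print(declared_formats(example))
```

Keeping the two values next to each other makes it easier to see when the metadata and the data written to disk disagree.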

zimu-yuxi commented 4 days ago

Please check what format your input domaindata is in.

ruhengChen commented 4 days ago
bash-5.2# kubectl get domaindata ncxlpgobbb -n alice -o yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: DomainData
metadata:
  annotations:
    kuscia.secretflow/initiator: alice
  creationTimestamp: "2024-11-25T05:57:49Z"
  generation: 1
  labels:
    kuscia.secretflow/domaindata-type: table
    kuscia.secretflow/domaindata-vendor: manual
    kuscia.secretflow/interconn-protocol-type: kuscia
  name: ncxlpgobbb
  namespace: alice
  resourceVersion: "18911"
  uid: 6bd0c649-1f3e-44d3-bd24-7ef37398605d
spec:
  attributes:
    description: alice2aa
  author: alice
  columns:
  - comment: ""
    name: id1
    type: int
  - comment: ""
    name: col2
    type: int
  dataSource: default-data-source
  fileFormat: unknown
  name: alice2
  relativeURI: alice2_1264607611_1581171762.csv
  type: table
  vendor: manual

The input here is unknown.

zimu-yuxi commented 3 days ago

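> unknown

Something went wrong when the domaindata was created. Change unknown here to csv; you can use kubectl edit domaindata ncxlpgobbb -n alice. Make the same change on the other party, then try running the job again.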

ruhengChen commented 3 days ago

That doesn't really work. After I set it to csv, it automatically reverts to unknown after a while.

zimu-yuxi commented 3 days ago

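> That doesn't really work. After I set it to csv, it automatically reverts to unknown after a while.

I suggest deleting the existing domaindata and recreating it; you can refer to the guide linked here. I just verified it and it works without problems.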

zimu-yuxi commented 3 days ago

@ruhengChen Could you also provide the task logs from both parties?

ruhengChen commented 3 days ago

kuscia.log kuscia-bob.log

zimu-yuxi commented 3 days ago

I mean the task execution logs. Look under home/kuscia/var/stdout/pods/ for the log that corresponds to the task ID.

ruhengChen commented 3 days ago

0.log

zimu-yuxi commented 2 days ago

I looked at the logs; ORC as the output format is also fine. Could you check the locale?

ruhengChen commented 2 days ago

Is this what you mean? On the host it is UTF-8:

(base) [root@raycluster1 uiem]# locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=

Inside the Docker container it is POSIX:

bash-5.2# locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
ruhengChen commented 2 days ago

I ran a lot of tests on my side. With the all_in_one deployment, following the steps below, the generated result is fine.

Create the domaindata and grant access

apiVersion: kuscia.secretflow/v1alpha1
kind: DomainData
metadata:
  annotations:
    kuscia.secretflow/initiator: alice
  creationTimestamp: "2024-07-25T06:58:43Z"
  generation: 1
  labels:
    kuscia.secretflow/domaindata-type: table
    kuscia.secretflow/domaindata-vendor: manual
    kuscia.secretflow/interconn-protocol-type: kuscia
  name: alice-table
  namespace: alice
  resourceVersion: "1141502"
  uid: 5bf42f03-0fdc-4f88-9390-e2474b97c23e
spec:
  attributes:
    description: alice
  author: alice
  columns:
  - comment: ""
    name: id1
    type: int
  dataSource: default-data-source
  fileFormat: csv
  name: alice-table
  relativeURI: alice.csv
  type: table
  vendor: manual
curl -X POST 'https://127.0.0.1:8082/api/v1/domaindatagrant/create' --header "Token: $(cat /home/kuscia/var/certs/token)" --header 'Content-Type: application/json' -d '{
 "grant_domain": "bob",
 "description": {"domaindatagrant":"alice-bob"},
 "domain_id": "alice",
 "domaindata_id": "alice-table"
}' --cacert /home/kuscia/var/certs/ca.crt --cert /home/kuscia/var/certs/ca.crt --key /home/kuscia/var/certs/ca.key
apiVersion: kuscia.secretflow/v1alpha1
kind: DomainData
metadata:
  annotations:
    kuscia.secretflow/initiator: bob
  creationTimestamp: "2024-09-20T06:20:24Z"
  generation: 1
  labels:
    kuscia.secretflow/domaindata-type: table
    kuscia.secretflow/domaindata-vendor: manual
    kuscia.secretflow/interconn-protocol-type: kuscia
  name: bob-table
  namespace: bob
  resourceVersion: "9340533"
  uid: 8bb0a0d8-007b-4026-aa86-a6adc8b4928f
spec:
  attributes:
    description: bob
  author: bob
  columns:
  - comment: ""
    name: "id2"
    type: int
  dataSource: default-data-source
  fileFormat: csv
  name: bob-table
  relativeURI: bob.csv
  type: table
  vendor: manual
curl -X POST 'https://127.0.0.1:8082/api/v1/domaindatagrant/create' --header "Token: $(cat /home/kuscia/var/certs/token)" --header 'Content-Type: application/json' -d '{
 "grant_domain": "alice",
 "description": {"domaindatagrant":"bob-alice"},
 "domain_id": "bob",
 "domaindata_id": "bob-table"
}' --cacert /home/kuscia/var/certs/ca.crt --cert /home/kuscia/var/certs/ca.crt --key /home/kuscia/var/certs/ca.key
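
The two grant requests above differ only in which party owns the table and which party receives access. As a small Python sketch of that symmetry, the body below is built from the same four fields used in the curl calls; the helper name `grant_request` is hypothetical, and nothing beyond the fields shown in this thread is assumed about the API:

```python
import json


def grant_request(owner, grantee, domaindata_id):
    """Build the JSON body sent to /api/v1/domaindatagrant/create in the curl calls above."""
    return {
        "grant_domain": grantee,
        "description": {"domaindatagrant": f"{owner}-{grantee}"},
        "domain_id": owner,
        "domaindata_id": domaindata_id,
    }


# One grant in each direction, matching the two curl commands
for owner, grantee, table in [
    ("alice", "bob", "alice-table"),
    ("bob", "alice", "bob-table"),
]:
    print(json.dumps(grant_request(owner, grantee, table), indent=1))
```

The sketch only prints the bodies; sending them still needs the token and certificates used in the original commands.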

Create the job

apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaJob
metadata:
  name: job-best-psi-4
  namespace: cross-domain
spec:
  initiator: alice
  scheduleMode: BestEffort
  maxParallelism: 2
  tasks:
    - taskID: job-psi-4
      alias: job-psi-4
      priority: 100
      appImage: secretflow-image
      parties:
      - domainID: alice
      - domainID: bob
      taskInputConfig: '{
  "sf_datasource_config": {
    "alice": {
      "id": "default-data-source"
    },
    "bob": {
      "id": "default-data-source"
    }
  },
  "sf_cluster_desc": {
    "parties": [
      "alice",
      "bob"
    ],
    "devices": [
      {
        "name": "spu",
        "type": "spu",
        "parties": [
          "alice",
          "bob"
        ],
        "config": "{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"
      }
    ],
    "ray_fed_config": {
      "cross_silo_comm_backend": "brpc_link"
    }
  },
  "sf_node_eval_param": {
    "domain": "data_prep",
    "name": "psi",
    "version": "0.0.8",
    "attr_paths": ["input/input_table_1/key", "input/input_table_2/key", "protocol", "sort_result", "allow_empty_result", "allow_duplicate_keys", "allow_duplicate_keys/no/skip_duplicates_check", "ecdh_curve"],
    "attrs": [{
            "is_na": false,
            "ss": ["id1"]
          }, {
            "is_na": false,
            "ss": ["id2"]
          }, {
            "is_na": false,
            "s": "PROTOCOL_RR22"
          }, {
            "b": true,
            "is_na": false
          }, {
            "is_na": true
          }, {
            "is_na": false,
            "s": "no"
          }, {
            "is_na": true
          }, {
            "is_na": false,
            "s": "CURVE_FOURQ"
          }]
  },
  "sf_input_ids": [
    "alice-table", "bob-table"
  ],
  "sf_output_ids": [
    "psi-output5"
  ],
  "sf_output_uris": [
    "psi-output5.csv"
  ]
}'
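
The `attr_paths` and `attrs` arrays in `sf_node_eval_param` are parallel: the value at position i belongs to the path at position i, which is hard to read in the raw JSON. A minimal Python sketch that pairs them up, using the values from the job above; the helper names are hypothetical, and only the value kinds that appear in this thread (`s`, `ss`, `b`) are handled:

```python
def attr_value(attr):
    """Return the populated value of one attrs entry, or None when is_na is true."""
    if attr.get("is_na"):
        return None
    for key in ("s", "ss", "b"):  # only the kinds seen in this thread
        if key in attr:
            return attr[key]
    return None


def named_attrs(node_eval_param):
    """Map each attr path to its value, pairing the two arrays by position."""
    paths = node_eval_param["attr_paths"]
    attrs = node_eval_param["attrs"]
    if len(paths) != len(attrs):
        raise ValueError(f"{len(paths)} attr_paths but {len(attrs)} attrs")
    return {path: attr_value(attr) for path, attr in zip(paths, attrs)}


param = {
    "attr_paths": [
        "input/input_table_1/key", "input/input_table_2/key", "protocol",
        "sort_result", "allow_empty_result", "allow_duplicate_keys",
        "allow_duplicate_keys/no/skip_duplicates_check", "ecdh_curve",
    ],
    "attrs": [
        {"is_na": False, "ss": ["id1"]},
        {"is_na": False, "ss": ["id2"]},
        {"is_na": False, "s": "PROTOCOL_RR22"},
        {"b": True, "is_na": False},
        {"is_na": True},
        {"is_na": False, "s": "no"},
        {"is_na": True},
        {"is_na": False, "s": "CURVE_FOURQ"},
    ],
}

for path, value in named_attrs(param).items():
    print(f"{path} = {value!r}")
```

The length check is useful when adding or removing a path, because a missing entry would silently shift every later value.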

The generated domaindata information is correct:

bash-5.2# kubectl get domaindata psi-output5 -o yaml -n alice
apiVersion: kuscia.secretflow/v1alpha1
kind: DomainData
metadata:
  creationTimestamp: "2024-11-27T06:56:33Z"
  generation: 2
  labels:
    kuscia.secretflow/domaindata-type: table
    kuscia.secretflow/domaindata-vendor: secretflow
    kuscia.secretflow/interconn-protocol-type: kuscia
  name: psi-output5
  namespace: alice
  resourceVersion: "145734"
  uid: d1629af2-6ef5-4dc0-8295-0f50be6efed7
spec:
  attributes:
    dist_data: |-
      {
      "name": "psi-output5.csv",
      "type": "sf.table.vertical_table",
      "meta": {
      "@type": "type.googleapis.com/secretflow.spec.v1.VerticalTable",
      "schemas": [
      {
      "ids": [
      "id1"
      ],
      "idTypes": [
      "int"
      ],
      "features": [],
      "labels": [],
      "featureTypes": [],
      "labelTypes": []
      },
      {
      "ids": [
      "id2"
      ],
      "idTypes": [
      "int"
      ],
      "features": [],
      "labels": [],
      "featureTypes": [],
      "labelTypes": []
      }
      ],
      "lineCount": "3"
      },
      "dataRefs": [
      {
      "uri": "psi-output5.csv",
      "party": "alice",
      "format": "orc",
      "nullStrs": []
      },
      {
      "uri": "psi-output5.csv",
      "party": "bob",
      "format": "orc",
      "nullStrs": []
      }
      ]
      }
  author: alice
  columns:
  - comment: id
    name: id1
    type: int
  dataSource: default-data-source
  fileFormat: csv
  name: psi-output5.csv
  relativeURI: psi-output5.csv
  type: table
  vendor: secretflow

But when I deploy in the following way, the generated fileFormat is unknown:

export KUSCIA_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia:0.12.0b0
export SECRETFLOW_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/secretflow-lite-anolis8:1.10.0b0
export DATAPROXY_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/dataproxy:0.2.0b0

docker pull $KUSCIA_IMAGE && docker run --rm $KUSCIA_IMAGE cat /home/kuscia/scripts/deploy/kuscia.sh > kuscia.sh && chmod u+x kuscia.sh

docker pull $SECRETFLOW_IMAGE
docker pull $DATAPROXY_IMAGE

# The --domain parameter takes the node ID
export DOMAIN_ID=alice
export DOMAIN_ID=bob
docker run -it --rm ${KUSCIA_IMAGE} kuscia init --mode autonomy --domain "${DOMAIN_ID}" > autonomy_${DOMAIN_ID}.yaml

./kuscia.sh start -c autonomy_${DOMAIN_ID}.yaml -p 1080 -k 1081 --data-proxy
bash-5.2# kubectl get domaindata psi-output5 -n alice -o yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: DomainData
metadata:
  annotations:
    kuscia.secretflow/initiator: psi-output5
  creationTimestamp: "2024-11-27T08:02:50Z"
  generation: 1
  labels:
    kuscia.secretflow/domaindata-type: table
    kuscia.secretflow/domaindata-vendor: secretflow
    kuscia.secretflow/interconn-protocol-type: kuscia
  name: psi-output5
  namespace: alice
  resourceVersion: "3637"
  uid: 046cebd6-7a32-4af8-8fc7-d5d6d07e48ab
spec:
  attributes:
    dist_data: |-
      {
      "name": "psi-output5.csv",
      "type": "sf.table.vertical_table",
      "meta": {
      "@type": "type.googleapis.com/secretflow.spec.v1.VerticalTable",
      "schemas": [
      {
      "ids": [
      "id1"
      ],
      "idTypes": [
      "int"
      ],
      "features": [],
      "labels": [],
      "featureTypes": [],
      "labelTypes": []
      },
      {
      "ids": [
      "id2"
      ],
      "idTypes": [
      "int"
      ],
      "features": [],
      "labels": [],
      "featureTypes": [],
      "labelTypes": []
      }
      ],
      "lineCount": "2"
      },
      "dataRefs": [
      {
      "uri": "psi-output5.csv",
      "party": "alice",
      "format": "orc",
      "nullStrs": []
      },
      {
      "uri": "psi-output5.csv",
      "party": "bob",
      "format": "orc",
      "nullStrs": []
      }
      ]
      }
  author: alice
  columns:
  - comment: id
    name: id1
    type: int
  dataSource: default-data-source
  fileFormat: unknown
  name: psi-output5.csv
  relativeURI: psi-output5.csv
  type: table
  vendor: secretflow
ruhengChen commented 2 days ago

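The steps for creating the domaindata, granting access, and creating the job are exactly the same in both cases. Please help look into what the problem is.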
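
To narrow down what actually differs between the two deployments, the two `psi-output5` specs can be compared field by field. A minimal Python sketch, assuming both objects were exported as JSON and loaded into dicts; `diff_fields` is a hypothetical helper, and the dicts below are trimmed copies of the values shown in this thread:

```python
def diff_fields(a, b, prefix=""):
    """Yield (path, value_in_a, value_in_b) for every leaf that differs between two dicts."""
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested mappings so the reported path stays precise
            yield from diff_fields(va, vb, prefix=path + ".")
        elif va != vb:
            yield path, va, vb


all_in_one = {
    "dataSource": "default-data-source",
    "fileFormat": "csv",
    "relativeURI": "psi-output5.csv",
    "type": "table",
}
autonomy = {
    "dataSource": "default-data-source",
    "fileFormat": "unknown",
    "relativeURI": "psi-output5.csv",
    "type": "table",
}

for path, left, right in diff_fields(all_in_one, autonomy):
    print(f"{path}: all_in_one={left!r} autonomy={right!r}")
```

On these trimmed inputs the only differing field is `fileFormat`; the full objects in the thread also differ in `lineCount` and in the `kuscia.secretflow/initiator` annotation.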

zimu-yuxi commented 1 day ago

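> The steps for creating the domaindata, granting access, and creating the job are exactly the same in both cases. Please help look into what the problem is.

Yes, we have reproduced it as well. We will look into it internally as a priority.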