secretflow / kuscia

Kuscia (Kubernetes-based Secure Collaborative InfrA) is a K8s-based privacy-preserving computing task orchestration framework.
https://www.secretflow.org.cn/docs/kuscia/latest/zh-Hans
Apache License 2.0

Docker deployment in runp mode: the model export (model_export) component takes longer to run #440

Closed 14ctt closed 5 days ago

14ctt commented 1 month ago

Issue Type

Others

Search for existing issues similar to yours

No

Kuscia Version

kuscia 0.10.0b0

Link to Relevant Documentation

No response

Question Details

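Problem encountered:
 After redeploying Kuscia on different servers, the model export component takes longer to run.

Difference between expected and actual behavior:
 Before the server change, the run took less than one minute; after redeploying Kuscia on the new servers, it takes about 2 minutes.

Versions
kuscia 0.10.0b0; SecretFlow 1.8.0b0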
14ctt commented 1 month ago

Model export component logs: alice_model-export-1728378156596-partner-0_ff28d007-e60c-4e48-998e-68011f6c4696.txt bob_model-export-1728378156596-partner-0_d3bd7aaf-85bc-414a-92b4-0e08c7083e8a.txt

Modeling job configuration: 建模job配置.txt

Model export job configuration: 模型导出任务配置.txt

14ctt commented 1 month ago

Modeling data: alice side: binary_hetero_guest_train.csv; bob side: binary_hetero_host_train.csv

14ctt commented 1 month ago

bob kuscia.yaml configuration:

mode: autonomy
domainID: bob
domainKeyData: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBN2diTVI3ZkIrVGpINGZ0dDE0U0xldzFWdCt4S1NZeTZXc0JlNDh6OTBvTllSRGE3CkZObjhpbGo5eVNUOEMxM282aU5TWElqZlRORlpLOG9TWTBDaURtNWJkV1lKKytEZjJiTzF0OWQvNHR5RlVyMmoKM0lTUXVndnU3V2NmOXBnRmIyYW1LNzJsU01NVTNFNXo5UGZNZmh4RHFoS3NwRWh5cXB4M2l0ay8vSWFxdHBwdwpFY28zVWN3L0oyRGJ4TlkxUlhpNjUrb2ZrSnZ2aTNKYjVlN3BnNEd4T2s4L0hLZ3dSRUcwY0NoY1VKWnJYMk90CjJVMXBiM3pwSDlHd2czVjlNSXpkUFgwZVVpeSt2NHdVVDdUL3JwRDVPbm1jclVXU0Y5NEdQdlRCRThLbUNPRzQKc1lvaTFHQlNPVkIrWkJvcHV5TkNLTUtlVUdnck9uR25aaDVjbVFJREFRQUJBb0lCQUJWRXhqd2lUMUxWb0FheApVTDlST0tWbUs2S2ptbiszODI3b2daSVpYeFRRWjdLd3NVYkxzNlVwRG9SS3FicmFTVldpa2xseVlZR2JzendXCmNrNHNHRTd4elhwUUE5UmQ0Nkt3VEN5Qk5Rd3dhRGxjZUtmYnNKTHZQZ280ak1LY0V1QkJhOWU4dW1nbUJsY3AKSzNOWU1RMlR1UGdaY3hPNjVZUVpOeFZSRkVhY0ZjTVBJQndqUEZoNXRRNHZsa3hwamtua0lsM0hhWXN4MURDawpoL284QlU1Vy9qREkvNzZVVFB3Vks4ZTB0aWNlbVlmdTZlOHJ3a1JoelFCazhJMkZ6WHlnV3lTcnZVaCtvaUt2Cmppc1VzWUZBMXgwOC9id0NnREQ5S1BNZjM1S3RBUXc5dnp6bm5qa3UwY08wdVlqaUxyRlp0Si9KNWR5bU10L3cKZWpoSFZZRUNnWUVBNy9HbE0vSFlyTWwzZXBqQ0ZjL0RMTFZmSVRCNW5VLzBrOE5JOG5jZUUxbU85WEFaRjM3MAowaW82Ykd5OWtldkZsMXBvS2pvYmtZMHBSNjZuZW9EUnBRL1RqWEMzMStYeVJDMmdRNGVWZTVNNVVoWlNnK3F0CkprbFdhUFNrSWZ2ZDc4UW90TTNaYUVnR3pNRi9mQmxJRUhKczljY0EzLzhMazNFVE54NkVpeThDZ1lFQS9mUk8Kb0ZxTkpmK3NUclF3RUVZM292M083Q3NmRFZSdzAvclhNdTN4Q0JKVUcvbDF5Q2kzRHd2eDlEY252SlpFNkxlcwpaTzM3N0NsK1BKL3hnNHJCY0lDa3NCck8rMlZwazdGU21ROVhueUFpMC8yQTJMZGswOTl1c3ZYK3F0OWpKbm52CkVoeSs3a1hWaTZjWWVVM09SVEgyWXd4ODh6Q28ycXB5UjFRZmdyY0NnWUVBbjJwM0Rqb0NjVm94VFh5c2huMVEKK1NWUG5PZHVCWHlYekl5VXJMTkRnaXVnZGFVU3BxK3N6TzFOZjdnSGd5bVlUK1M3REVNckNkczFyL0IwU1VuSQovOWV5QUdrQlhDbmtlak91Sk8reCt6Wm5nUWhmcmxGKzFNOG8zL2FhRGhJZTlDdnB0NmFFYVdwaFNpek5ISkJYCmRzWHZhcVBiQ3ZlU28xVWZKU0hiZ2ZVQ2dZRUE2L2Q0bzNqZ0ZFRzMrajFsTjRuM0ltVFJReEppUFFHTUFQdmYKdjd2bHJZbFNTOWdFZitLTkRkY2NqNzQ4RnZoTnVTc0piWkxQOVVVc3Zlb2tBQVVLN3AwcDZWUlBwTG83V25kMgpkekFSUzVNUnFZYXdsUzM2NnY5K0haMkRiNFEzM2NXSXhYeWVVRW5sUFdrYXpZUGxPTmhPeHdRUWlldFp5aXg4CkdSanFXb3NDZ1lBK1B4cC92a1NsQzNaUG92cmJ5OEFnaVRIY2RlK0g3WmRkSUxkU1M0cWR3ZEJvRlEyWjRSR1QKd1BLZWJwdkVlWWJQL2VNcVc5YUhEUElQYWNNUVdKYklrZGdJazZ0VmYyM0JsTFFWSDhOQ3VjV0FCdzJpbjB5aApqOEY2RHN1VmFQSEtIU2o5Zjg5d3Q1ZnFLRUh5VUtEOVc1Mm80TENmNlNFV05Mdko4ZEQycmc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
logLevel: INFO
runtime: runp
runk:
  namespace: ""
  dnsServers: []
  kubeconfigFile: ""
capacity:
  cpu: ""
  memory: ""
  pods: ""
  storage: ""
reservedResources:
  cpu: ""
  memory: ""
image:
  pullPolicy: ""
  defaultRegistry: ""
  registries: []
datastoreEndpoint: ""

alice kuscia.yaml configuration:

mode: autonomy
domainID: alice
domainKeyData: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcFFJQkFBS0NBUUVBenZvM1VnU3BoYkxCQUd3NWxhT3l0RE5FelZOaGE5RklMVjlmVUl1ZWZVMExtZ05xCnJaTE1nWnRLaXIySnJsMmRIUmNpaElDS3lVWTQ2TEVKcndaN1pYcVkxYTgxYWhrT2FmNjNldVltWXdUQVd5aHUKR2c1aWtTcHhDV2dLZzgvamZkaE5BREM0QVczbCtuaGUzVm8zOUNJcTJkOWNqQktINjRRaWtGMDVQY2xGb1JlYgpqT2EyNXIyMmNoMGR6ZXZOUzgzN1BCajVUMXJCNnkrRDZHTDF5QzNFZDl1SlFibURkRVMvSWlESEJsbnJ1akZlCmJuMEdpc3c3WTdFTXZ4TWIwdnkvc3h0TXRMb1M2ZDRnVUFXOCt3Z2ptWWFhUzJkcTJRN1Z4Z1gwZ1djb0xwV2MKNk50dytBL1lzTUgvRUdWVkxYN0U4M0EvMmpUUUo5bnYxa1FoelFJREFRQUJBb0lCQVFER244QnU2U0oxdDNFWApvc1F3SWdQZ2drTElkL2ZKcS9FRCtiNVNZV29hL05EaFg5NEQ3Qmh0V0VWVWwwZUZHVGtwTFlabWVhUzJPcmxxCk5Gc0NwOG5MUExkbWVObGRrK1lOT2UrQjlWTnBPcW9OME9LNEJvMzRtUzRZeE1zeWExSDQ5cFlPVkxLQVRLS2kKbHRrU3V2MXZQMHE0QW0yaHpqYTVnUGhhYWR3QXVXZDMreWN6VWlLVXVVK1dxZXl6bXREYkhRb25FSjdoNlJIbApZeEJ6RnBKTmhsM3FQQ1YveGZjSGpIZ3Q2Q24rcGNsMit0ejNrTUh1R1FrQTVDQ2h4cXdJTWsvWDl5b29oblI3Cmw4SU55YVpKSUZNVHJ2UFNvRU1SN0xJZDFQK1Q0M1dpd1d2elRIeE96c0VoNGVWTUlIT2VVazViQUFrdzdrQjQKbmczcjduSlJBb0dCQVBVZWFpZkdtMzZNU1VkdkNpRVNiTlV2S3lwNnI1eEU2WVJuVlh2MTRZaGtPRmNnSkNlMApuY2ZtVmhQVmx0bkpuQkVBd2FlQUlFVXhqUEk0VllpL3ZOV2dLbVJUWUM3N1oxWEw4amVqL253M1M3RGZnZ0V3CjN6LzFqdjRqQllITlgxMFpjZjJJMm5yM00reFY4YTlZQitmRmt1WjU5cWxZR0xiUVBZNy93K2pyQW9HQkFOZ3EKV3AwRjFpbDF4cmlTMm5GZW5CeWorc1BYODJDRHlwaVYvRnl2aXN6TlNKQ08wblY3QlhiQjM2QXppM0Q2a3F5awp2YVBPYVFhVmRqbDdhUytxY2pUaEU3NnErVGtMMGl4RDB5TXV5aEM4NzVyZ3NnamZVSG9DZkJqcXNjalkyUUpkCjZCNFlEV2JqZFROODJNZ0V6UjlDQm8xaXE2dVN0QWtwb3diNFdYSW5Bb0dCQUowOWNvbXJqU29qNGdveUFBUngKSm1HblRZQ0ZqVXVvcFVnclpab01oQzRUWkZUM3FGblVNbmIrbW05RXorMUx1Skxyc2s2NkVYbHhyT0hoSTNXQwowaWNVQkwyeEFuMkJCcXZ5RmFKOTBBMXRCMkFpTU9xRXFHSUdLMEY1dzltZG5qUkIwMjc1c0hXN1NKS3VHMGtKCkpxRVdqQUxQY0Z6M3gzcldvUGF2dWNRYkFvR0FFLzNPeFlqZHBwWHkyKzBROEwwc09PWGRjakZObTlaMGJTb3AKM2JTS1pLaHpscEx1MmRpWHg4VGtWcCtOdk5RZnJvSEozQlZoMXk4SmNRVjkwOSt5a2p0aXBSYVExL3JFNUQwZAprTHBxQzZROWtKaEpQdExRVVloYlM3NmcxSTloZW51TzFRR3FjNktEbTFCbmlCQVJuRnR4MmErUGF0MjFjc2Q4CkFWQUVqYThDZ1lFQXJFT0NPblNJUGhkaHlrTThabFRjajNyRk45TkhXK1hUenl5ZXVyZTdlWG9XTEszelpyZXMKK2NPaW14eXE4dGZCTXRFR2hCbnVvR2t6WEZZL3NqbWJiRWJMeGE5OUtQVEFsOU5hbjVjTUR5b3RtSkNRZXhZSQpNREVycGplaGVBL0hOUUlmMEw3TnFXdjF2bnV6c1RaaGt2blZySXJPWlNLUEdnWk96M0JUN2dzPQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
logLevel: INFO
runtime: runp
runk:
  namespace: ""
  dnsServers: []
  kubeconfigFile: ""
capacity:
  cpu: ""
  memory: ""
  pods: ""
  storage: ""
reservedResources:
  cpu: ""
  memory: ""
image:
  pullPolicy: ""
  defaultRegistry: ""
  registries: []
datastoreEndpoint: ""

BrainWH commented 1 month ago

Hello, are the machine configuration and network configuration the same before and after the server change?

14ctt commented 1 month ago

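> Hello, are the machine configuration and network configuration the same before and after the server change?

The machine configurations are different; network connectivity is fine in both cases.

Network: before the change, alice: 172.32.173.1, bob: 172.32.173.4; after the change, alice: 172.30.185.219, bob: 172.30.185.217

alice server before the change: [screenshot: alice server before the change]; alice server after the change: [screenshot: alice server after the change]

bob server before the change: [screenshot: bob server before the change]; bob server after the change: [screenshot: bob server after the change]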
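Since the machine configurations differ, it may help to record the same few hardware fields on each old and new host as text, so they can be compared side by side. The following is a minimal sketch, assuming Linux hosts with /proc/meminfo available; the function names are illustrative and not part of Kuscia or SecretFlow.

```python
import os
import platform


def read_mem_total_kib(meminfo_path="/proc/meminfo"):
    """Return MemTotal in KiB from a Linux meminfo file, or None if unavailable."""
    try:
        with open(meminfo_path, encoding="utf-8") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    # Line format: "MemTotal:       16384 kB"
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def host_summary():
    """Collect a few fields that commonly affect task run time."""
    return {
        "hostname": platform.node(),
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
        "mem_total_kib": read_mem_total_kib(),
    }


if __name__ == "__main__":
    for key, value in host_summary().items():
        print(f"{key}: {value}")
```

Running it on all four machines and diffing the output shows whether the new hosts have fewer logical CPUs or less memory. It does not capture CPU clock speed or disk performance, which would need separate checks.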

BrainWH commented 1 month ago

The network environment needs to be the same for both runs; you can set the same bandwidth and latency for testing.
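One rough way to check whether latency between the two parties is comparable in the old and new environments is to time TCP handshakes from one host to the other. The sketch below is an assumption-laden illustration: the peer host and port are supplied by the user on the command line, it measures connection setup time only (not bandwidth), and it does not use any Kuscia API.

```python
import socket
import statistics
import sys
import time


def tcp_connect_times_ms(host, port, samples=5, timeout=3.0):
    """Time `samples` TCP handshakes to host:port and return durations in ms."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # The connection is closed immediately; only setup time is measured.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        times.append((time.perf_counter() - start) * 1000.0)
    return times


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Arguments: <peer-host> <peer-port>, both chosen by the user.
        measured = tcp_connect_times_ms(sys.argv[1], int(sys.argv[2]))
        print(f"min={min(measured):.2f} ms "
              f"median={statistics.median(measured):.2f} ms "
              f"max={max(measured):.2f} ms")
    else:
        print("usage: python tcp_latency.py <peer-host> <peer-port>")
```

Running it from alice toward bob in both the old and the new environment gives a like-for-like latency figure; a bandwidth comparison would need a separate tool.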

14ctt commented 1 month ago

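> The network environment needs to be the same for both runs; you can set the same bandwidth and latency for testing.

The network environment was the same for both runs.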

BrainWH commented 1 month ago

Could you run it several more times in each environment and check the results?

14ctt commented 1 month ago

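> Could you run it several more times in each environment and check the results?

I ran it about 5 times and the results were all similar, at a little over 1 minute 40 seconds.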
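To make a repeated-run comparison like this concrete, the per-run durations from each environment can be summarized and compared. This is a minimal sketch; the function names are illustrative, and the numbers in the example are placeholders, not measurements from this issue.

```python
import statistics


def summarize(durations_s):
    """Return run count, mean, sample standard deviation, min and max (seconds)."""
    return {
        "runs": len(durations_s),
        "mean_s": statistics.mean(durations_s),
        # Sample standard deviation needs at least two runs.
        "stdev_s": statistics.stdev(durations_s) if len(durations_s) > 1 else 0.0,
        "min_s": min(durations_s),
        "max_s": max(durations_s),
    }


def slowdown(before_s, after_s):
    """Ratio of the mean run time after the move to the mean run time before it."""
    return statistics.mean(after_s) / statistics.mean(before_s)


if __name__ == "__main__":
    # PLACEHOLDER durations in seconds; replace with the recorded run times.
    before = [55.0, 57.0, 56.0]
    after = [101.0, 104.0, 102.0]
    print("before:", summarize(before))
    print("after: ", summarize(after))
    print(f"slowdown: {slowdown(before, after):.2f}x")
```

A small standard deviation in both environments together with a stable slowdown ratio would suggest a systematic difference between the hosts rather than run-to-run noise.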

github-actions[bot] commented 1 week ago

Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.