swanchain / go-computing-provider

A golang implementation of computing provider
MIT License

ubi task not starting #9

Closed ThomasBlock closed 7 months ago

ThomasBlock commented 8 months ago

I managed to get computing-provider running, but the UBI tasks are not starting. What might be missing?

My node setup:
- Swan1 = go-computing-provider, public IP address, Kubernetes, no GPU
- Swan2 = Kubernetes, GPU A4000
- Swan3 = Kubernetes, GPU RTX 4060 Ti (not in the official support list?)

computing-provider ubi-task list 
TASK ID TASK TYPE   ZK TYPE     TRANSACTION HASH    STATUS  REWARD  CREATE TIME         
42      CPU         fil-c2-512M                     running 0.0     2024-01-29 07:23:26 
61      CPU         fil-c2-512M                     running 0.0     2024-01-29 20:04:07 
66      CPU         fil-c2-512M                     running 0.0     2024-01-29 22:04:08 
69      CPU         fil-c2-512M                     running 0.0     2024-01-30 00:04:07 
72      CPU         fil-c2-512M                     running 0.0     2024-01-30 02:04:07 
75      CPU         fil-c2-512M                     running 0.0     2024-01-30 04:04:07 
78      CPU         fil-c2-512M                     running 0.0     2024-01-30 06:04:07 
81      CPU         fil-c2-512M                     running 0.0     2024-01-30 08:04:07 
84      CPU         fil-c2-512M                     running 0.0     2024-01-30 10:04:07

One idea might be that I have too many CPU spaces, so that I don't have enough resources for the UBI tasks. But I checked this and also killed some tasks:

computing-provider task list
TASK UUID       TASK TYPE   WALLET ADDRESS  SPACE UUID      SPACE NAME          STATUS  
...4175204b63   CPU         0x273...e40F3   ...7e1249bd88   Finder              Running 
...12ac3e540d   CPU         0xB87...E405f   ...adc3915681   NothnqDego          Running 
...93e2992f27   CPU         0x195...554d1   ...3b6016cfed   pac-manTR           Running 
...0472b892e9   CPU         0x195...554d1   ...27c6305c31   Protext             Running 
...66a59f77cc   CPU         0x78A...579f3   ...9499b09b96   Scorpius97-Pacman   Running 
...e1eebd130d   CPU         0x195...554d1   ...2c1157e6a0   Protektoria1        Running 
...14d33dd1c1   CPU         0x0Ad...54229   ...360a898ee2   Justinhulu          Running 
...9cf95fb8bb   CPU         0x8F3...bd69f   ...c53221a17f   Kikisweet           Running 
...ebae28699e   CPU         0x273...e40F3   ...a886138565   shape               Running 
...71b5077678   CPU         0x05E...d3C31   ...9435a5f8c7   SvoRa               Running 
...d9aafe3005   CPU         0x7BD...431B8   ...bc703ce7d8   bunalisebastian     Running 
...99e344fcfc   CPU         0x0Ad...54229   ...0169c4c478   Robbie122           Running 
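For context on how the provider decides whether a node can take a UBI task, the `checkResourceAvailableForUbi` log lines later in this thread compare per-node remainders against the task's needs. A minimal Go sketch of that comparison — types, function names, and values are illustrative (taken from the log output), not the CP's actual code:

```go
package main

import "fmt"

// nodeResource is an illustrative bundle of the three quantities the
// checkResourceAvailableForUbi log lines report per node.
type nodeResource struct {
	CPU     int
	Memory  float64 // GiB
	Storage float64 // GiB
}

// fits reports whether a node's remainders cover the task's needs;
// a single negative or short remainder disqualifies the node.
func fits(need, remainder nodeResource) bool {
	return remainder.CPU >= need.CPU &&
		remainder.Memory >= need.Memory &&
		remainder.Storage >= need.Storage
}

func main() {
	need := nodeResource{CPU: 1, Memory: 5, Storage: 1}
	// Remainders as logged for the overcommitted node and a healthy one.
	fmt.Println(fits(need, nodeResource{CPU: -2, Memory: 0, Storage: 293}))  // false
	fmt.Println(fits(need, nodeResource{CPU: 5, Memory: 47, Storage: 1413})) // true
}
```

Under this reading, killing spaces only helps if it brings every remainder (CPU, memory, storage) back above the task's requirements on at least one eligible node.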

Here is my config:

cat ~/cp/config.toml 
[API]
UbiTask = true 
Port = 8085
MultiAddress = "/ip4/XXX/tcp/8085" 
Domain = "XXX"
NodeName = "XXX"
RedisUrl = "redis://127.0.0.1:6379"
RedisPassword = ""
[UBI]
UbiTask = true 
UbiEnginePk = "0xB5aeb540B4895cd024c1625E146684940A849ED9"
UbiUrl ="https://ubi-task.swanchain.io/v1"  
[LOG]
CrtFile = "/home/user/ssl/server.crt"
KeyFile = "/home/user/ssl/server.key"
[HUB]
ServerUrl = "https://orchestrator-api.swanchain.io"
AccessToken = "XXX"
WalletAddress = "XXX"
BalanceThreshold= 0.3
[MCS]
ApiKey = "XXX"
BucketName = "XXX"
Network = "polygon.mumbai" 
FileCachePath = "/tmp"
[Registry]
ServerAddress = "192.168.128.71:5000" 
UserName = ""
Password = ""
[RPC]
SWAN_TESTNET ="https://saturn-rpc.swanchain.io"    
SWAN_MAINNET= ""                                   
[CONTRACT]
SWAN_CONTRACT="0x91B25A65b295F0405552A4bbB77879ab5e38166c"
SWAN_COLLATERAL_CONTRACT="0xB8D9744b46C1ABbd02D62a7eebF193d83965ba39" 
cat fil-c2.env 
FIL_PROOFS_PARAMETER_CACHE="/var/tmp/filecoin-proof-parameters"
RUST_GPU_TOOLS_CUSTOM_GPU="NVIDIA RTX A4000:6144" #Shading Units

ls /var/tmp/filecoin-proof-parameters
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk
...

Here are the logs regarding UBI:

[GIN] 2024/01/30 - 10:04:07 | 200 |  312.427145ms |   38.104.153.43 | GET      "/api/v1/computing/cp"
time="2024-01-30 10:04:07.711" level=info msg="receive ubi task received: {ID:84 Name:1000-0-7-170 Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/QmVnWcUMh4X8YG86BJqd7icf6bEdqAtrU48eMk2XuVH3jo Signature:0xe19f3fb145e581f6f169b5176292a7c8f394a1ea996cc8c2f1f0e8bb13f7556673dd9b06d8b7b3788bd6e69cb9b99692833c55f920d765e941cf50465321808c01 Resource:0xc000d5c880}" func=DoUbiTask file="cp_service.go:546"
time="2024-01-30 10:04:07.711" level=info msg="ubi task sign verifing, task_id: 84,  type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:585"
time="2024-01-30 10:04:08.253" level=error msg="nodeName: %s, error: %+vswan1invalid character '.' looking for beginning of value" func=GetNodeGpuSummary file="k8s_service.go:527"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1257"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: remainderCpu: -2, remainderMemory: 0.00, remainderStorage: 293.00" func=checkResourceAvailableForUbi file="cp_service.go:1258"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1257"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: remainderCpu: 1, remainderMemory: 43.00, remainderStorage: 1413.00" func=checkResourceAvailableForUbi file="cp_service.go:1258"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1257"
time="2024-01-30 10:04:08.253" level=info msg="checkResourceAvailableForUbi: remainderCpu: 5, remainderMemory: 47.00, remainderStorage: 1413.00" func=checkResourceAvailableForUbi file="cp_service.go:1258"
time="2024-01-30 10:04:08.253" level=info msg="gpuName: NVIDIA-A4000, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan2:map[NVIDIA-A4000:1] swan3:map[NVIDIA-4060-Ti:1]]" func=checkResourceAvailableForUbi file="cp_service.go:1269"
time="2024-01-30 10:04:08.254" level=warning msg="ubi task id: 84, type: GPU, not found a resources available" func=DoUbiTask file="cp_service.go:639"
[GIN-debug] [WARNING] Headers were already written. Wanted to override status code 500 with 200
[GIN] 2024/01/30 - 10:04:08 | 500 |  543.238653ms |   38.104.153.43 | POST     "/api/v1/computing/cp/ubi"
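Two details in the `GetNodeGpuSummary` error line are worth separating: the literal `%s`/`%+v` suggests the log call received its format string and arguments without interpolating them, and the underlying failure is a JSON decode error from `encoding/json`. Any non-JSON payload that starts with `.` reproduces the exact message; the payload below is purely hypothetical, just to show where the message comes from:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeErr returns the message encoding/json produces for a payload;
// the CP apparently fed it a node value that is not valid JSON.
func decodeErr(payload string) string {
	var v interface{}
	if err := json.Unmarshal([]byte(payload), &v); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// A leading '.' cannot begin any JSON value, hence the error text.
	fmt.Println(decodeErr(".nvidia.com/gpu")) // hypothetical payload
}
```

So the GPU summary for swan1 likely failed to parse at all, which would explain the empty `nodeGpu: map[:0 kubernetes.io/os:0]` seen in the resource check.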

There are no pods created for UBI:

kubectl get po -A
NAMESPACE                                       NAME                                                           READY   STATUS             RESTARTS          AGE
ingress-nginx                                   ingress-nginx-admission-create-mh72w                           0/1     Completed          0                 5d23h
ingress-nginx                                   ingress-nginx-admission-patch-2d9rc                            0/1     Completed          0                 5d23h
ingress-nginx                                   ingress-nginx-controller-7fcc98f6bc-5phzb                      1/1     Running            1 (3d1h ago)      5d23h
kube-system                                     calico-kube-controllers-74d5f9d7bb-p2t7w                       1/1     Running            2 (33h ago)       6d
kube-system                                     calico-node-5cns9                                              1/1     Running            4 (16h ago)       23h
kube-system                                     calico-node-99rh4                                              1/1     Running            5 (16h ago)       6d
kube-system                                     calico-node-fbxhw                                              1/1     Running            1 (3d1h ago)      6d
kube-system                                     coredns-5dd5756b68-bvj46                                       1/1     Running            1 (3d1h ago)      6d1h
kube-system                                     coredns-5dd5756b68-sss5q                                       1/1     Running            1 (3d1h ago)      6d1h
kube-system                                     etcd-swan1                                                     1/1     Running            3 (3d1h ago)      6d1h
kube-system                                     kube-apiserver-swan1                                           1/1     Running            4 (33h ago)       6d1h
kube-system                                     kube-controller-manager-swan1                                  1/1     Running            12 (5m3s ago)     6d1h
kube-system                                     kube-proxy-lq975                                               1/1     Running            3 (3d1h ago)      6d1h
kube-system                                     kube-proxy-nttzs                                               1/1     Running            6 (16h ago)       6d
kube-system                                     kube-proxy-pz5g8                                               1/1     Running            4 (16h ago)       23h
kube-system                                     kube-scheduler-swan1                                           1/1     Running            13 (5m2s ago)     6d1h
kube-system                                     nvidia-device-plugin-daemonset-2wgbt                           1/1     Running            1 (3d1h ago)      6d
kube-system                                     nvidia-device-plugin-daemonset-df6lz                           1/1     Running            7 (17h ago)       23h
kube-system                                     nvidia-device-plugin-daemonset-shzdv                           1/1     Running            4 (16h ago)       5d23h
kube-system                                     resource-exporter-ds-fs5b5                                     1/1     Running            4 (16h ago)       4d23h
kube-system                                     resource-exporter-ds-mzgn7                                     0/1     CrashLoopBackOff   553 (4m54s ago)   46h
kube-system                                     resource-exporter-ds-whwkj                                     1/1     Running            92 (17h ago)      23h
ns-0x05eecd336633a443a5679e47797374fbb4cd3c31   deploy-fe29213a-f795-43d4-a7f6-6f9435a5f8c7-585c5f6854-tk84l   1/1     Running            0                 11h
ns-0x0ad8a3fdd123ef21ccccb6433bc555f67f154229   deploy-58b4b3cd-bcd7-4079-8b3c-7e360a898ee2-569f8d94c4-czn9m   1/1     Running            0                 16h
ns-0x0ad8a3fdd123ef21ccccb6433bc555f67f154229   deploy-74a62385-7b95-4d6d-9054-830169c4c478-7b9d7f6cd-bcgmw    1/1     Running            0                 16h
ns-0x195de990f6c8930194dd62ac21ceee04cf8554d1   deploy-0c103c54-e988-4112-afa8-933b6016cfed-7c5f959474-l5m5z   1/1     Running            0                 16h
ns-0x195de990f6c8930194dd62ac21ceee04cf8554d1   deploy-18a93794-f3a1-448b-ad3d-0927c6305c31-57c59c6856-vtv6d   1/1     Running            0                 16h
ns-0x195de990f6c8930194dd62ac21ceee04cf8554d1   deploy-3b84bed5-d2a2-400f-bd5b-972c1157e6a0-7c9cfb46d4-r89dr   1/1     Running            0                 16h
ns-0x2733c8521c1b80939415bf521775769cdabe40f3   deploy-70f77042-33e6-4e55-a3d6-ca7e1249bd88-69f5c56558-9n6ks   1/1     Running            0                 10h
ns-0x2733c8521c1b80939415bf521775769cdabe40f3   deploy-ed283275-cbfd-4248-a101-59a886138565-784856b97b-gn88p   1/1     Running            0                 10h
ns-0x3aa50e86b3ac589bf3a9b9d3f90bb6801611e8ed   deploy-994339f1-ff37-453c-8234-5df2f8e732a3-7d8999c64b-fgmcd   1/1     Running            0                 5m53s
ns-0x78a170be72f00f8a49538bc9895377984a8579f3   deploy-bd3691ee-c404-4973-8326-919499b09b96-7b4f6cb7cb-9flhd   1/1     Running            0                 51m
ns-0x7bdd1675943d8980facd61bb1253789a806431b8   deploy-96ea9115-4ac6-477e-984e-80bc703ce7d8-55f689c96c-88cfs   1/1     Running            0                 15h
ns-0x8f3d04858ba5da1f18500be92ce74fb2a61bd69f   deploy-5a601a1d-dacf-4b42-bd64-98c53221a17f-867c49d6f7-hpp8t   1/1     Running            0                 15h
ns-0xb87a6b7ed42a331cc4ba85df42063668cdfe405f   deploy-e2ba5400-1983-4062-a429-86adc3915681-68dffd47db-8w7ph   1/1     Running            0                 16h
tigera-operator                                 tigera-operator-94d7f7696-kgx6q                                1/1     Running            28 (5m3s ago)     6d
Normalnoise commented 8 months ago

Can you provide the file list in /var/tmp/filecoin-proof-parameters?

ThomasBlock commented 8 months ago

> Can you provide the file list in /var/tmp/filecoin-proof-parameters?

Yes, I downloaded it with Filecoin. Is that okay? (See https://github.com/swanchain/ubi-benchmark/issues/1)

ls /var/tmp/filecoin-proof-parameters
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-92180959e1918d26350b8e6cfe217bbdd0a2d8de51ebec269078b364b715ad63.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-fb9e095bebdd77511c0269b967b4d87ba8b8a525edaa0e165de23ba454510194.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-2-102e1444a7e9a97ebf1e3d6855dcc77e66c011ea66f936d9b2c508f87f2f83a7.vk
v28-fil-inner-product-v1.srs
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0170db1f394b35d995252228ee359194b13199d259380541dc529fb0099096b0.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-50c7368dea9593ed0989e70974d28024efa9d156d585b7eea1be22b2e753f331.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-5294475db5237a2e83c3e52fd6c2b03859a1831d45ed08c4f35dbf9a803165a9.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-2627e4006b67f99cef990c0a47d5426cb7ab0a0ad58fc1061547bf2d28b09def.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-b62098629d07946e9028127e70295ed996fe3ed25b0f9f88eb610a0ab4385a3c.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-032d3138d22506ec0082ed72b2dcba18df18477904e35bafee82b3793b06832f.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-ecd683648512ab1765faa2a5f14bab48f676e633467f0aa8aad4b55dcb0652bb.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.params
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-2-sha256_hasher-96f1b4a04c5c51e4759bbf224bbc2ef5a42c7100f16ec0637123f16a845ddfb2.vk
Normalnoise commented 8 months ago

Can you provide your computing-provider version?

ThomasBlock commented 8 months ago

> Can you provide your computing-provider version?

I compiled it:

VERSION: 0.4.1+git.0067c20

Edit: after git checkout fea-ubi-task:

0.4.1+git.428777c

sonic-chain commented 8 months ago

You can test whether the ubi-task environment is installed correctly according to the following document: https://docs.swanchain.io/orchestrator/as-a-computing-provider/computing-provider-setup/faq#q-how-can-i-verify-if-my-computing-provider-is-set-up-to-receive-ubi-tasks. While the container is running, you can view the pod logs to troubleshoot errors.

ThomasBlock commented 8 months ago

> You can test whether the ubi-task environment is installed correctly according to the following document: https://docs.swanchain.io/orchestrator/as-a-computing-provider/computing-provider-setup/faq#q-how-can-i-verify-if-my-computing-provider-is-set-up-to-receive-ubi-tasks. While the container is running, you can view the pod logs to troubleshoot errors.

Yes, thank you for the feedback. This somehow works: the pod is created and finishes. It was deleted quite quickly, so I could no longer read the logs. But in the task list, it still seems unfinished (and is still labeled as a "CPU" task).

curl -k --location --request POST 'https://***/api/v1/computing/cp/ubi' ...
{"status":"success","code":200,"data":"success"}
time="2024-01-30 12:42:05.157" level=info msg="receive ubi task received: {ID:1 Name:test-ubi Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5 Signature:0x13cb4547123ddc947aaebf9e4b2026fe1115390bbaa32f3579fe966fc1cc1cf05bc3e2d2516f86e65c370d879ad052805a6ea343fe7fed35d981c49870b12d3e01 Resource:0xc0007d2040}" func=DoUbiTask file="cp_service.go:547"
time="2024-01-30 12:42:05.158" level=info msg="ubi task sign verifing, task_id: 1,  type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:586"
time="2024-01-30 12:42:05.812" level=error msg="nodeName: %s, error: %+vswan1invalid character '.' looking for beginning of value" func=StatisticalSources file="k8s_service.go:347"
time="2024-01-30 12:42:05.830" level=error msg="nodeName: %s, error: %+vswan1invalid character '.' looking for beginning of value" func=GetNodeGpuSummary file="k8s_service.go:527"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: needCpu: 2, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1269"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: remainderCpu: -4, remainderMemory: 12.00, remainderStorage: 293.00" func=checkResourceAvailableForUbi file="cp_service.go:1270"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: needCpu: 2, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1269"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: remainderCpu: 19, remainderMemory: 61.00, remainderStorage: 1588.00" func=checkResourceAvailableForUbi file="cp_service.go:1270"
time="2024-01-30 12:42:05.831" level=info msg="gpuName: NVIDIA-A4000, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan2:map[NVIDIA-A4000:1] swan3:map[NVIDIA-4060-Ti:1]]" func=checkResourceAvailableForUbi file="cp_service.go:1281"
[GIN] 2024/01/30 - 12:42:05 | 200 |  673.749131ms | 212.102.118.102 | POST     "/api/v1/computing/cp/ubi"
kubectl get po -A
NAMESPACE                                       NAME                                                           READY   STATUS             RESTARTS          AGE
...
ubi-task-1                                      fil-c2-512m-1-8cbzj                                            0/1     Completed          0                 76s
computing-provider ubi-task list 
TASK ID TASK TYPE   ZK TYPE     TRANSACTION HASH    STATUS  REWARD  CREATE TIME         
... 
84      CPU         fil-c2-512M                     running 0.0     2024-01-30 10:04:07 
1       CPU         fil-c2-512M                     running 0.0     2024-01-30 12:42:05
ThomasBlock commented 8 months ago

Here is the log

kubectl logs -f -n ubi-task-2 fil-c2-512m-2-vxz9j
2024-01-30T11:51:28.427Z    INFO    ubi-bench   ubi-bench/main.go:96    Starting ubi-bench
2024-01-30T11:51:28.427Z    INFO    ubi-bench   ubi-bench/main.go:565   json param file of c1: /var/tmp/fil-c2-param/test-ubi.json
2024-01-30T11:51:28.427Z    WARN    ubi-bench   ubi-bench/main.go:113   reading input file:
    main.glob..func4
        /opt/ubi-benchmark/cmd/ubi-bench/main.go:568
  - open /var/tmp/fil-c2-param/test-ubi.json: no such file or directory

for this request:

--data-raw '{
    "id": 2,
    "name": "test-ubi",
    "type": 1,
    "zk_type": "fil-c2-512M",
    "input_param": "https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5",
    "resource": {"cpu": "2", "gpu": "1", "memory": "5.00 GiB", "storage": "1.00 GiB"},
    "signature": "0x4d8d7efb7e77c8c0c7f8a92ee9f9bfc9eb5a0bec9a00544312d6b4d680914cf53088de6d3747e361629c6c80b431596e294720a661a1fd9214b5e1d109c1a3e100"
}'
ThomasBlock commented 8 months ago

The same happens for an official UBI task:

computing-provider ubi-task list 
TASK ID TASK TYPE   ZK TYPE     TRANSACTION HASH    STATUS  REWARD  CREATE TIME         
42      CPU         fil-c2-512M                     running 0.0     2024-01-29 07:23:26 
61      CPU         fil-c2-512M                     running 0.0     2024-01-29 20:04:07 
66      CPU         fil-c2-512M                     running 0.0     2024-01-29 22:04:08 
69      CPU         fil-c2-512M                     running 0.0     2024-01-30 00:04:07 
72      CPU         fil-c2-512M                     running 0.0     2024-01-30 02:04:07 
75      CPU         fil-c2-512M                     running 0.0     2024-01-30 04:04:07 
78      CPU         fil-c2-512M                     running 0.0     2024-01-30 06:04:07 
81      CPU         fil-c2-512M                     running 0.0     2024-01-30 08:04:07 
84      CPU         fil-c2-512M                     running 0.0     2024-01-30 10:04:07 
1       CPU         fil-c2-512M                     running 0.0     2024-01-30 12:42:05 
2       CPU         fil-c2-512M                     running 0.0     2024-01-30 12:51:27 
96      CPU         fil-c2-512M                     running 10.00   2024-01-30 14:57:18 
103     CPU         fil-c2-512M                     running 0.0     2024-01-30 16:57:18 
107     CPU         fil-c2-512M                     running 0.0     2024-01-30 18:57:18 
112     CPU         fil-c2-512M                     running 0.0     2024-01-30 20:57:18
[GIN] 2024/01/30 - 20:57:18 | 200 |  221.216482ms |   38.104.153.43 | GET      "/api/v1/computing/cp"
time="2024-01-30 20:57:18.463" level=info msg="receive ubi task received: {ID:112 Name:1000-0-7-196 Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/Qme28CvgAXj244mZwCt17xCdXCn19U7S18R5ribFzxfnp6 Signature:0x8e9e723061609a62462be4f2fff185ab6960730bf76ad62d0eeb4028ebedfb2b0f106ca94e44c17eeb3a9f31039b467e858056db39453db71166eb1b8fc5b14000 Resource:0xc000562240}" func=DoUbiTask file="cp_service.go:547"
time="2024-01-30 20:57:18.464" level=info msg="ubi task sign verifing, task_id: 112,  type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:586"
kubectl logs -f -n ubi-task-112 fil-c2-512m-112-lxz6j
2024-01-30T19:57:19.746Z    INFO    ubi-bench   ubi-bench/main.go:96    Starting ubi-bench
2024-01-30T19:57:19.746Z    INFO    ubi-bench   ubi-bench/main.go:565   json param file of c1: /var/tmp/fil-c2-param/1000-0-7-196.json
2024-01-30T19:57:19.746Z    WARN    ubi-bench   ubi-bench/main.go:113   reading input file:
    main.glob..func4
        /opt/ubi-benchmark/cmd/ubi-bench/main.go:568
  - open /var/tmp/fil-c2-param/1000-0-7-196.json: no such file or directory
sonic-chain commented 8 months ago
ThomasBlock commented 8 months ago
• delete ubi-worker images: docker rmi -f filswan/ubi-worker:v1.0
• restart the service using the computing-provider version v0.4.2

When containerd is involved, we need these commands:

ctr -n k8s.io images list | grep ubi
ctr -n k8s.io images remove docker.io/filswan/ubi-worker:v1.0

But still no luck for me. Here is a new error: @Normalnoise

kubectl describe po -n ubi-task-8
Name:             fil-c2-512m-8-k6mj7
Namespace:        ubi-task-8
Priority:         0
Service Account:  default
Node:             swan2/192.168.128.72
Start Time:       Thu, 01 Feb 2024 19:51:17 +0100
Labels:           batch.kubernetes.io/controller-uid=7377996f-6097-4c2e-bb50-deb357498e15
                  batch.kubernetes.io/job-name=fil-c2-512m-8
                  controller-uid=7377996f-6097-4c2e-bb50-deb357498e15
                  job-name=fil-c2-512m-8
Annotations:      cni.projectcalico.org/containerID: 9cfc930be8766f62c53d81507264b5ecea254ed5083ba5dbd072b1ba2944f46b
                  cni.projectcalico.org/podIP: 
                  cni.projectcalico.org/podIPs: 
Status:           Succeeded
IP:               172.16.177.91
IPs:
  IP:           172.16.177.91
Controlled By:  Job/fil-c2-512m-8
Containers:
  fil-c2-512m-8keoxr:
    Container ID:  containerd://fd24822abb6def8cf12037b34910b8d3b1ea4db583f849fe5b1ee2e2f6674db0
    Image:         filswan/ubi-worker:v1.0
    Image ID:      docker.io/filswan/ubi-worker@sha256:e1c9498b3911e7a028dbe0b908754c367c789bf8c0e2b9bd793895993ae96c84
    Port:          <none>
    Host Port:     <none>
    Command:
      ubi-bench
      c2
      /var/tmp/fil-c2-param/test-ubi.json
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 01 Feb 2024 19:51:17 +0100
      Finished:     Thu, 01 Feb 2024 19:51:17 +0100
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                4
      ephemeral-storage:  2Gi
      memory:             10Gi
      nvidia.com/gpu:     1
    Requests:
      cpu:                2
      ephemeral-storage:  1Gi
      memory:             5Gi
      nvidia.com/gpu:     1
    Environment:
      RUST_GPU_TOOLS_CUSTOM_GPU:  NVIDIA RTX A4000:6144
      RECEIVE_PROOF_URL:          https://swan1:8085/api/v1/computing/cp/receive/ubi
      TASKID:                     8
      TASK_TYPE:                  1
      ZK_TYPE:                    fil-c2-512M
      NAME_SPACE:                 ubi-task-8
      PARAM_PATH:                 /share/cp/zk-pool/fil-c2-512M/test-ubi
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sx9sk (ro)
      /var/tmp/fil-c2-param from fil-c2-input-volume (rw)
      /var/tmp/filecoin-proof-parameters from proof-params (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  proof-params:
    Type:          HostPath (bare host directory volume)
    Path:          /var/tmp/filecoin-proof-parameters
    HostPathType:  
  fil-c2-input-volume:
    Type:          HostPath (bare host directory volume)
    Path:          /share/cp/zk-pool/fil-c2-512M/test-ubi
    HostPathType:  
  kube-api-access-sx9sk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulled   38s   kubelet  Container image "filswan/ubi-worker:v1.0" already present on machine
  Normal  Created  38s   kubelet  Created container fil-c2-512m-8keoxr
  Normal  Started  38s   kubelet  Started container fil-c2-512m-8keoxr
ls /share/cp/zk-pool/fil-c2-512M/test-ubi
test-ubi.json
ls /var/tmp/filecoin-proof-parameters
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk
...
kubectl logs -f -n ubi-task-8 fil-c2-512m-8-k6mj7
2024-02-01T18:51:17.799Z    INFO    ubi-bench   ubi-bench/main.go:96    Starting ubi-bench
2024-02-01T18:51:17.799Z    INFO    ubi-bench   ubi-bench/main.go:556   get param from mcs url: 
2024-02-01T18:51:17.799Z    WARN    ubi-bench   ubi-bench/main.go:113   error making request to mcs url: Get "": unsupported protocol scheme ""
sonic-chain commented 8 months ago

You need to pull the code, compile it, and then restart the CP service:

git clone https://github.com/swanchain/go-computing-provider.git
cd go-computing-provider && git checkout v0.4.2
make && make install
ThomasBlock commented 8 months ago

Ah okay, so you update the code without further increasing the version number, I see. Now we are one step further and have a new problem.

computing-provider -v
computing-provider version 0.4.2+git.24931a7
kubectl logs -f -n ubi-task-10 fil-c2-512m-10-c5bml
2024-02-02T11:19:32.347Z    INFO    ubi-bench   ubi-bench/main.go:96    Starting ubi-bench
2024-02-02T11:19:32.347Z    INFO    ubi-bench   ubi-bench/main.go:556   get param from mcs url: https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-8-2-102e1444a7e9a97ebf1e3d6855dcc77e66c011ea66f936d9b2c508f87f2f83a7.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0170db1f394b35d995252228ee359194b13199d259380541dc529fb0099096b0.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-032d3138d22506ec0082ed72b2dcba18df18477904e35bafee82b3793b06832f.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-ecd683648512ab1765faa2a5f14bab48f676e633467f0aa8aad4b55dcb0652bb.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-2-sha256_hasher-96f1b4a04c5c51e4759bbf224bbc2ef5a42c7100f16ec0637123f16a845ddfb2.vk is ok
2024-02-02T11:19:32.895Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk is ok
2024-02-02T11:19:32.895Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-5294475db5237a2e83c3e52fd6c2b03859a1831d45ed08c4f35dbf9a803165a9.vk is ok
2024-02-02T11:19:32.894Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-92180959e1918d26350b8e6cfe217bbdd0a2d8de51ebec269078b364b715ad63.vk is ok
2024-02-02T11:19:32.895Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk is ok
2024-02-02T11:19:32.895Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.vk is ok
2024-02-02T11:19:32.895Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.vk is ok
2024-02-02T11:19:32.895Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-b62098629d07946e9028127e70295ed996fe3ed25b0f9f88eb610a0ab4385a3c.vk is ok
2024-02-02T11:19:32.895Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-fb9e095bebdd77511c0269b967b4d87ba8b8a525edaa0e165de23ba454510194.vk is ok
2024-02-02T11:19:32.895Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-50c7368dea9593ed0989e70974d28024efa9d156d585b7eea1be22b2e753f331.vk is ok
2024-02-02T11:19:32.897Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.vk is ok
2024-02-02T11:19:32.897Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-2627e4006b67f99cef990c0a47d5426cb7ab0a0ad58fc1061547bf2d28b09def.vk is ok
2024-02-02T11:19:33.175Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:209  Parameter file /var/tmp/filecoin-proof-parameters/v28-fil-inner-product-v1.srs is ok
2024-02-02T11:19:33.175Z    INFO    paramfetch  go-paramfetch@v0.0.4/paramfetch.go:233  parameter and key-fetching complete
2024-02-02T11:19:33.176 INFO filecoin_proofs::api::seal > seal_commit_phase2:start: SectorId(0)
2024-02-02T11:19:33.176 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]
2024-02-02T11:19:33.176 INFO filecoin_proofs::caches > no params in memory cache for STACKED[536870912]
2024-02-02T11:19:33.176 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 16777216; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 2, max_count: 2 }, tree: merkletree-poseidon_hasher-8-0-0 }
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params" exist
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > checking cache_path: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params" for parameters
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > Verify production parameters is false
2024-02-02T11:19:33.252 INFO storage_proofs_core::parameter_cache > read parameters from cache "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params" 
2024-02-02T11:19:33.253 INFO bellperson::groth16::prover::native > Bellperson 0.26.0 is being used!
2024-02-02T11:19:35.352 INFO bellperson::groth16::prover::native > synthesis time: 2.099376862s
2024-02-02T11:19:35.352 INFO bellperson::groth16::prover::native > starting proof timer
2024-02-02T11:19:35.513 INFO bellperson::gpu::locks > GPU is available for FFT!
2024-02-02T11:19:35.513 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:35.579 INFO ec_gpu_gen::fft > FFT: 1 working device(s) selected. 
2024-02-02T11:19:35.579 INFO ec_gpu_gen::fft > FFT: Device 0: NVIDIA RTX A4000
2024-02-02T11:19:35.579 INFO bellperson::gpu::locks > GPU FFT kernel instantiated!
2024-02-02T11:19:37.174 INFO bellperson::gpu::locks > GPU is available for Multiexp!
2024-02-02T11:19:37.174 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:37.174 INFO bellperson::gpu::multiexp > Multiexp: CPU utilization: 0.
2024-02-02T11:19:37.175 INFO ec_gpu_gen::multiexp > Multiexp: 1 working device(s) selected.
2024-02-02T11:19:37.175 INFO ec_gpu_gen::multiexp > Multiexp: Device 0: NVIDIA RTX A4000 (Chunk-size: 91400704)
2024-02-02T11:19:37.175 INFO bellperson::gpu::locks > GPU Multiexp kernel instantiated!
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > GPU is available for Multiexp!
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:40.300 INFO bellperson::gpu::multiexp > Multiexp: CPU utilization: 0.
2024-02-02T11:19:40.300 INFO ec_gpu_gen::multiexp > Multiexp: 1 working device(s) selected.
2024-02-02T11:19:40.300 INFO ec_gpu_gen::multiexp > Multiexp: Device 0: NVIDIA RTX A4000 (Chunk-size: 44132059)
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > GPU Multiexp kernel instantiated!
2024-02-02T11:19:40.682 INFO bellperson::groth16::prover::native > prover time: 5.329874841s
2024-02-02T11:19:40.711 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2024-02-02T11:19:40.711 INFO filecoin_proofs::caches > no params in memory cache for STACKED[536870912]-verifying-key
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 16777216; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 2, max_count: 2 }, tree: merkletree-poseidon_hasher-8-0-0 }
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" exist
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > checking cache_path: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" for verifying key
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > Verify production parameters is false
2024-02-02T11:19:40.713 INFO storage_proofs_core::parameter_cache > read verifying key from cache "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" 
2024-02-02T11:19:40.721 INFO filecoin_proofs::api::seal > verify_seal:start: SectorId(0)
2024-02-02T11:19:40.721 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2024-02-02T11:19:40.721 INFO filecoin_proofs::caches > found params in memory cache for STACKED[536870912]-verifying-key
2024-02-02T11:19:40.722 INFO filecoin_proofs::api::seal > verify_seal:finish: SectorId(0)
2024-02-02T11:19:40.722 INFO filecoin_proofs::api::seal > seal_commit_phase2:finish: SectorId(0)
time="2024-02-02 11:19:40.757" level=error msg="Failed send a request, error: Post \"https://swan1:8085/api/v1/computing/cp/receive/ubi\": dial tcp: lookup swan1 on 10.96.0.10:53: no such host" func=func4 file="main.go:644"
2024-02-02T11:19:40.757Z    WARN    ubi-bench   ubi-bench/main.go:113   Post "https://swan1:8085/api/v1/computing/cp/receive/ubi": dial tcp: lookup swan1 on 10.96.0.10:53: no such host

The IP address mentioned in the error is the in-cluster DNS resolver (the kube-dns/CoreDNS Service IP), which is why it appears nowhere in the pod list:

kubectl get po -A -o wide
NAMESPACE                                       NAME                                                           READY   STATUS      RESTARTS        AGE     IP               NODE    NOMINATED NODE   READINESS GATES
ingress-nginx                                   ingress-nginx-admission-create-mh72w                           0/1     Completed   0               9d      <none>           swan1   <none>           <none>
ingress-nginx                                   ingress-nginx-admission-patch-2d9rc                            0/1     Completed   0               9d      <none>           swan1   <none>           <none>
ingress-nginx                                   ingress-nginx-controller-7fcc98f6bc-5phzb                      1/1     Running     1 (6d2h ago)    9d      172.16.100.76    swan1   <none>           <none>
kube-system                                     calico-kube-controllers-74d5f9d7bb-p2t7w                       1/1     Running     4 (12h ago)     9d      172.16.100.74    swan1   <none>           <none>
kube-system                                     calico-node-5cns9                                              1/1     Running     4 (3d17h ago)   4d      192.168.128.73   swan3   <none>           <none>
kube-system                                     calico-node-99rh4                                              1/1     Running     6 (3d ago)      9d      192.168.128.72   swan2   <none>           <none>
kube-system                                     calico-node-fbxhw                                              1/1     Running     1 (6d2h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     calico-node-ss6qg                                              1/1     Running     1 (2d15h ago)   2d16h   192.168.128.75   swan5   <none>           <none>
kube-system                                     coredns-5dd5756b68-bvj46                                       1/1     Running     1 (6d2h ago)    9d      172.16.100.73    swan1   <none>           <none>
kube-system                                     coredns-5dd5756b68-sss5q                                       1/1     Running     1 (6d2h ago)    9d      172.16.100.75    swan1   <none>           <none>
kube-system                                     etcd-swan1                                                     1/1     Running     3 (6d2h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     kube-apiserver-swan1                                           1/1     Running     5 (12h ago)     9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     kube-controller-manager-swan1                                  1/1     Running     29 (11h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     kube-proxy-lq975                                               1/1     Running     3 (6d2h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     kube-proxy-nttzs                                               1/1     Running     7 (3d ago)      9d      192.168.128.72   swan2   <none>           <none>
kube-system                                     kube-proxy-pz5g8                                               1/1     Running     4 (3d17h ago)   4d      192.168.128.73   swan3   <none>           <none>
kube-system                                     kube-proxy-v546g                                               1/1     Running     1 (2d15h ago)   2d16h   192.168.128.75   swan5   <none>           <none>
kube-system                                     kube-scheduler-swan1                                           1/1     Running     31 (11h ago)    9d      192.168.128.71   swan1   <none>           <none>
kube-system                                     nvidia-device-plugin-daemonset-2wgbt                           1/1     Running     1 (6d2h ago)    9d      172.16.100.78    swan1   <none>           <none>
kube-system                                     nvidia-device-plugin-daemonset-98njt                           1/1     Running     1 (2d15h ago)   2d16h   172.16.41.146    swan5   <none>           <none>
kube-system                                     nvidia-device-plugin-daemonset-df6lz                           1/1     Running     7 (3d17h ago)   4d      172.16.59.75     swan3   <none>           <none>
kube-system                                     nvidia-device-plugin-daemonset-shzdv                           1/1     Running     5 (3d ago)      9d      172.16.177.102   swan2   <none>           <none>
kube-system                                     resource-exporter-ds-4z55l                                     1/1     Running     0               16h     172.16.100.94    swan1   <none>           <none>
kube-system                                     resource-exporter-ds-6gwt2                                     1/1     Running     0               16h     172.16.177.99    swan2   <none>           <none>
kube-system                                     resource-exporter-ds-7fxgn                                     1/1     Running     0               16h     172.16.59.77     swan3   <none>           <none>
kube-system                                     resource-exporter-ds-tt7hk                                     1/1     Running     0               16h     172.16.41.180    swan5   <none>           <none>
ns-0x066af13bc249371c72939e793157ae05cbbcc981   deploy-a74cb467-da69-45f0-a559-c6f71e41cdd9-6c9bb7bb85-44xwz   1/1     Running     0               9m33s   172.16.59.80     swan3   <none>           <none>
ns-0x20445a11e5c6309579387e47564e29a174c02eb7   deploy-9a5980f2-46c2-4e5f-b660-32287f69434c-5f9c96c68c-z7szn   1/1     Running     0               2d15h   172.16.41.149    swan5   <none>           <none>
ns-0x2733c8521c1b80939415bf521775769cdabe40f3   deploy-70f77042-33e6-4e55-a3d6-ca7e1249bd88-69f5c56558-9n6ks   1/1     Running     0               3d11h   172.16.100.87    swan1   <none>           <none>
ns-0x45bcb503b0b85eb6ee6a1490aa64065597897502   deploy-2ce4655b-d152-4086-855e-4d3e9a141683-68fcc49f89-lvjc8   1/1     Running     0               2d3h    172.16.41.157    swan5   <none>           <none>
ns-0x45bcb503b0b85eb6ee6a1490aa64065597897502   deploy-705a9a33-ab71-448e-8878-647fdf49ddd0-87b9fb9f5-9jhxl    1/1     Running     0               2d3h    172.16.41.159    swan5   <none>           <none>
ns-0x5a37e272299581edb615c1483fae4af7801b91b9   deploy-6cc79c0e-d7a0-4b90-8a5b-745924d7592c-5ffc69ffc7-h5ljr   1/1     Running     0               17h     172.16.177.95    swan2   <none>           <none>
ns-0x66e91a773df9d1966ca7615179d86d8b0740cfe2   deploy-689c6d8a-9e8d-4761-b96f-099a7567bebb-bf65cb6d7-5xjst    1/1     Running     0               23h     172.16.41.164    swan5   <none>           <none>
ns-0x80a6c6848dff59dc333b2cb791b7856d303c0433   deploy-61dc252f-1b8a-4f9e-876b-482c352e7c20-7b7f44d66f-w67db   1/1     Running     0               19h     172.16.177.86    swan2   <none>           <none>
ns-0x82d9125d91b90a94b251a1ec9dd5af43a9bb6e4a   deploy-0fd8a4ac-3c03-451e-91d6-e546f7c45f9a-84f9457bcf-6xhf5   1/1     Running     0               23h     172.16.41.166    swan5   <none>           <none>
ns-0x82d9125d91b90a94b251a1ec9dd5af43a9bb6e4a   deploy-cdc2cf82-0bdc-4032-b20b-58ef3ba726f7-7968864b66-x4rt9   1/1     Running     0               23h     172.16.41.165    swan5   <none>           <none>
ns-0xf7cbba96282d30b01d4a9de0701bd2dadf74a8ff   deploy-c0485940-96ff-4c4e-96ae-0ffc0012d02a-754bb9bc58-7brks   1/1     Running     0               20h     172.16.177.79    swan2   <none>           <none>
tigera-operator                                 tigera-operator-94d7f7696-kgx6q                                1/1     Running     50 (11h ago)    9d      192.168.128.72   swan2   <none>           <none>
ThomasBlock commented 8 months ago

The same happens for the official UBI task, by the way:

 kubectl describe po -n ubi-task-574
Name:             fil-c2-512m-574-dsbzd
Namespace:        ubi-task-574
Priority:         0
Service Account:  default
Node:             swan2/192.168.128.72
Start Time:       Fri, 02 Feb 2024 13:30:19 +0100
Labels:           batch.kubernetes.io/controller-uid=d5703705-b1bc-4826-b741-52ed5c3b0a46
                  batch.kubernetes.io/job-name=fil-c2-512m-574
                  controller-uid=d5703705-b1bc-4826-b741-52ed5c3b0a46
                  job-name=fil-c2-512m-574
Annotations:      cni.projectcalico.org/containerID: 4b1fdae9dd48562356512ee155d351958d85af2a76e59a139c229b099ac54e64
                  cni.projectcalico.org/podIP: 
                  cni.projectcalico.org/podIPs: 
Status:           Succeeded
IP:               172.16.177.116
IPs:
  IP:           172.16.177.116
Controlled By:  Job/fil-c2-512m-574
Containers:
  fil-c2-512m-574fcugq:
    Container ID:  containerd://053ad3e8c4abd913dab965518f003f5b75b377abced535af60adaa1cfa2f7fac
    Image:         filswan/ubi-worker:v1.0
    Image ID:      docker.io/filswan/ubi-worker@sha256:e1c9498b3911e7a028dbe0b908754c367c789bf8c0e2b9bd793895993ae96c84
    Port:          <none>
    Host Port:     <none>
    Command:
      ubi-bench
      c2
      /var/tmp/fil-c2-param/1000-0-7-612.json
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 02 Feb 2024 13:30:19 +0100
      Finished:     Fri, 02 Feb 2024 13:30:29 +0100
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                2
      ephemeral-storage:  2Gi
      memory:             10Gi
      nvidia.com/gpu:     1
    Requests:
      cpu:                1
      ephemeral-storage:  1Gi
      memory:             5Gi
      nvidia.com/gpu:     1
    Environment:
      RUST_GPU_TOOLS_CUSTOM_GPU:  NVIDIA RTX A4000:6144,NVIDIA GeForce RTX 4060 Ti:4352
      RECEIVE_PROOF_URL:          https://swan1:8085/api/v1/computing/cp/receive/ubi
      TASKID:                     574
      TASK_TYPE:                  1
      ZK_TYPE:                    fil-c2-512M
      NAME_SPACE:                 ubi-task-574
      PARAM_URL:                  https://286cb2c989.acl.multichain.storage/ipfs/QmcVwLYXHCar7Hg2wBiYwoY3jtxz7SF3hkYu6SmA7DRco5
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8scgp (ro)
      /var/tmp/filecoin-proof-parameters from proof-params (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  proof-params:
    Type:          HostPath (bare host directory volume)
    Path:          /var/tmp/filecoin-proof-parameters
    HostPathType:  
  kube-api-access-8scgp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulled   72s   kubelet  Container image "filswan/ubi-worker:v1.0" already present on machine
  Normal  Created  72s   kubelet  Created container fil-c2-512m-574fcugq
  Normal  Started  72s   kubelet  Started container fil-c2-512m-574fcugq

swan1 is the correct hostname, so DNS resolution inside the pods does not work. Maybe the URL should use the external domain name, or just the IP address. How can I change RECEIVE_PROOF_URL?

192.168.128.71 swan1 is in /etc/hosts on each host, but pods get their own kubelet-managed /etc/hosts and do not inherit the node's entries.

I tried

nano cp/fil-c2.env
RECEIVE_PROOF_URL="http://192.168.128.71:8085/api/v1/computing/cp/receive/ubi"

but then computing-provider says W0202 13:40:17.230078 1853483 warnings.go:70] spec.template.spec.containers[0].env[2]: hides previous definition of "RECEIVE_PROOF_URL"

    Environment:
      RUST_GPU_TOOLS_CUSTOM_GPU:  NVIDIA RTX A4000:6144,NVIDIA GeForce RTX 4060 Ti:4352
      RECEIVE_PROOF_URL:          http://192.168.128.71:8085/api/v1/computing/cp/receive/ubi
      RECEIVE_PROOF_URL:          https://swan1:8085/api/v1/computing/cp/receive/ubi
      TASKID:                     12
      TASK_TYPE:                  1
      ZK_TYPE:                    fil-c2-512M
      NAME_SPACE:                 ubi-task-12
      PARAM_URL:                  https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5
ThomasBlock commented 8 months ago

Here is how I solved the DNS problem: add a hosts block for swan1 to the CoreDNS config and restart it:

kubectl edit cm coredns -n kube-system
swan1. {
    hosts {
        192.168.128.71 swan1
    }
}

kubectl rollout restart deployment coredns -n kube-system

UBI tasks are now starting, but I am not connected to the hub ( https://github.com/swanchain/go-computing-provider/issues/12#issuecomment-1925214934 )