Closed ThomasBlock closed 7 months ago
Can you provide a file list of /var/tmp/filecoin-proof-parameters?
Yes, I downloaded them with Filecoin. Is that okay? (See https://github.com/swanchain/ubi-benchmark/issues/1.)
ls /var/tmp/filecoin-proof-parameters
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-92180959e1918d26350b8e6cfe217bbdd0a2d8de51ebec269078b364b715ad63.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-fb9e095bebdd77511c0269b967b4d87ba8b8a525edaa0e165de23ba454510194.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.vk
v28-empty-sector-update-merkletree-poseidon_hasher-8-8-2-102e1444a7e9a97ebf1e3d6855dcc77e66c011ea66f936d9b2c508f87f2f83a7.vk
v28-fil-inner-product-v1.srs
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0170db1f394b35d995252228ee359194b13199d259380541dc529fb0099096b0.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-50c7368dea9593ed0989e70974d28024efa9d156d585b7eea1be22b2e753f331.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-5294475db5237a2e83c3e52fd6c2b03859a1831d45ed08c4f35dbf9a803165a9.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.params
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-2627e4006b67f99cef990c0a47d5426cb7ab0a0ad58fc1061547bf2d28b09def.vk
v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-b62098629d07946e9028127e70295ed996fe3ed25b0f9f88eb610a0ab4385a3c.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-032d3138d22506ec0082ed72b2dcba18df18477904e35bafee82b3793b06832f.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-ecd683648512ab1765faa2a5f14bab48f676e633467f0aa8aad4b55dcb0652bb.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.params
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk
v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-2-sha256_hasher-96f1b4a04c5c51e4759bbf224bbc2ef5a42c7100f16ec0637123f16a845ddfb2.vk
Can you provide your computing-provider version?
I compiled it:
VERSION: 0.4.1+git.0067c20
Edit:
git checkout fea-ubi-task
0.4.1+git.428777c
You can test whether the ubi-task environment is installed correctly according to the following document: https://docs.swanchain.io/orchestrator/as-a-computing-provider/computing-provider-setup/faq#q-how-can-i-verify-if-my-computing-provider-is-set-up-to-receive-ubi-tasks. While the container is running, you can view the pod logs to troubleshoot errors.
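In practice, the check from that FAQ boils down to something like the following sketch (the task ID and pod name are placeholders for whatever your cluster reports):

```shell
# List UBI task pods across all namespaces, then stream the logs of one
# while its container is still running (ID and pod name are placeholders).
kubectl get pods -A | grep ubi-task
kubectl logs -f -n ubi-task-<ID> <POD_NAME>
```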
Yes, thank you for the feedback. This somehow works: the pod is created and finishes. It was deleted quite quickly, so I could no longer read the logs, but in the task list it still appears unfinished (and is still labeled as a "CPU" task).
curl -k --location --request POST 'https://***/api/v1/computing/cp/ubi' ...
{"status":"success","code":200,"data":"success"}
time="2024-01-30 12:42:05.157" level=info msg="receive ubi task received: {ID:1 Name:test-ubi Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5 Signature:0x13cb4547123ddc947aaebf9e4b2026fe1115390bbaa32f3579fe966fc1cc1cf05bc3e2d2516f86e65c370d879ad052805a6ea343fe7fed35d981c49870b12d3e01 Resource:0xc0007d2040}" func=DoUbiTask file="cp_service.go:547"
time="2024-01-30 12:42:05.158" level=info msg="ubi task sign verifing, task_id: 1, type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:586"
time="2024-01-30 12:42:05.812" level=error msg="nodeName: %s, error: %+vswan1invalid character '.' looking for beginning of value" func=StatisticalSources file="k8s_service.go:347"
time="2024-01-30 12:42:05.830" level=error msg="nodeName: %s, error: %+vswan1invalid character '.' looking for beginning of value" func=GetNodeGpuSummary file="k8s_service.go:527"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: needCpu: 2, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1269"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: remainderCpu: -4, remainderMemory: 12.00, remainderStorage: 293.00" func=checkResourceAvailableForUbi file="cp_service.go:1270"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: needCpu: 2, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1269"
time="2024-01-30 12:42:05.831" level=info msg="checkResourceAvailableForUbi: remainderCpu: 19, remainderMemory: 61.00, remainderStorage: 1588.00" func=checkResourceAvailableForUbi file="cp_service.go:1270"
time="2024-01-30 12:42:05.831" level=info msg="gpuName: NVIDIA-A4000, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan2:map[NVIDIA-A4000:1] swan3:map[NVIDIA-4060-Ti:1]]" func=checkResourceAvailableForUbi file="cp_service.go:1281"
[GIN] 2024/01/30 - 12:42:05 | 200 | 673.749131ms | 212.102.118.102 | POST "/api/v1/computing/cp/ubi"
kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
...
ubi-task-1 fil-c2-512m-1-8cbzj 0/1 Completed 0 76s
computing-provider ubi-task list
TASK ID TASK TYPE ZK TYPE TRANSACTION HASH STATUS REWARD CREATE TIME
...
84 CPU fil-c2-512M running 0.0 2024-01-30 10:04:07
1 CPU fil-c2-512M running 0.0 2024-01-30 12:42:05
Here is the log:
kubectl logs -f -n ubi-task-2 fil-c2-512m-2-vxz9j
2024-01-30T11:51:28.427Z INFO ubi-bench ubi-bench/main.go:96 Starting ubi-bench
2024-01-30T11:51:28.427Z INFO ubi-bench ubi-bench/main.go:565 json param file of c1: /var/tmp/fil-c2-param/test-ubi.json
2024-01-30T11:51:28.427Z WARN ubi-bench ubi-bench/main.go:113 reading input file:
main.glob..func4
/opt/ubi-benchmark/cmd/ubi-bench/main.go:568
- open /var/tmp/fil-c2-param/test-ubi.json: no such file or directory
in response to this request:
--data-raw '{
"id": 2,
"name": "test-ubi",
"type": 1,
"zk_type": "fil-c2-512M",
"input_param": "https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5",
"resource": {"cpu": "2", "gpu": "1", "memory": "5.00 GiB", "storage": "1.00 GiB"},
"signature": "0x4d8d7efb7e77c8c0c7f8a92ee9f9bfc9eb5a0bec9a00544312d6b4d680914cf53088de6d3747e361629c6c80b431596e294720a661a1fd9214b5e1d109c1a3e100"
}'
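For reference, the same --data-raw payload can be assembled and sanity-checked in Python before sending it with curl. This is only a sketch for validating the request body, not part of the CP tooling; the endpoint host is redacted as *** in the thread, so no URL is hard-coded here.

```python
import json

# The exact task payload from this thread, as a dict; dumping it produces
# the string that curl sends as --data-raw.
payload = {
    "id": 2,
    "name": "test-ubi",
    "type": 1,
    "zk_type": "fil-c2-512M",
    "input_param": "https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5",
    "resource": {"cpu": "2", "gpu": "1", "memory": "5.00 GiB", "storage": "1.00 GiB"},
    "signature": "0x4d8d7efb7e77c8c0c7f8a92ee9f9bfc9eb5a0bec9a00544312d6b4d680914cf53088de6d3747e361629c6c80b431596e294720a661a1fd9214b5e1d109c1a3e100",
}
body = json.dumps(payload)  # valid JSON: a parse error here would also break the CP's unmarshal
```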
The same happens for an official UBI task:
computing-provider ubi-task list
TASK ID TASK TYPE ZK TYPE TRANSACTION HASH STATUS REWARD CREATE TIME
42 CPU fil-c2-512M running 0.0 2024-01-29 07:23:26
61 CPU fil-c2-512M running 0.0 2024-01-29 20:04:07
66 CPU fil-c2-512M running 0.0 2024-01-29 22:04:08
69 CPU fil-c2-512M running 0.0 2024-01-30 00:04:07
72 CPU fil-c2-512M running 0.0 2024-01-30 02:04:07
75 CPU fil-c2-512M running 0.0 2024-01-30 04:04:07
78 CPU fil-c2-512M running 0.0 2024-01-30 06:04:07
81 CPU fil-c2-512M running 0.0 2024-01-30 08:04:07
84 CPU fil-c2-512M running 0.0 2024-01-30 10:04:07
1 CPU fil-c2-512M running 0.0 2024-01-30 12:42:05
2 CPU fil-c2-512M running 0.0 2024-01-30 12:51:27
96 CPU fil-c2-512M running 10.00 2024-01-30 14:57:18
103 CPU fil-c2-512M running 0.0 2024-01-30 16:57:18
107 CPU fil-c2-512M running 0.0 2024-01-30 18:57:18
112 CPU fil-c2-512M running 0.0 2024-01-30 20:57:18
[GIN] 2024/01/30 - 20:57:18 | 200 | 221.216482ms | 38.104.153.43 | GET "/api/v1/computing/cp"
time="2024-01-30 20:57:18.463" level=info msg="receive ubi task received: {ID:112 Name:1000-0-7-196 Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/Qme28CvgAXj244mZwCt17xCdXCn19U7S18R5ribFzxfnp6 Signature:0x8e9e723061609a62462be4f2fff185ab6960730bf76ad62d0eeb4028ebedfb2b0f106ca94e44c17eeb3a9f31039b467e858056db39453db71166eb1b8fc5b14000 Resource:0xc000562240}" func=DoUbiTask file="cp_service.go:547"
time="2024-01-30 20:57:18.464" level=info msg="ubi task sign verifing, task_id: 112, type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:586"
kubectl logs -f -n ubi-task-112 fil-c2-512m-112-lxz6j
2024-01-30T19:57:19.746Z INFO ubi-bench ubi-bench/main.go:96 Starting ubi-bench
2024-01-30T19:57:19.746Z INFO ubi-bench ubi-bench/main.go:565 json param file of c1: /var/tmp/fil-c2-param/1000-0-7-196.json
2024-01-30T19:57:19.746Z WARN ubi-bench ubi-bench/main.go:113 reading input file:
main.glob..func4
/opt/ubi-benchmark/cmd/ubi-bench/main.go:568
- open /var/tmp/fil-c2-param/1000-0-7-196.json: no such file or directory
- Delete the ubi-worker images: docker rmi -f filswan/ubi-worker:v1.0
- Restart the service using computing-provider version v0.4.2
When containerd is used as the runtime, the equivalent commands are:
ctr -n k8s.io images list | grep ubi
ctr -n k8s.io images remove docker.io/filswan/ubi-worker:v1.0
But still no luck for me. Here is a new error: @Normalnoise
kubectl describe po -n ubi-task-8
Name: fil-c2-512m-8-k6mj7
Namespace: ubi-task-8
Priority: 0
Service Account: default
Node: swan2/192.168.128.72
Start Time: Thu, 01 Feb 2024 19:51:17 +0100
Labels: batch.kubernetes.io/controller-uid=7377996f-6097-4c2e-bb50-deb357498e15
batch.kubernetes.io/job-name=fil-c2-512m-8
controller-uid=7377996f-6097-4c2e-bb50-deb357498e15
job-name=fil-c2-512m-8
Annotations: cni.projectcalico.org/containerID: 9cfc930be8766f62c53d81507264b5ecea254ed5083ba5dbd072b1ba2944f46b
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Succeeded
IP: 172.16.177.91
IPs:
IP: 172.16.177.91
Controlled By: Job/fil-c2-512m-8
Containers:
fil-c2-512m-8keoxr:
Container ID: containerd://fd24822abb6def8cf12037b34910b8d3b1ea4db583f849fe5b1ee2e2f6674db0
Image: filswan/ubi-worker:v1.0
Image ID: docker.io/filswan/ubi-worker@sha256:e1c9498b3911e7a028dbe0b908754c367c789bf8c0e2b9bd793895993ae96c84
Port: <none>
Host Port: <none>
Command:
ubi-bench
c2
/var/tmp/fil-c2-param/test-ubi.json
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 01 Feb 2024 19:51:17 +0100
Finished: Thu, 01 Feb 2024 19:51:17 +0100
Ready: False
Restart Count: 0
Limits:
cpu: 4
ephemeral-storage: 2Gi
memory: 10Gi
nvidia.com/gpu: 1
Requests:
cpu: 2
ephemeral-storage: 1Gi
memory: 5Gi
nvidia.com/gpu: 1
Environment:
RUST_GPU_TOOLS_CUSTOM_GPU: NVIDIA RTX A4000:6144
RECEIVE_PROOF_URL: https://swan1:8085/api/v1/computing/cp/receive/ubi
TASKID: 8
TASK_TYPE: 1
ZK_TYPE: fil-c2-512M
NAME_SPACE: ubi-task-8
PARAM_PATH: /share/cp/zk-pool/fil-c2-512M/test-ubi
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sx9sk (ro)
/var/tmp/fil-c2-param from fil-c2-input-volume (rw)
/var/tmp/filecoin-proof-parameters from proof-params (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
proof-params:
Type: HostPath (bare host directory volume)
Path: /var/tmp/filecoin-proof-parameters
HostPathType:
fil-c2-input-volume:
Type: HostPath (bare host directory volume)
Path: /share/cp/zk-pool/fil-c2-512M/test-ubi
HostPathType:
kube-api-access-sx9sk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 38s kubelet Container image "filswan/ubi-worker:v1.0" already present on machine
Normal Created 38s kubelet Created container fil-c2-512m-8keoxr
Normal Started 38s kubelet Started container fil-c2-512m-8keoxr
ls /share/cp/zk-pool/fil-c2-512M/test-ubi
test-ubi.json
ls /var/tmp/filecoin-proof-parameters
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.params
v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk
...
kubectl logs -f -n ubi-task-8 fil-c2-512m-8-k6mj7
2024-02-01T18:51:17.799Z INFO ubi-bench ubi-bench/main.go:96 Starting ubi-bench
2024-02-01T18:51:17.799Z INFO ubi-bench ubi-bench/main.go:556 get param from mcs url:
2024-02-01T18:51:17.799Z WARN ubi-bench ubi-bench/main.go:113 error making request to mcs url: Get "": unsupported protocol scheme ""
You need to pull the latest code, recompile it, and then restart the CP service:
git clone https://github.com/swanchain/go-computing-provider.git
cd go-computing-provider && git checkout v0.4.2
make && make install
Ah okay, so you updated the code without further increasing the version number, I see. Now we are one step further and have a new problem:
computing-provider -v
computing-provider version 0.4.2+git.24931a7
kubectl logs -f -n ubi-task-10 fil-c2-512m-10-c5bml
2024-02-02T11:19:32.347Z INFO ubi-bench ubi-bench/main.go:96 Starting ubi-bench
2024-02-02T11:19:32.347Z INFO ubi-bench ubi-bench/main.go:556 get param from mcs url: https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-8-2-102e1444a7e9a97ebf1e3d6855dcc77e66c011ea66f936d9b2c508f87f2f83a7.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0170db1f394b35d995252228ee359194b13199d259380541dc529fb0099096b0.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-559e581f022bb4e4ec6e719e563bf0e026ad6de42e56c18714a2c692b1b88d7e.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-032d3138d22506ec0082ed72b2dcba18df18477904e35bafee82b3793b06832f.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-ecd683648512ab1765faa2a5f14bab48f676e633467f0aa8aad4b55dcb0652bb.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-7d739b8cf60f1b0709eeebee7730e297683552e4b69cab6984ec0285663c5781.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-2-sha256_hasher-96f1b4a04c5c51e4759bbf224bbc2ef5a42c7100f16ec0637123f16a845ddfb2.vk is ok
2024-02-02T11:19:32.895Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-61fa69f38b9cc771ba27b670124714b4ea77fbeae05e377fb859c4a43b73a30c.vk is ok
2024-02-02T11:19:32.895Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-5294475db5237a2e83c3e52fd6c2b03859a1831d45ed08c4f35dbf9a803165a9.vk is ok
2024-02-02T11:19:32.894Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-92180959e1918d26350b8e6cfe217bbdd0a2d8de51ebec269078b364b715ad63.vk is ok
2024-02-02T11:19:32.895Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-8-0-sha256_hasher-82a357d2f2ca81dc61bb45f4a762807aedee1b0a53fd6c4e77b46a01bfef7820.vk is ok
2024-02-02T11:19:32.895Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-0cfb4f178bbb71cf2ecfcd42accce558b27199ab4fb59cb78f2483fe21ef36d9.vk is ok
2024-02-02T11:19:32.895Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.vk is ok
2024-02-02T11:19:32.895Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-b62098629d07946e9028127e70295ed996fe3ed25b0f9f88eb610a0ab4385a3c.vk is ok
2024-02-02T11:19:32.895Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-empty-sector-update-merkletree-poseidon_hasher-8-0-0-fb9e095bebdd77511c0269b967b4d87ba8b8a525edaa0e165de23ba454510194.vk is ok
2024-02-02T11:19:32.895Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-50c7368dea9593ed0989e70974d28024efa9d156d585b7eea1be22b2e753f331.vk is ok
2024-02-02T11:19:32.897Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-0-0377ded656c6f524f1618760bffe4e0a1c51d5a70c4509eedae8a27555733edc.vk is ok
2024-02-02T11:19:32.897Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-8-2-2627e4006b67f99cef990c0a47d5426cb7ab0a0ad58fc1061547bf2d28b09def.vk is ok
2024-02-02T11:19:33.175Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:209 Parameter file /var/tmp/filecoin-proof-parameters/v28-fil-inner-product-v1.srs is ok
2024-02-02T11:19:33.175Z INFO paramfetch go-paramfetch@v0.0.4/paramfetch.go:233 parameter and key-fetching complete
2024-02-02T11:19:33.176 INFO filecoin_proofs::api::seal > seal_commit_phase2:start: SectorId(0)
2024-02-02T11:19:33.176 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]
2024-02-02T11:19:33.176 INFO filecoin_proofs::caches > no params in memory cache for STACKED[536870912]
2024-02-02T11:19:33.176 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 16777216; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 2, max_count: 2 }, tree: merkletree-poseidon_hasher-8-0-0 }
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params" exist
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > checking cache_path: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params" for parameters
2024-02-02T11:19:33.177 INFO storage_proofs_core::parameter_cache > Verify production parameters is false
2024-02-02T11:19:33.252 INFO storage_proofs_core::parameter_cache > read parameters from cache "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.params"
2024-02-02T11:19:33.253 INFO bellperson::groth16::prover::native > Bellperson 0.26.0 is being used!
2024-02-02T11:19:35.352 INFO bellperson::groth16::prover::native > synthesis time: 2.099376862s
2024-02-02T11:19:35.352 INFO bellperson::groth16::prover::native > starting proof timer
2024-02-02T11:19:35.513 INFO bellperson::gpu::locks > GPU is available for FFT!
2024-02-02T11:19:35.513 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:35.579 INFO ec_gpu_gen::fft > FFT: 1 working device(s) selected.
2024-02-02T11:19:35.579 INFO ec_gpu_gen::fft > FFT: Device 0: NVIDIA RTX A4000
2024-02-02T11:19:35.579 INFO bellperson::gpu::locks > GPU FFT kernel instantiated!
2024-02-02T11:19:37.174 INFO bellperson::gpu::locks > GPU is available for Multiexp!
2024-02-02T11:19:37.174 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:37.174 INFO bellperson::gpu::multiexp > Multiexp: CPU utilization: 0.
2024-02-02T11:19:37.175 INFO ec_gpu_gen::multiexp > Multiexp: 1 working device(s) selected.
2024-02-02T11:19:37.175 INFO ec_gpu_gen::multiexp > Multiexp: Device 0: NVIDIA RTX A4000 (Chunk-size: 91400704)
2024-02-02T11:19:37.175 INFO bellperson::gpu::locks > GPU Multiexp kernel instantiated!
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > GPU is available for Multiexp!
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > BELLPERSON_GPUS_PER_LOCK fallback to single lock mode
2024-02-02T11:19:40.300 INFO bellperson::gpu::multiexp > Multiexp: CPU utilization: 0.
2024-02-02T11:19:40.300 INFO ec_gpu_gen::multiexp > Multiexp: 1 working device(s) selected.
2024-02-02T11:19:40.300 INFO ec_gpu_gen::multiexp > Multiexp: Device 0: NVIDIA RTX A4000 (Chunk-size: 44132059)
2024-02-02T11:19:40.300 INFO bellperson::gpu::locks > GPU Multiexp kernel instantiated!
2024-02-02T11:19:40.682 INFO bellperson::groth16::prover::native > prover time: 5.329874841s
2024-02-02T11:19:40.711 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2024-02-02T11:19:40.711 INFO filecoin_proofs::caches > no params in memory cache for STACKED[536870912]-verifying-key
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > parameter set identifier for cache: layered_drgporep::PublicParams{ graph: stacked_graph::StackedGraph{expansion_degree: 8 base_graph: drgraph::BucketGraph{size: 16777216; degree: 6; hasher: poseidon_hasher} }, challenges: LayerChallenges { layers: 2, max_count: 2 }, tree: merkletree-poseidon_hasher-8-0-0 }
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > ensuring that all ancestor directories for: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" exist
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > checking cache_path: "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk" for verifying key
2024-02-02T11:19:40.711 INFO storage_proofs_core::parameter_cache > Verify production parameters is false
2024-02-02T11:19:40.713 INFO storage_proofs_core::parameter_cache > read verifying key from cache "/var/tmp/filecoin-proof-parameters/v28-stacked-proof-of-replication-merkletree-poseidon_hasher-8-0-0-sha256_hasher-6babf46ce344ae495d558e7770a585b2382d54f225af8ed0397b8be7c3fcd472.vk"
2024-02-02T11:19:40.721 INFO filecoin_proofs::api::seal > verify_seal:start: SectorId(0)
2024-02-02T11:19:40.721 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[536870912]-verifying-key
2024-02-02T11:19:40.721 INFO filecoin_proofs::caches > found params in memory cache for STACKED[536870912]-verifying-key
2024-02-02T11:19:40.722 INFO filecoin_proofs::api::seal > verify_seal:finish: SectorId(0)
2024-02-02T11:19:40.722 INFO filecoin_proofs::api::seal > seal_commit_phase2:finish: SectorId(0)
time="2024-02-02 11:19:40.757" level=error msg="Failed send a request, error: Post \"https://swan1:8085/api/v1/computing/cp/receive/ubi\": dial tcp: lookup swan1 on 10.96.0.10:53: no such host" func=func4 file="main.go:644"
2024-02-02T11:19:40.757Z WARN ubi-bench ubi-bench/main.go:113 Post "https://swan1:8085/api/v1/computing/cp/receive/ubi": dial tcp: lookup swan1 on 10.96.0.10:53: no such host
The mentioned address is indeed wrong and cannot be found anywhere:
kubectl get po -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ingress-nginx ingress-nginx-admission-create-mh72w 0/1 Completed 0 9d <none> swan1 <none> <none>
ingress-nginx ingress-nginx-admission-patch-2d9rc 0/1 Completed 0 9d <none> swan1 <none> <none>
ingress-nginx ingress-nginx-controller-7fcc98f6bc-5phzb 1/1 Running 1 (6d2h ago) 9d 172.16.100.76 swan1 <none> <none>
kube-system calico-kube-controllers-74d5f9d7bb-p2t7w 1/1 Running 4 (12h ago) 9d 172.16.100.74 swan1 <none> <none>
kube-system calico-node-5cns9 1/1 Running 4 (3d17h ago) 4d 192.168.128.73 swan3 <none> <none>
kube-system calico-node-99rh4 1/1 Running 6 (3d ago) 9d 192.168.128.72 swan2 <none> <none>
kube-system calico-node-fbxhw 1/1 Running 1 (6d2h ago) 9d 192.168.128.71 swan1 <none> <none>
kube-system calico-node-ss6qg 1/1 Running 1 (2d15h ago) 2d16h 192.168.128.75 swan5 <none> <none>
kube-system coredns-5dd5756b68-bvj46 1/1 Running 1 (6d2h ago) 9d 172.16.100.73 swan1 <none> <none>
kube-system coredns-5dd5756b68-sss5q 1/1 Running 1 (6d2h ago) 9d 172.16.100.75 swan1 <none> <none>
kube-system etcd-swan1 1/1 Running 3 (6d2h ago) 9d 192.168.128.71 swan1 <none> <none>
kube-system kube-apiserver-swan1 1/1 Running 5 (12h ago) 9d 192.168.128.71 swan1 <none> <none>
kube-system kube-controller-manager-swan1 1/1 Running 29 (11h ago) 9d 192.168.128.71 swan1 <none> <none>
kube-system kube-proxy-lq975 1/1 Running 3 (6d2h ago) 9d 192.168.128.71 swan1 <none> <none>
kube-system kube-proxy-nttzs 1/1 Running 7 (3d ago) 9d 192.168.128.72 swan2 <none> <none>
kube-system kube-proxy-pz5g8 1/1 Running 4 (3d17h ago) 4d 192.168.128.73 swan3 <none> <none>
kube-system kube-proxy-v546g 1/1 Running 1 (2d15h ago) 2d16h 192.168.128.75 swan5 <none> <none>
kube-system kube-scheduler-swan1 1/1 Running 31 (11h ago) 9d 192.168.128.71 swan1 <none> <none>
kube-system nvidia-device-plugin-daemonset-2wgbt 1/1 Running 1 (6d2h ago) 9d 172.16.100.78 swan1 <none> <none>
kube-system nvidia-device-plugin-daemonset-98njt 1/1 Running 1 (2d15h ago) 2d16h 172.16.41.146 swan5 <none> <none>
kube-system nvidia-device-plugin-daemonset-df6lz 1/1 Running 7 (3d17h ago) 4d 172.16.59.75 swan3 <none> <none>
kube-system nvidia-device-plugin-daemonset-shzdv 1/1 Running 5 (3d ago) 9d 172.16.177.102 swan2 <none> <none>
kube-system resource-exporter-ds-4z55l 1/1 Running 0 16h 172.16.100.94 swan1 <none> <none>
kube-system resource-exporter-ds-6gwt2 1/1 Running 0 16h 172.16.177.99 swan2 <none> <none>
kube-system resource-exporter-ds-7fxgn 1/1 Running 0 16h 172.16.59.77 swan3 <none> <none>
kube-system resource-exporter-ds-tt7hk 1/1 Running 0 16h 172.16.41.180 swan5 <none> <none>
ns-0x066af13bc249371c72939e793157ae05cbbcc981 deploy-a74cb467-da69-45f0-a559-c6f71e41cdd9-6c9bb7bb85-44xwz 1/1 Running 0 9m33s 172.16.59.80 swan3 <none> <none>
ns-0x20445a11e5c6309579387e47564e29a174c02eb7 deploy-9a5980f2-46c2-4e5f-b660-32287f69434c-5f9c96c68c-z7szn 1/1 Running 0 2d15h 172.16.41.149 swan5 <none> <none>
ns-0x2733c8521c1b80939415bf521775769cdabe40f3 deploy-70f77042-33e6-4e55-a3d6-ca7e1249bd88-69f5c56558-9n6ks 1/1 Running 0 3d11h 172.16.100.87 swan1 <none> <none>
ns-0x45bcb503b0b85eb6ee6a1490aa64065597897502 deploy-2ce4655b-d152-4086-855e-4d3e9a141683-68fcc49f89-lvjc8 1/1 Running 0 2d3h 172.16.41.157 swan5 <none> <none>
ns-0x45bcb503b0b85eb6ee6a1490aa64065597897502 deploy-705a9a33-ab71-448e-8878-647fdf49ddd0-87b9fb9f5-9jhxl 1/1 Running 0 2d3h 172.16.41.159 swan5 <none> <none>
ns-0x5a37e272299581edb615c1483fae4af7801b91b9 deploy-6cc79c0e-d7a0-4b90-8a5b-745924d7592c-5ffc69ffc7-h5ljr 1/1 Running 0 17h 172.16.177.95 swan2 <none> <none>
ns-0x66e91a773df9d1966ca7615179d86d8b0740cfe2 deploy-689c6d8a-9e8d-4761-b96f-099a7567bebb-bf65cb6d7-5xjst 1/1 Running 0 23h 172.16.41.164 swan5 <none> <none>
ns-0x80a6c6848dff59dc333b2cb791b7856d303c0433 deploy-61dc252f-1b8a-4f9e-876b-482c352e7c20-7b7f44d66f-w67db 1/1 Running 0 19h 172.16.177.86 swan2 <none> <none>
ns-0x82d9125d91b90a94b251a1ec9dd5af43a9bb6e4a deploy-0fd8a4ac-3c03-451e-91d6-e546f7c45f9a-84f9457bcf-6xhf5 1/1 Running 0 23h 172.16.41.166 swan5 <none> <none>
ns-0x82d9125d91b90a94b251a1ec9dd5af43a9bb6e4a deploy-cdc2cf82-0bdc-4032-b20b-58ef3ba726f7-7968864b66-x4rt9 1/1 Running 0 23h 172.16.41.165 swan5 <none> <none>
ns-0xf7cbba96282d30b01d4a9de0701bd2dadf74a8ff deploy-c0485940-96ff-4c4e-96ae-0ffc0012d02a-754bb9bc58-7brks 1/1 Running 0 20h 172.16.177.79 swan2 <none> <none>
tigera-operator tigera-operator-94d7f7696-kgx6q 1/1 Running 50 (11h ago) 9d 192.168.128.72 swan2 <none> <none>
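One way to confirm the resolution failure from inside the cluster (the pod name dns-test is arbitrary, and this assumes a throwaway busybox pod is acceptable in your environment):

```shell
# Ask cluster DNS (the 10.96.0.10 resolver from the error) to resolve "swan1".
# CoreDNS only serves Service/Pod records by default, so a bare node hostname
# will not resolve unless it is added via hostAliases or a CoreDNS hosts
# entry; RECEIVE_PROOF_URL would need an address the pod can actually resolve.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup swan1
```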
The same happens for an official UBI task, by the way:
kubectl describe po -n ubi-task-574
Name: fil-c2-512m-574-dsbzd
Namespace: ubi-task-574
Priority: 0
Service Account: default
Node: swan2/192.168.128.72
Start Time: Fri, 02 Feb 2024 13:30:19 +0100
Labels: batch.kubernetes.io/controller-uid=d5703705-b1bc-4826-b741-52ed5c3b0a46
batch.kubernetes.io/job-name=fil-c2-512m-574
controller-uid=d5703705-b1bc-4826-b741-52ed5c3b0a46
job-name=fil-c2-512m-574
Annotations: cni.projectcalico.org/containerID: 4b1fdae9dd48562356512ee155d351958d85af2a76e59a139c229b099ac54e64
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Succeeded
IP: 172.16.177.116
IPs:
IP: 172.16.177.116
Controlled By: Job/fil-c2-512m-574
Containers:
fil-c2-512m-574fcugq:
Container ID: containerd://053ad3e8c4abd913dab965518f003f5b75b377abced535af60adaa1cfa2f7fac
Image: filswan/ubi-worker:v1.0
Image ID: docker.io/filswan/ubi-worker@sha256:e1c9498b3911e7a028dbe0b908754c367c789bf8c0e2b9bd793895993ae96c84
Port: <none>
Host Port: <none>
Command:
ubi-bench
c2
/var/tmp/fil-c2-param/1000-0-7-612.json
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 02 Feb 2024 13:30:19 +0100
Finished: Fri, 02 Feb 2024 13:30:29 +0100
Ready: False
Restart Count: 0
Limits:
cpu: 2
ephemeral-storage: 2Gi
memory: 10Gi
nvidia.com/gpu: 1
Requests:
cpu: 1
ephemeral-storage: 1Gi
memory: 5Gi
nvidia.com/gpu: 1
Environment:
RUST_GPU_TOOLS_CUSTOM_GPU: NVIDIA RTX A4000:6144,NVIDIA GeForce RTX 4060 Ti:4352
RECEIVE_PROOF_URL: https://swan1:8085/api/v1/computing/cp/receive/ubi
TASKID: 574
TASK_TYPE: 1
ZK_TYPE: fil-c2-512M
NAME_SPACE: ubi-task-574
PARAM_URL: https://286cb2c989.acl.multichain.storage/ipfs/QmcVwLYXHCar7Hg2wBiYwoY3jtxz7SF3hkYu6SmA7DRco5
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8scgp (ro)
/var/tmp/filecoin-proof-parameters from proof-params (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
proof-params:
Type: HostPath (bare host directory volume)
Path: /var/tmp/filecoin-proof-parameters
HostPathType:
kube-api-access-8scgp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 72s kubelet Container image "filswan/ubi-worker:v1.0" already present on machine
Normal Created 72s kubelet Created container fil-c2-512m-574fcugq
Normal Started 72s kubelet Started container fil-c2-512m-574fcugq
swan1 is the correct hostname, so DNS does not work. Maybe it should be changed to the external domain hostname, or just the IP address. How can I achieve that RECEIVE_PROOF_URL
is changed?
192.168.128.71 swan1
is in /etc/hosts
on each host
I tried
nano cp/fil-c2.env
RECEIVE_PROOF_URL="http://192.168.128.71:8085/api/v1/computing/cp/receive/ubi"
but then computing-provider says
W0202 13:40:17.230078 1853483 warnings.go:70] spec.template.spec.containers[0].env[2]: hides previous definition of "RECEIVE_PROOF_URL"
Environment:
RUST_GPU_TOOLS_CUSTOM_GPU: NVIDIA RTX A4000:6144,NVIDIA GeForce RTX 4060 Ti:4352
RECEIVE_PROOF_URL: http://192.168.128.71:8085/api/v1/computing/cp/receive/ubi
RECEIVE_PROOF_URL: https://swan1:8085/api/v1/computing/cp/receive/ubi
TASKID: 12
TASK_TYPE: 1
ZK_TYPE: fil-c2-512M
NAME_SPACE: ubi-task-12
PARAM_URL: https://286cb2c989.acl.multichain.storage/ipfs/QmYg4CfA5E2zR4ktb5B3PafAeCWyEEXiKUVS4g2UE9occ5
Here is how I solved the DNS problem:
kubectl edit cm coredns -n kube-system
swan1. {
hosts {
192.168.128.71 swan1
}
}
kubectl rollout restart deployment coredns -n kube-system
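For reference, instead of a separate `swan1.` server block, the same mapping can also live inside the main `.:53` server block of the Corefile, using the CoreDNS `hosts` plugin with `fallthrough` so all other queries still reach the cluster and upstream resolvers. This is a sketch of what the edited `Corefile` key in the coredns ConfigMap might look like; the `192.168.128.71 swan1` entry is the only addition, the rest mirrors a stock kubeadm Corefile and may differ in your cluster:

```
.:53 {
    errors
    health
    hosts {
        192.168.128.71 swan1
        fallthrough
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```

After editing the ConfigMap, the same `kubectl rollout restart deployment coredns -n kube-system` applies the change.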
ubi is now starting, but I am not connected to the hub ( https://github.com/swanchain/go-computing-provider/issues/12#issuecomment-1925214934 )
I managed to get computing-provider running. But the ubi tasks are not starting. What might be missing?
My node setup:
Swan1 = go-computing-provider, public IP address, Kubernetes, no GPU
Swan2 = Kubernetes, GPU A4000
Swan3 = Kubernetes, GPU RTX 4060 Ti (not in the official support list?)
One idea might be that I have too many CPU spaces, so I don't have enough resources for the ubi task. But I checked this and also killed some tasks:
Here is my config:
Here are the logs regarding ubi:
There are no pods created for ubi