swanchain / go-computing-provider

A golang implementation of computing provider
MIT License

doubi ubi-task all failed #19

Closed (scvpark closed this 2 weeks ago)

scvpark commented 5 months ago

I have completed a lot of ubi-tasks. However, since today I have been seeing failed ubi-tasks. All of the failed tasks involve the DoUbiTask function.

$ computing-provider version
computing-provider version 0.4.2+git.24931a7

$ computing-provider ubi-task list
3027 CPU fil-c2-512M 0xff883d04b04292bf success 10.00 2024-02-04 16:00:38
3045 CPU fil-c2-512M 0x05c00b8cfe463b12 success 10.00 2024-02-04 16:30:38
3061 CPU fil-c2-512M 0x163d0685480ab17 success 10.00 2024-02-04 17:00:43
3078 CPU fil-c2-512M 0xdb894c594515c97 success 10.00 2024-02-04 17:30:39
3094 CPU fil-c2-512M 0x05c001997dfbfea success 10.00 2024-02-04 18:00:36
3113 CPU fil-c2-512M failed 0.0 2024-02-04 18:30:41
3129 CPU fil-c2-512M failed 0.0 2024-02-04 19:00:50
3145 CPU fil-c2-512M failed 0.0 2024-02-04 19:30:42
3163 CPU fil-c2-512M failed 0.0 2024-02-04 20:00:41
3181 CPU fil-c2-512M failed 0.0 2024-02-04 20:30:51
3199 CPU fil-c2-512M failed 0.0 2024-02-04 21:00:40
3217 CPU fil-c2-512M failed 0.0 2024-02-04 21:30:44
3236 CPU fil-c2-512M failed 0.0 2024-02-04 22:00:42
3255 CPU fil-c2-512M failed 0.0 2024-02-04 22:30:44
3274 CPU fil-c2-512M failed 0.0 2024-02-04 23:00:41
3293 CPU fil-c2-512M failed 0.0 2024-02-04 23:30:49
3312 CPU fil-c2-512M failed 0.0 2024-02-05 00:00:48
3331 CPU fil-c2-512M failed 0.0 2024-02-05 00:30:42
3350 CPU fil-c2-512M failed 0.0 2024-02-05 01:00:42
3369 CPU fil-c2-512M failed 0.0 2024-02-05 01:30:46
3388 CPU fil-c2-512M failed 0.0 2024-02-05 02:00:40
3407 CPU fil-c2-512M failed 0.0 2024-02-05 02:30:41
3426 CPU fil-c2-512M failed 0.0 2024-02-05 03:00:39
3445 CPU fil-c2-512M failed 0.0 2024-02-05 03:30:42
3464 CPU fil-c2-512M failed 0.0 2024-02-05 04:00:40
3483 CPU fil-c2-512M failed 0.0 2024-02-05 04:30:40
3502 CPU fil-c2-512M failed 0.0 2024-02-05 05:00:51
3520 CPU fil-c2-512M failed 0.0 2024-02-05 05:30:41
3538 CPU fil-c2-512M failed 0.0 2024-02-05 06:00:45
3555 CPU fil-c2-512M failed 0.0 2024-02-05 06:30:48
3572 CPU fil-c2-512M failed 0.0 2024-02-05 07:00:41

$ cat cp.log | grep 3572
time="2024-02-05 07:00:41.516" level=info msg="receive ubi task received: {ID:3572 Name:1000-0-8-1968 Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.multichain.storage/ipfs/QmU8XLHUSUeppBHG7qRRRBzz9LK556bMJiqfyQuFognrAL Signature:0x4497a0598adbfdac12336f7a904c00ac7f618fd8f39130b5037aefd0ea50dc9b139e1e431cbe61c22a071e60001a19aea0c47d4026a7d1d5b7f17820c18dc05401 Resource:0xc000892c00}" func=DoUbiTask file="cp_service.go:547"
time="2024-02-05 07:00:41.516" level=info msg="ubi task sign verifing, task_id: 3572, type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:587"
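
The log entries above use a key=value text layout (time, level, msg, func, file). A plain grep for 3572 can also match unrelated lines that happen to contain those digits, so it may help to split each line into fields and filter on them. The sketch below is only an illustration under that assumption about the layout; parseLogfmt is a made-up helper, not something cp.log tooling provides.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLogfmt splits one key=value log line into a map. Values may be bare
// (level=info) or double-quoted with backslash escapes (msg="...").
func parseLogfmt(line string) map[string]string {
	out := map[string]string{}
	i := 0
	for i < len(line) {
		for i < len(line) && line[i] == ' ' { // skip separators
			i++
		}
		start := i
		for i < len(line) && line[i] != '=' && line[i] != ' ' {
			i++
		}
		if i >= len(line) || line[i] != '=' {
			continue // bare word without a value, ignore it
		}
		key := line[start:i]
		i++ // step over '='
		if i < len(line) && line[i] == '"' {
			// Find the closing quote, stepping over escaped characters.
			j := i + 1
			for j < len(line) && line[j] != '"' {
				if line[j] == '\\' {
					j++
				}
				j++
			}
			if j >= len(line) { // unterminated quote: keep the rest as-is
				out[key] = line[i+1:]
				break
			}
			raw := line[i : j+1]
			if v, err := strconv.Unquote(raw); err == nil {
				out[key] = v
			} else {
				out[key] = strings.Trim(raw, `"`)
			}
			i = j + 1
		} else {
			start = i
			for i < len(line) && line[i] != ' ' {
				i++
			}
			out[key] = line[start:i]
		}
	}
	return out
}

func main() {
	// Second log line from this issue.
	line := `time="2024-02-05 07:00:41.516" level=info msg="ubi task sign verifing, task_id: 3572, type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:587"`
	f := parseLogfmt(line)
	// Keep only DoUbiTask entries that mention this exact task ID.
	if f["func"] == "DoUbiTask" && strings.Contains(f["msg"], "task_id: 3572,") {
		fmt.Println(f["time"], f["level"], f["file"], f["msg"])
	}
}
```

Note that both entries shown for task 3572 are level=info, so the grep output contains no error for the failure itself.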

Normalnoise commented 5 months ago

Can you provide the error log from your ubi container?

scvpark commented 5 months ago

Which of the Docker containers is the ubi container?

Normalnoise commented 5 months ago

it should be like this

image

zsp03 commented 5 months ago

I have the same issue. However, I cannot view the container logs because the pod gets deleted too quickly (BackOffLimit). I tried to find a solution, but it all seems to come down to a setting in the YAML file that only the developer can change.

I can, however, view the event logs, and as you can see it takes only 3s to reach the limit. Reaching the limit deletes the pods, so running kubectl logs afterwards returns either blank output or "pods not found". And yes, the CPU is now detected correctly, but I still cannot figure out what is wrong with the pods.

image

scvpark commented 5 months ago

The ubi-task pod disappears too quickly to check the log. Is there a way to check the logs differently? I checked with the computing-provider ubi-task list --show-failed command, and about 60% of the tasks fail per day.

Normalnoise commented 2 weeks ago

please upgrade to v0.5.1