swanchain / go-computing-provider

A golang implementation of computing provider
MIT License

reserve resources for ubi? #41

Closed ThomasBlock closed 1 month ago

ThomasBlock commented 3 months ago

Hi @Normalnoise. My CP cluster is running smoothly.

My problem now: I get too many tasks. They completely fill up my CPU cores, so when a ubi-task wants to start, it cannot.

Running deployments: 56

So what should we do now? I can manually kill processes every day, but I guess that is also bad for my score? Is there a way to reserve resources, so that there are fewer normal tasks and ubi-tasks can run? Or can we completely disable normal tasks?

[GIN] 2024/04/03 - 11:24:19 | 200 |  984.144653ms |   38.104.153.43 | GET      "/api/v1/computing/cp"
time="2024-04-03 11:24:19.273" level=info msg="receive ubi task received: {ID:71252 Name:1000-0-7-56565 Type:1 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.swanipfs.com/ipfs/QmZP6Uu5Gvz32XzLQGSRngyiQZhPYspy8hzBREZimqZisS Signature:0x4bba84c140e4fa76ef85f5dfd3358e0da8cfa016595dc685dd2fab77643ab96939e0cbf46386c1c75b08a6cb265883d5973b0c87ec5a3a294536e47e737875b200 Resource:0xc0002d4080}" func=DoUbiTask file="cp_service.go:579"
time="2024-04-03 11:24:19.273" level=info msg="ubi task sign verifing, task_id: 71252, type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:619"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1317"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: remainingCpu: -3, remainingMemory: 7.00, remainingStorage: 338.00" func=checkResourceAvailableForUbi file="cp_service.go:1318"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1317"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: remainingCpu: 1, remainingMemory: 43.00, remainingStorage: 1553.00" func=checkResourceAvailableForUbi file="cp_service.go:1318"
time="2024-04-03 11:24:19.805" level=info msg="gpuName: NVIDIA-3090, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan1:map[] swan3:map[NVIDIA-4090:1] swan5:map[] swan6:map[] swan7:map[NVIDIA-3090:1]]" func=checkResourceAvailableForUbi file="cp_service.go:1329"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1317"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: remainingCpu: 1, remainingMemory: 62.00, remainingStorage: 1503.00" func=checkResourceAvailableForUbi file="cp_service.go:1318"
time="2024-04-03 11:24:19.805" level=info msg="gpuName: NVIDIA-3090, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan1:map[] swan3:map[NVIDIA-4090:1] swan5:map[] swan6:map[] swan7:map[NVIDIA-3090:1]]" func=checkResourceAvailableForUbi file="cp_service.go:1329"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1317"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: remainingCpu: 0, remainingMemory: 62.00, remainingStorage: 967.00" func=checkResourceAvailableForUbi file="cp_service.go:1318"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1317"
time="2024-04-03 11:24:19.805" level=info msg="checkResourceAvailableForUbi: remainingCpu: 1, remainingMemory: 39.00, remainingStorage: 465.00" func=checkResourceAvailableForUbi file="cp_service.go:1318"
time="2024-04-03 11:24:19.805" level=info msg="gpuName: NVIDIA-3090, nodeGpu: map[:0 NVIDIA-3090:1 kubernetes.io/os:0], nodeGpuSummary: map[swan1:map[] swan3:map[NVIDIA-4090:1] swan5:map[] swan6:map[] swan7:map[NVIDIA-3090:1]]" func=checkResourceAvailableForUbi file="cp_service.go:1329"
time="2024-04-03 11:24:19.805" level=warning msg="ubi task id: 71252, type: GPU, not found a resources available" func=DoUbiTask file="cp_service.go:660"
[GIN] 2024/04/03 - 11:24:19 | 500 |  532.511632ms |   38.104.153.43 | POST     "/api/v1/computing/cp/ubi"
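To make the log above easier to read, here is a minimal Go sketch of the CPU/memory/storage stage of the check. It is an assumed reconstruction from the logged values only: the `resources` type, `passesBasicCheck`, and `passingNodes` are hypothetical names, and the real logic in `cp_service.go` may differ.

```go
package main

import "fmt"

// resources mirrors the three values printed by checkResourceAvailableForUbi.
type resources struct {
	cpu     int64
	memory  float64
	storage float64
}

// passesBasicCheck is an assumed reconstruction, not the project's code:
// a node passes when every remaining value covers the requested one.
func passesBasicCheck(need, remaining resources) bool {
	return remaining.cpu >= need.cpu &&
		remaining.memory >= need.memory &&
		remaining.storage >= need.storage
}

// passingNodes returns the indexes of the nodes that pass the basic check.
func passingNodes(need resources, nodes []resources) []int {
	var ok []int
	for i, n := range nodes {
		if passesBasicCheck(need, n) {
			ok = append(ok, i)
		}
	}
	return ok
}

func main() {
	// needCpu / needMemory / needStorage from the log.
	need := resources{cpu: 1, memory: 5, storage: 1}
	// remainingCpu / remainingMemory / remainingStorage, in log order.
	nodes := []resources{
		{-3, 7, 338},
		{1, 43, 1553},
		{1, 62, 1503},
		{0, 62, 967},
		{1, 39, 465},
	}
	fmt.Println(passingNodes(need, nodes)) // [1 2 4]
}
```

Indexes 1, 2 and 4 are exactly the entries that are followed by a `gpuName` line in the log, which suggests the GPU comparison only runs after this stage. The GPU stage is not modelled here, and the log alone does not show why the last node, which lists `NVIDIA-3090:1`, is still rejected.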

Here is my manual kill procedure:

kubectl get po -A -o wide | grep swan3
ns-0x2b213c3ae98ad00a8cf0e0e5c563edbcbbcf5603   deploy-de9438be-8548-4379-82b4-0ed6e872583f-7d6c77d55f-qdjdc

computing-provider task list -v | grep -i 0x2b213c3ae98ad00a8cf0e0e5c563edbcbbcf5603
3b61df83-1563-42a5-bf9b-07c9f6cd3a0f    CPU         0x2B213c3Ae98Ad00A8CF0e0E5c563EdbCBBCf5603  de9438be-8548-4379-82b4-0ed6e872583f    mayuii

computing-provider task delete de9438be-8548-4379-82b4-0ed6e872583f

It helps for around 8 hours.
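The lookup step of this procedure (namespace to wallet address to job UUID) could be scripted. The Go sketch below only parses text that has already been captured from `kubectl` and `computing-provider task list -v`. The helper names are hypothetical, and the assumption that the job UUID is the fourth column comes from the single sample row above.

```go
package main

import (
	"fmt"
	"strings"
)

// walletFromNamespace strips the "ns-" prefix seen in the pod namespace.
func walletFromNamespace(ns string) string {
	return strings.TrimPrefix(ns, "ns-")
}

// jobUUIDForWallet scans `computing-provider task list -v` output and returns
// the fourth column of the first row that mentions the wallet, ignoring case.
// The column position is an assumption based on one sample row.
func jobUUIDForWallet(taskList, wallet string) (string, bool) {
	for _, line := range strings.Split(taskList, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 4 {
			continue
		}
		for _, f := range fields {
			if strings.EqualFold(f, wallet) {
				return fields[3], true
			}
		}
	}
	return "", false
}

func main() {
	ns := "ns-0x2b213c3ae98ad00a8cf0e0e5c563edbcbbcf5603"
	list := "3b61df83-1563-42a5-bf9b-07c9f6cd3a0f    CPU         0x2B213c3Ae98Ad00A8CF0e0E5c563EdbCBBCf5603  de9438be-8548-4379-82b4-0ed6e872583f    mayuii"
	if id, ok := jobUUIDForWallet(list, walletFromNamespace(ns)); ok {
		// This is the value the manual procedure passes to
		// `computing-provider task delete`.
		fmt.Println(id)
	}
}
```

The match is case-insensitive because the namespace uses a lowercase address while the task list prints it in mixed case, which is also why the manual command uses `grep -i`.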

image

Successful tasks take around 100 seconds:

[GIN] 2024/04/03 - 10:54:17 | 200 |  317.947049ms |   38.104.153.43 | GET      "/api/v1/computing/cp"
time="2024-04-03 10:54:17.713" level=info msg="receive ubi task received: {ID:70752 Name:1000-0-7-56065 Type:0 ZkType:fil-c2-512M InputParam:https://286cb2c989.acl.swanipfs.com/ipfs/QmcopbCqYipYD17eoQ77EZEnuF9GQQYYVDqSRgA8XHTeNG Signature:0x35fab6788464c605937daa873d6787856b38b7c1b0c4e743b91d364aadcb5b457d2ca311fa51f703140d7f12f0a2c0971fe58a000995741295b2a47f0c63936101 Resource:0xc001c532c0}" func=DoUbiTask file="cp_service.go:579"
time="2024-04-03 10:54:17.713" level=info msg="ubi task sign verifing, task_id: 70752, type: fil-c2-512M, verify: true" func=DoUbiTask file="cp_service.go:619"
time="2024-04-03 10:54:17.755" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1317"
time="2024-04-03 10:54:17.755" level=info msg="checkResourceAvailableForUbi: remainingCpu: -3, remainingMemory: 7.00, remainingStorage: 338.00" func=checkResourceAvailableForUbi file="cp_service.go:1318"
time="2024-04-03 10:54:17.755" level=info msg="checkResourceAvailableForUbi: needCpu: 1, needMemory: 5.00, needStorage: 1.00" func=checkResourceAvailableForUbi file="cp_service.go:1317"
time="2024-04-03 10:54:17.755" level=info msg="checkResourceAvailableForUbi: remainingCpu: 1, remainingMemory: 39.00, remainingStorage: 1533.00" func=checkResourceAvailableForUbi file="cp_service.go:1318"
[GIN] 2024/04/03 - 10:54:17 | 200 |   41.607674ms |   38.104.153.43 | POST     "/api/v1/computing/cp/ubi"
time="2024-04-03 10:55:57.244" level=info msg="task_id: 70752, C2 proof out received: {TaskId:70752 TaskType:0 Proof:hQcx5pXphv/Y5K1cjxOSpMGnuuSFen71XSx2FjyWvkCwb+ROLJatZmSKV9PAFsqtl2icmok/knA+XzkAiDiEMizxivaBKhq6sTAVCwQKKlgKz3pD20OEXHMO+KXs9kTIBLeKnwx8y8KzZCZSySaCPOlmZmsLLNcgpnfvm18sP+hC4BEAn7S6mAOrNqqOjRAmsyfZmcNRu2nz6lkKn2TqjeSaLgcBJ2UHnV6/NdTN3OQJXK32XdhOaRN6vtq+DIZE ZkType:fil-c2-512M NameSpace:ubi-task-70752}" func=ReceiveUbiProof file="cp_service.go:900"
submitUBIProofTx: 0x8a05960b8c3d22fbf9b547d27c6f91ad82fe6283eb4e475809926538ad8dfb60
[GIN] 2024/04/03 - 10:55:58 | 200 |  1.008425723s |  192.168.128.73 | POST     "/api/v1/computing/cp/receive/ubi"
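The "around 100 seconds" figure can be checked from the two timestamps in this log: the task is received at 10:54:17.713 and the C2 proof arrives at 10:55:57.244. A small Go sketch of that arithmetic follows. The `elapsed` helper and `logLayout` constant are illustrative names, and the layout string is assumed from the `time="..."` field format shown above.

```go
package main

import (
	"fmt"
	"time"
)

// logLayout matches the time="..." field in the provider log.
const logLayout = "2006-01-02 15:04:05.000"

// elapsed returns the time between two log timestamps.
func elapsed(start, end string) (time.Duration, error) {
	s, err := time.Parse(logLayout, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(logLayout, end)
	if err != nil {
		return 0, err
	}
	return e.Sub(s), nil
}

func main() {
	// Task received -> C2 proof received, taken from the log above.
	d, err := elapsed("2024-04-03 10:54:17.713", "2024-04-03 10:55:57.244")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // 1m39.531s
}
```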

PS: I don't see on-chain rewards for the last few days of ubi. Can you check? E.g. 0xf6b7c725fb7e5a56c751eff9eb21af43413bb63d5e358d4bd897214f0632189e and 0xece4b8fdb20eb0974925db41a1bf24c3d99f2013b9dd0d64b2408279bdf52511

PS: The docker issues are still open.

ThomasBlock commented 3 months ago

Oh, I just saw in the image above that I did CPU tasks instead of GPU tasks. What could be the problem here?

Normalnoise commented 3 months ago

Currently, a ubi task is released every half hour. I have not seen you get too many tasks.

ThomasBlock commented 3 months ago

> Currently, a ubi task is released every half hour. I have not seen you get too many tasks.

I get so many Lagrange "normal tasks" that my CPU is blocked, so I cannot do ubi tasks.

Normalnoise commented 2 months ago

We will release a new version that allows a CP to accept ubi-tasks only.