It's also noteworthy that the tasks are no longer refundable. They are "completed" after just a few minutes.
@Normalnoise
Still no new jobs...
Error: register cp to ubi hub failed
The error shows: "you have not report your cp info to the ubi-engine, so you can not get the ubi task; you must re-init the cp account to ensure no error happen"
> It's also noteworthy that the tasks are no longer refundable. They are "completed" after just a few minutes.
@ThomasBlock I have noticed this issue; it is a bug and we need more time to fix it. If you need more SWAN tokens, please let me know. The refund function will be fixed as soon as possible.
> Error: register cp to ubi hub failed
> the error shows: you have not report your cp info to the ubi-engine, so you can not get the ubi task; you must re-init the cp account to ensure no error happen
So what should I do? I ran it again and got the same error:
computing-provider init --ownerAddress 0xfe017Ff8F0C7349845Ab52E58FcA96143f2c4981 --beneficiaryAddress 0x269EBeee083CE6f70486a67dC8036A889bF322A9
Contract deployed! Address: 0x8A878316d185a05edF4A63E92B81737d807E8762
Transaction hash: 0x33d1550790fe4f6c2f4aae2f61c77ef4148690d173a477f63675a8e02957cd8a
Error: register cp to ubi hub failed
It really seems like there is a general problem with the hub, as other Discord members report. Can you check that?
You can try it again:
- delete the $CP_PATH/privateKey
- re-init the computing-provider account to ensure there is no error (see the sketch below).
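A minimal sketch of those two steps, assuming CP_PATH points at your computing-provider data directory (the addresses are the ones from the init command above; use your own):
# remove the private key created by the previous init
rm $CP_PATH/privateKey
# re-run the account init with your own owner/beneficiary wallets
computing-provider init --ownerAddress 0xfe017Ff8F0C7349845Ab52E58FcA96143f2c4981 --beneficiaryAddress 0x269EBeee083CE6f70486a67dC8036A889bF322A9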
I tried that now 4 times. Every time I see:
Error: register cp to ubi hub failed
Is there anything else that needs to be done after the collateral change? I wrote in my config file:
SWAN_COLLATERAL_CONTRACT="0xdc200f89258e72aC3602dD23BD3642C4bd4eE34e"
but the collateral differs between the hub and my software, so where could the problem be?
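For reference, how I double-checked the value on my side (only a sketch; the config file name is an assumption, adjust it to wherever your SWAN_COLLATERAL_CONTRACT line actually lives):
# confirm the contract address the provider will read, then restart it so the change is picked up
grep SWAN_COLLATERAL_CONTRACT $CP_PATH/config.toml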
Ha.. 2 hours ago something changed in the network: my tasks jumped from 10 to 110. Collateral is still 0.95000, but I guess something good happened on the network!
Seems to affect the whole network:
On the other hand this is bad: all the CPU cores are now blocked, so we still cannot lease GPUs..
We have fixed some issues; can you try again to lease your GPU?
Thank you for the update. Yes, something changed: the spam tasks disappeared recently, and I could start a task on Lagrange. But now there are new errors; I have seen this for two different deployments, and it did work earlier.
Error building Docker image: Error response from daemon: invalid reference format
Failed to extract exposed port: unable to open Dockerfile: open : no such file or directory
time="2024-02-28 18:35:04.312" level=info msg="Job received Data: {UUID:cba67825-22b6-4b98-9aee-c70a50662303 Name:Job-cba67825-22b6-4b98-9aee-c70a50662303 Status:Submitted Duration:3600 JobSourceURI:https://api.lagrangedao.org/spaces/201d7532-8284-4ebd-b348-ff5ce862beda JobResultURI: StorageSource:lagrange TaskUUID:12e1f0a6-aece-4f4a-a3bc-da2e66ffc782 CreatedAt:1709141704 UpdatedAt:1709141704 BuildLog: ContainerLog:}" func=ReceiveJob file="cp_service.go:79"
time="2024-02-28 18:35:05.894" level=info msg="checkResourceAvailableForSpace: needCpu: 8, needMemory: 16.00, needStorage: 20.00" func=checkResourceAvailableForSpace file="cp_service.go:1210"
time="2024-02-28 18:35:05.894" level=info msg="checkResourceAvailableForSpace: remainingCpu: 2, remainingMemory: 16.00, remainingStorage: 348.00" func=checkResourceAvailableForSpace file="cp_service.go:1211"
time="2024-02-28 18:35:05.894" level=info msg="checkResourceAvailableForSpace: needCpu: 8, needMemory: 16.00, needStorage: 20.00" func=checkResourceAvailableForSpace file="cp_service.go:1210"
time="2024-02-28 18:35:05.894" level=info msg="checkResourceAvailableForSpace: remainingCpu: 10, remainingMemory: 53.00, remainingStorage: 1568.00" func=checkResourceAvailableForSpace file="cp_service.go:1211"
time="2024-02-28 18:35:05.894" level=info msg="gpuName: NVIDIA-4090, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan2:map[NVIDIA-A4000:1] swan3:map[NVIDIA-4090:1] swan7:map[NVIDIA-3090:1] swan8:map[NVIDIA-A6000:1]]" func=checkResourceAvailableForSpace file="cp_service.go:1217"
time="2024-02-28 18:35:05.894" level=info msg="checkResourceAvailableForSpace: needCpu: 8, needMemory: 16.00, needStorage: 20.00" func=checkResourceAvailableForSpace file="cp_service.go:1210"
time="2024-02-28 18:35:05.894" level=info msg="checkResourceAvailableForSpace: remainingCpu: 23, remainingMemory: 65.00, remainingStorage: 1593.00" func=checkResourceAvailableForSpace file="cp_service.go:1211"
time="2024-02-28 18:35:05.894" level=info msg="gpuName: NVIDIA-4090, nodeGpu: map[:0 kubernetes.io/os:0], nodeGpuSummary: map[swan2:map[NVIDIA-A4000:1] swan3:map[NVIDIA-4090:1] swan7:map[NVIDIA-3090:1] swan8:map[NVIDIA-A6000:1]]" func=checkResourceAvailableForSpace file="cp_service.go:1217"
time="2024-02-28 18:35:05.895" level=info msg="submitting job..." func=submitJob file="cp_service.go:124"
time="2024-02-28 18:35:05.895" level=info msg="uploading file to bucket, objectName: jobs/c094542f-3962-4633-94be-2963439c8165.json, filePath: /tmp/jobs/c094542f-3962-4633-94be-2963439c8165.json" func=UploadFileToBucket file="storage_service.go:52"
time="2024-02-28 18:35:06.808" level=info msg="uuid: 201d7532-8284-4ebd-b348-ff5ce862beda, spaceName: myDiffusion, hardwareName: Nvidia 4090 · 8 vCPU · 16 GiB" func=DeploySpaceTask file="cp_service.go:1019"
time="2024-02-28 18:35:07.013" level=error msg="http status: 400 Bad Request, code:400, url:https://api.multichain.storage/api/v2/oss_file/get_file_by_object_name?bucket_uid=878494a8-6ab7-4694-96ac-fc89a2afcbe1&object_name=jobs/c094542f-3962-4633-94be-2963439c8165.json" func=HttpRequest file="restful.go:127"
time="2024-02-28 18:35:07.013" level=error msg="https://api.multichain.storage/api/v2/oss_file/get_file_by_object_name?bucket_uid=878494a8-6ab7-4694-96ac-fc89a2afcbe1&object_name=jobs/c094542f-3962-4633-94be-2963439c8165.json failed, status:error, message:invalid param value:record not found" func=HttpRequest file="restful.go:154"
time="2024-02-28 18:35:07.013" level=error msg="https://api.multichain.storage/api/v2/oss_file/get_file_by_object_name?bucket_uid=878494a8-6ab7-4694-96ac-fc89a2afcbe1&object_name=jobs/c094542f-3962-4633-94be-2963439c8165.json failed, status:error, message:invalid param value:record not found" func=HttpGet file="restful.go:64"
time="2024-02-28 18:35:07.013" level=error msg="https://api.multichain.storage/api/v2/oss_file/get_file_by_object_name?bucket_uid=878494a8-6ab7-4694-96ac-fc89a2afcbe1&object_name=jobs/c094542f-3962-4633-94be-2963439c8165.json failed, status:error, message:invalid param value:record not found" func=GetFile file="file.go:56"
time="2024-02-28 18:35:07.167" level=info msg="Download 201d7532-8284-4ebd-b348-ff5ce862beda successfully." func=BuildSpaceTaskImage file="buildspace.go:33"
time="2024-02-28 18:35:07.278" level=info msg="Download 201d7532-8284-4ebd-b348-ff5ce862beda successfully." func=BuildSpaceTaskImage file="buildspace.go:33"
time="2024-02-28 18:35:07.391" level=info msg="Download 201d7532-8284-4ebd-b348-ff5ce862beda successfully." func=BuildSpaceTaskImage file="buildspace.go:33"
2024/02/28 18:35:07 Image path: build/0x7B0CEe1939a4AdA062EC79f4862a42C1F47B1806/spaces/myDiffusion
time="2024-02-28 18:35:07.392" level=error msg="Error building Docker image: Error response from daemon: invalid reference format" func=BuildImagesByDockerfile file="buildspace.go:80"
time="2024-02-28 18:35:07.392" level=info msg="Failed to extract exposed port: unable to open Dockerfile: open : no such file or directory" func=DockerfileToK8s file="deploy.go:91"
time="2024-02-28 18:35:08.303" level=info msg="file name:1_c094542f-3962-4633-94be-2963439c8165.json, chunk size:712" func=func1 file="file.go:217"
time="2024-02-28 18:35:09.313" level=info msg="Delete redis keys finished, keys: [FULL:201d7532-8284-4ebd-b348-ff5ce862beda]" func=1 file="task_service.go:286"
time="2024-02-28 18:35:10.480" level=info msg="jobuuid: cba67825-22b6-4b98-9aee-c70a50662303 successfully submitted to IPFS" func=submitJob file="cp_service.go:152"
time="2024-02-28 18:35:10.743" level=info msg="submit job detail: {UUID:cba67825-22b6-4b98-9aee-c70a50662303 Name:Job-cba67825-22b6-4b98-9aee-c70a50662303 Status:submitted Duration:3600 JobSourceURI:https://api.lagrangedao.org/spaces/201d7532-8284-4ebd-b348-ff5ce862beda JobResultURI:https://7d67303d2964.acl.multichain.storage/ipfs/QmWuN7LhTa2Bw6Fg3jwatUZVjd3D1a42haJAuSzMu5FVeK StorageSource:lagrange TaskUUID:12e1f0a6-aece-4f4a-a3bc-da2e66ffc782 CreatedAt:1709141704 UpdatedAt:1709141705 BuildLog:wss://log.bitstakehaven.com:8085/api/v1/computing/lagrange/spaces/log?space_id=201d7532-8284-4ebd-b348-ff5ce862beda&type=build ContainerLog:wss://log.bitstakehaven.com:8085/api/v1/computing/lagrange/spaces/log?space_id=201d7532-8284-4ebd-b348-ff5ce862beda&type=container}" func=ReceiveJob file="cp_service.go:119"
[GIN] 2024/02/28 - 18:35:10 | 200 | 6.431254455s | 38.104.153.43 | POST "/api/v1/computing/lagrange/jobs"
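The "open : no such file or directory" above suggests the Dockerfile path came back empty. A quick check against the build directory from the "Image path" log line (just a sketch, assuming that directory should contain the downloaded space):
# a missing Dockerfile here would explain both the invalid reference format and the exposed-port error
ls -la build/0x7B0CEe1939a4AdA062EC79f4862a42C1F47B1806/spaces/myDiffusion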
minesweeper, on the other hand, will deploy but then crashes in Kubernetes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m5s default-scheduler Successfully assigned ns-0x7b0cee1939a4ada062ec79f4862a42c1f47b1806/deploy-5d9dc1e2-877b-4e74-8504-03bf195e1af0-6b4dd877cc-kfdmp to swan3
Normal Pulled 29s (x5 over 2m6s) kubelet Container image "creepto/minesweeper" already present on machine
Normal Created 29s (x5 over 2m6s) kubelet Created container 5d9dc1e2-877b-4e74-8504-03bf195e1af0-minesweeper
Warning Failed 28s (x5 over 2m6s) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
Warning BackOff 15s (x10 over 2m4s) kubelet Back-off restarting failed container 5d9dc1e2-877b-4e74-8504-03bf195e1af0-minesweeper in pod deploy-5d9dc1e2-877b-4e74-8504-03bf195e1af0-6b4dd877cc-kfdmp_ns-0x7b0cee1939a4ada062ec79f4862a42c1f47b1806(0896aa90-c73a-4ba9-bc31-cd825a117ebc)
Is there anything I need to do on my side, like updating components etc.?
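"nvml error: driver/library version mismatch" usually means the loaded NVIDIA kernel module and the user-space libraries are different versions, e.g. after a driver upgrade without a reboot. A way to compare them on the affected node (swan3 here), as a sketch:
# version of the kernel module currently loaded
cat /proc/driver/nvidia/version
# user-space tooling; a mismatch with the above reproduces exactly this error
nvidia-smi
# if they differ, a node reboot (or reloading the nvidia modules) usually resolves it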
Because our storage service is under maintenance, Lagrange and space deployment are not available, but the UBI tasks can still be done normally.
Once the maintenance is completed, there will be an announcement in the Discord community.
Please upgrade to https://github.com/swanchain/go-computing-provider/releases/tag/v0.4.5
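A typical flow for moving to the tagged release (only a sketch; the release page may also ship prebuilt binaries, and your build steps may differ):
# fetch and check out the v0.4.5 tag in your go-computing-provider checkout
git fetch --tags
git checkout v0.4.5
# rebuild the binary; the exact build target/Makefile may differ
go build -o computing-provider .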
Today the network looks very idle.
Until yesterday I could assign jobs to my computing-provider via Lagrange.. but today it no longer works.. Is there a general problem with Lagrange?
I changed the variable as requested (now the Collateral (SWAN-ETH) is quite static and no longer changes):
SWAN_COLLATERAL_CONTRACT="0xdc200f89258e72aC3602dD23BD3642C4bd4eE34e"
UBI tasks and connectivity are good.
I have seen this error in the info output:
NodeId mismatch, local node id: 044dd69a713349917f376a77ba832accaa60422f5d70484f2c82d62a87b42c81d1eab1113cfa188e350df03e8cf734a67d14611953f9b92353911b68e45bc27d3d, chain node id: .
So I redeployed, but I also got an error message there.
Now the info seems good, but still no new jobs.