lagrangedao / go-computing-provider

A golang implementation of computing provider
MIT License
2 stars 35 forks

feat: report CP's resource usage #8

Closed. flin-nbai closed this issue 1 year ago

flin-nbai commented 1 year ago

We're creating a dashboard that will list all providers and information about them. Some new metrics we need for each provider are

To get this information in semi-real time, the CP should expose an endpoint that the server can query for the latest resource usage info.

If possible, report usage for tasks only (the CPU / memory / storage consumed by tasks, not by the overall system). If that is too tricky, overall system info is fine.

sonic-chain commented 1 year ago

To report the CP's resources, the CP should call the server's API; the server should not call the CP's API to retrieve them. Example request arguments for the API are as follows:

{
    "node_id":"04f532cea9ad16d450e9e2d3f94694aa3549c18c36fa00121ed70a4ab40d64f912d751584a0614d025982d8e00191d844d0371426d7304cc63d1f5abca288480fc",
    "region":"US-VA",
    "cluster_info":[
        {
            "machine_id":"ddee71c469ed4876bcb40f92b0e48a60",
            "cpu":{
                "model":"",
                "total_nums":192,
                "available_nums":192
            },
            "memory":{
                "total_memory":2151473061888,
                "available_memory":2151473061888
            },
            "gpu":[
                {
                    "model":"",
                    "total_nums":0,
                    "available_nums":0,
                    "total_memory":0,
                    "available_memory":0
                }
            ],
            "storage":{
                "type":"",
                "total_size":0,
                "available_size":0
            }
        },
        {
            "machine_id":"1421c9f90e414825856f936fa5bbf649",
            "cpu":{
                "model":"AMD",
                "total_nums":192,
                "available_nums":174
            },
            "memory":{
                "total_memory":2151473061888,
                "available_memory":2140926484480
            },
            "gpu":[
                {
                    "model":"NVIDIA-GeForce-RTX-3080",
                    "total_nums":1,
                    "available_nums":0,
                    "total_memory":10018,
                    "available_memory":0
                },
                {
                    "model":"NVIDIA-GeForce-RTX-3090",
                    "total_nums":1,
                    "available_nums":0,
                    "total_memory":10018,
                    "available_memory":0
                }
            ],
            "storage":{
                "type":"",
                "total_size":0,
                "available_size":0
            }
        }
    ]
}

flin-nbai commented 1 year ago

Gotcha. This issue is not needed, then.