google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Extend machineInfo to include NUMA topology #1521

Open vikaschoudhary16 opened 8 years ago

vikaschoudhary16 commented 8 years ago

Hello folks, I am working on a PoC to enable NUMA-aware workloads in a Kubernetes deployment. I feel cadvisor should be enhanced to discover NUMA topology, similar to what libvirt does, where one can see the NUMA topology using virsh capabilities. This might be useful for any higher-level application (relative to cadvisor) interested in leveraging NUMA. Thoughts?
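To make the idea of "discovering NUMA topology" concrete, here is a minimal sketch of one possible approach on Linux: reading the per-node cpulist files that the kernel exposes under /sys/devices/system/node. This is only an illustration and not how cadvisor does (or would) implement it; the function names parseCPUList and discoverNodes are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList expands a Linux cpulist string such as "0-3,8-11"
// into the individual CPU ids it covers.
func parseCPUList(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad cpulist entry %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad cpulist entry %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("bad cpulist range %q", part)
		}
		for c := start; c <= end; c++ {
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

// discoverNodes maps each NUMA node id to its CPU ids by reading
// <root>/node<N>/cpulist. The root is a parameter so the function
// can be pointed at a test directory (hypothetical helper).
func discoverNodes(root string) (map[int][]int, error) {
	paths, err := filepath.Glob(filepath.Join(root, "node[0-9]*", "cpulist"))
	if err != nil {
		return nil, err
	}
	nodes := make(map[int][]int)
	for _, p := range paths {
		name := filepath.Base(filepath.Dir(p)) // e.g. "node0"
		id, err := strconv.Atoi(strings.TrimPrefix(name, "node"))
		if err != nil {
			continue // not a nodeN directory
		}
		data, err := os.ReadFile(p)
		if err != nil {
			return nil, err
		}
		cpus, err := parseCPUList(string(data))
		if err != nil {
			return nil, err
		}
		nodes[id] = cpus
	}
	return nodes, nil
}

func main() {
	nodes, err := discoverNodes("/sys/devices/system/node")
	if err != nil {
		fmt.Fprintln(os.Stderr, "discovery failed:", err)
		os.Exit(1)
	}
	ids := make([]int, 0, len(nodes))
	for id := range nodes {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	for _, id := range ids {
		fmt.Printf("node %d: cpus %v\n", id, nodes[id])
	}
}
```

On a machine without that sysfs directory (for example, non-Linux hosts) the glob simply matches nothing and the program prints no nodes.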

vikaschoudhary16 commented 8 years ago

cc @jeremyeder @timstclair @ConnorDoyle @derekwaynecarr @vishh @psuriset @timothysc

ConnorDoyle commented 8 years ago

cc @balajismaniam

On Oct 28, 2016, at 20:44, Vikas Choudhary notifications@github.com wrote:

cc @jeremyeder @timstclair @ConnorDoyle @derekwaynecarr


derekwaynecarr commented 8 years ago

So the topology with cells and cpus listed?

Would you do this on MachineInfo?

Got a response format in mind?

On Saturday, October 29, 2016, Connor Doyle notifications@github.com wrote:

cc @balajismaniam


vikaschoudhary16 commented 8 years ago

@derekwaynecarr

So the topology with cells and cpus listed?

Yes, but instead of 'cells' I would prefer 'numa_nodes' or something similar, to make the naming more intuitive.

Would you do this on MachineInfo?

Yes.

Got a response format in mind?

Does the following format sound reasonable?

type NumaNode struct {
    Id int `json:"numa_node_id"`
    // Per-NUMA-node memory.
    // Represents 'free:' from 'numactl --hardware' output
    MemoryFree uint64 `json:"numa_memory_free"`
    // Represents 'size:' from 'numactl --hardware' output
    MemorySize uint64 `json:"numa_memory_size"`
    // Number of cores belonging to this NUMA node
    NumCores int `json:"numa_num_cores"`
    // Represents 'cpus:' from 'numactl --hardware' output
    Cores []string `json:"cores"`
}

type MachineInfo struct {
    // ...
    // Number of NUMA nodes detected on the machine
    NumNumaNodes int `json:"num_numa_nodes"`
    // ...
    // NUMA topology.
    // Describes the CPU cores and memory available on each NUMA node
    NumaTopology map[string]*NumaNode `json:"numa_topology"`
    // ...
}
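As a rough illustration of how these fields could be filled, the sketch below parses text in the 'numactl --hardware' format into the proposed NumaNode struct and prints it as JSON. Several things here are assumptions rather than part of the proposal: the function name parseNumactlHardware is hypothetical, the map is keyed by the node id as a string, memory values are kept in MB exactly as numactl prints them, and the sample text is made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// NumaNode mirrors the struct proposed in this thread.
type NumaNode struct {
	Id         int      `json:"numa_node_id"`
	MemoryFree uint64   `json:"numa_memory_free"` // MB, as printed by numactl (assumption)
	MemorySize uint64   `json:"numa_memory_size"` // MB, as printed by numactl (assumption)
	NumCores   int      `json:"numa_num_cores"`
	Cores      []string `json:"cores"`
}

// sampleOutput is invented example text in the 'numactl --hardware' layout.
const sampleOutput = `available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 32768 MB
node 0 free: 30000 MB
node 1 cpus: 4 5 6 7
node 1 size: 32768 MB
node 1 free: 31000 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
`

// parseNumactlHardware builds a topology map keyed by node id (as a string)
// from the "node N cpus:/size:/free:" lines; all other lines are ignored.
func parseNumactlHardware(out string) (map[string]*NumaNode, error) {
	topo := make(map[string]*NumaNode)
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 || f[0] != "node" {
			continue
		}
		key := f[2]
		if key != "cpus:" && key != "size:" && key != "free:" {
			continue // e.g. the header row of the distance table
		}
		id, err := strconv.Atoi(f[1])
		if err != nil {
			return nil, fmt.Errorf("bad node id in %q: %w", line, err)
		}
		n, ok := topo[f[1]]
		if !ok {
			n = &NumaNode{Id: id}
			topo[f[1]] = n
		}
		switch key {
		case "cpus:":
			// numactl lists logical CPU ids here; they are stored as strings
			// to match the proposed Cores []string field.
			n.Cores = append([]string(nil), f[3:]...)
			n.NumCores = len(n.Cores)
		case "size:", "free:":
			if len(f) < 4 {
				return nil, fmt.Errorf("missing value in %q", line)
			}
			mb, err := strconv.ParseUint(f[3], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("bad memory value in %q: %w", line, err)
			}
			if key == "size:" {
				n.MemorySize = mb
			} else {
				n.MemoryFree = mb
			}
		}
	}
	return topo, nil
}

func main() {
	topo, err := parseNumactlHardware(sampleOutput)
	if err != nil {
		panic(err)
	}
	b, err := json.MarshalIndent(topo, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

NumNumaNodes on MachineInfo would then simply be len(topo). Whether a real implementation should shell out to numactl or read sysfs directly is left open here.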

Thanks -Vikas

timothysc commented 7 years ago

@vikaschoudhary16 If you have a PoC in place, I recommend opening a [WIP/POC] PR here to get feedback.

vishh commented 7 years ago

+1 to exposing numa info. NUMA support in k8s should be dealt with in a k8s proposal though.

vikaschoudhary16 commented 7 years ago

@vishh thanks a lot!!!