xcat2 / xcat-core

Code repo for xCAT core packages
Eclipse Public License 1.0

nodels is extremely slow for dynamic groups #5985

Open kcgthb opened 5 years ago

kcgthb commented 5 years ago

Hi,

We're making extensive use of dynamic groups and we're finding that listing the members of such groups with the nodels command is extremely slow.

For instance, we have a group defined like this:

# tabdump -w wherevals==rack==N1 nodegroup
#groupname,grouptype,members,membergroups,wherevals,comments,disable
"rack:N1","dynamic","dynamic",,"rack==N1","CS",

Out of our 1,509 records in the nodelist table, 11 have that attribute defined, which makes them part of the dynamic group. For instance:

# lsdef sh-15-01 -i rack
Object name: sh-15-01
    rack=N1

Listing the members of that group takes an extremely long time:

# time nodels rack:N1
sh-15-01
sh-15-02
sh-15-03
sh-15-04
sh-15-05
sh-15-06
sh-15-07
sh-15-08
sh-15-09
sh-15-10
sh-pdu-15-01

real    0m34.193s
user    0m0.119s
sys     0m0.030s

lsdef appears substantially faster than nodels, but still takes some time:

# time lsdef -s rack:N1
sh-15-01  (node)
sh-15-02  (node)
sh-15-03  (node)
sh-15-04  (node)
sh-15-05  (node)
sh-15-06  (node)
sh-15-07  (node)
sh-15-08  (node)
sh-15-09  (node)
sh-15-10  (node)
sh-pdu-15-01  (node)

real    0m11.854s
user    0m0.216s
sys     0m0.032s

For comparison, nodels on a static group returns in a fraction of a second (0.4s for nodels compute, where compute is a static group containing over 1,300 objects).

Is there any way to optimize this and make those commands run faster?

We're using MySQL (MariaDB) as our xCAT DB backend, because we're using hierarchical mode.

robin2008 commented 5 years ago

@kcgthb how long does it take to run lsdef -s -w rack==N1 or nodels nodepos.rack==N1? If that time is acceptable, you could use the conditional query as a workaround for now.

nodels is implemented in tabutils.pm, while lsdef is implemented in DBobjectdefs.pm. I think neither of them is currently optimized for dynamic group queries.

The only way forward is to dig into the code logic and see whether we can refine it to improve the performance of dynamic group queries.

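Could you also provide the following information:

What's the total number of nodes managed in your cluster (rows in nodelist)?
What's the size of your groups or node categories (tens, hundreds, or thousands)?
How many rows are defined in the rack and nodepos tables?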

kcgthb commented 5 years ago

Hi @robin2008

Thanks for the feedback and suggestions.

Here are some more timings:

# time lsdef -s -w rack==N1 >/dev/null

real    0m11.980s
user    0m0.212s
sys     0m0.049s

# time nodels nodepos.rack==N1 >/dev/null

real    0m0.933s
user    0m0.080s
sys     0m0.026s

# time nodels rack:N1 >/dev/null

real    0m30.250s
user    0m0.108s
sys     0m0.023s

So nodels nodepos.rack==N1 is the clear winner here.
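To make comparisons like the one above repeatable, a small timing helper can be handy. This is only a sketch: `time_ms` is a hypothetical helper name, it assumes GNU `date` (for the `%N` nanosecond format), and it measures wall-clock time only, which is the figure that matters here since user and sys time are negligible in all of the runs above.

```shell
# Run a command, discard its output, and print the elapsed wall-clock time
# in milliseconds followed by the command line.
# Assumption: GNU date, which supports %N (nanoseconds).
time_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo "$(( (end - start) / 1000000 )) ms  $*"
}

# Usage, with the group and condition from this issue:
#   time_ms nodels rack:N1
#   time_ms nodels nodepos.rack==N1
#   time_ms lsdef -s -w rack==N1
```

Running each query a few times in a row helps separate a consistently slow code path from a one-off cold start.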

I really wonder why nodels <table values condition> takes a fraction of a second, while nodels <dynamicgroup> takes over 30 seconds, considering that <dynamicgroup> is really just defined with the same table values condition:

# lsdef -t group rack:N1 -i wherevals
Object name: rack:N1
    wherevals=rack==N1
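Until the dynamic group path is optimized, the conditional-query workaround suggested earlier in the thread could be wrapped along these lines. This is a rough sketch, not part of xCAT: `group_wherevals` and `nodels_dynamic` are hypothetical helper names, the group is assumed to have a single wherevals condition, and the caller is assumed to know which table holds the attribute (nodepos for rack in this case).

```shell
# Print the wherevals condition of a dynamic group, e.g. "rack==N1".
# Parses the "    wherevals=..." line printed by lsdef.
group_wherevals() {
    lsdef -t group "$1" -i wherevals | sed -n 's/^[[:space:]]*wherevals=//p'
}

# List the members of a dynamic group through a table-qualified condition
# instead of through the group name.
#   $1 = dynamic group name
#   $2 = table holding the attribute (assumption: supplied by the caller)
nodels_dynamic() {
    cond=$(group_wherevals "$1") || return 1
    [ -n "$cond" ] || { echo "no wherevals found for group $1" >&2; return 1; }
    nodels "$2.$cond"
}
```

With the definitions from this issue, `nodels_dynamic rack:N1 nodepos` ends up running `nodels nodepos.rack==N1`, the form that returned in under a second above.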

Here's the information you requested:

What's the total number of nodes managed in your cluster (rows in nodelist)?

# tabdump nodelist | wc -l
1509

What's the size of your groups or node categories (tens, hundreds, or thousands)?

# lsdef -s -t group | wc -l
211

How many rows are defined in the rack and nodepos tables?

# tabdump rack | wc -l
49
# tabdump nodepos | wc -l
1030

Thanks!

robin2008 commented 5 years ago

Thanks, we will try to improve this and add dynamic group querying to the xCAT performance baseline testing.