The PG count is very low. Try increasing the number of PGs in the pools and check if the autoscaler needs to be enabled.
Yes, the root cause is that pg_num and pgp_num were set to 1, so data was only being stored on 3 OSDs. What I don't understand is that PG autoscale has been enabled for this pool the whole time; under what conditions does it not take effect? I worked around the problem by manually setting pg_num and pgp_num and enabling the balancer:
ceph osd pool set mypool pg_num 128
ceph osd pool set mypool pgp_num 128
ceph balancer mode upmap
ceph balancer on
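To check whether the autoscaler actually has a target for the pool, the status can be inspected first. A minimal sketch, assuming the pool is called mypool:
# show the autoscaler's view of each pool (target pg_num, effective ratio, autoscale mode)
ceph osd pool autoscale-status
# confirm the values the pool actually carries
ceph osd pool get mypool pg_num
ceph osd pool get mypool pgp_num
# confirm the balancer is active
ceph balancer status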
I have also observed pg_num/pgp_num being 1: initially the pool is created with pg_num 1, but later it is scaled up to at least 32. Can you verify this?
ceph osd pool ls detail
pool 2 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 20 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 5 'replicapool' replicated size 3 min_size 2 crush_rule 4 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 58 lfor 0/0/56 flags hashpspool,selfmanaged_snaps stripe_width 0 compression_mode none application rbd
pool 6 'replicapool1' replicated size 1 min_size 1 crush_rule 5 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 69 lfor 0/0/67 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 7 'replicapool2' replicated size 1 min_size 1 crush_rule 6 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 76 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
As you can see above, replicapool2 has pg_num/pgp_num of 1.
Here are the mgr logs:
debug 2024-02-22T10:49:58.219+0000 7f80f7fd7640 0 [pg_autoscaler INFO root] Pool 'replicapool2' root_id -1 using 1.152026622245709e-11 of space, bias 1.0, pg target 8.640199666842818e-10 quantized to 32 (current 1)
But later the pool got the right pg_num:
ceph osd pool ls detail
pool 2 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 20 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 5 'replicapool' replicated size 3 min_size 2 crush_rule 4 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 58 lfor 0/0/56 flags hashpspool,selfmanaged_snaps stripe_width 0 compression_mode none application rbd
pool 6 'replicapool1' replicated size 1 min_size 1 crush_rule 5 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 69 lfor 0/0/67 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 7 'replicapool2' replicated size 1 min_size 1 crush_rule 6 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 81 lfor 0/0/79 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
mgr logs
debug 2024-02-22T10:51:58.294+0000 7f80f7fd7640 0 [pg_autoscaler INFO root] Pool 'replicapool2' root_id -1 using 1.152026622245709e-11 of space, bias 1.0, pg target 8.640199666842818e-10 quantized to 32 (current 32)
@wanghui-devops
I can see from the logs that this is caused by overlapping roots. How do I fix it?
debug 2024-02-27T03:05:47.174+0000 7f7e47b87700 0 [pg_autoscaler ERROR root] pool 20 has overlapping roots: {-12, -1, -2}
debug 2024-02-27T03:05:47.178+0000 7f7e47b87700 0 [pg_autoscaler WARNING root] pool 20 contains an overlapping root -12... skipping scaling
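As a diagnostic sketch (pool id 20 is taken from the log above; the pool and rule names are placeholders), the per-class shadow roots and the rule each pool uses can be listed to see which pools span more than one root:
# list the CRUSH hierarchy including the per-class shadow roots (e.g. -2 for hdd, -12 for ssd)
ceph osd crush tree --show-shadow
# show which crush_rule each pool uses
ceph osd pool ls detail
# inspect the rule of the pool reported with overlapping roots
ceph osd pool get <pool-20-name> crush_rule
ceph osd crush rule dump <rule-name>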
@subhamkrai
This is my CRUSH map. Can you see what is causing the problem?
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host k8s-rook2 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
id -9 class ssd # do not change unnecessarily
# weight 45.91138
alg straw2
hash 0 # rjenkins1
item osd.1 weight 43.91138
item osd.13 weight 1.00000
item osd.10 weight 1.00000
}
host k8s-rook3 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
id -10 class ssd # do not change unnecessarily
# weight 45.91138
alg straw2
hash 0 # rjenkins1
item osd.2 weight 43.91138
item osd.11 weight 1.00000
item osd.14 weight 1.00000
}
host k8s-rook1 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
id -11 class ssd # do not change unnecessarily
# weight 45.91138
alg straw2
hash 0 # rjenkins1
item osd.0 weight 43.91138
item osd.12 weight 1.00000
item osd.9 weight 1.00000
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
id -12 class ssd # do not change unnecessarily
# weight 137.73413
alg straw2
hash 0 # rjenkins1
item k8s-rook2 weight 45.91138
item k8s-rook3 weight 45.91138
item k8s-rook1 weight 45.91138
}
# rules
rule replicated_rule {
id 0
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule replicated-metadata-pool-middle-server {
id 1
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule ec-data-pool-middle-server {
id 2
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule myfs-metadata {
id 3
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule myfs-replicated {
id 4
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule replicated-metadata-pool-kubesphere-system {
id 5
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule ec-data-pool-kubesphere-system {
id 6
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule replicated-metadata-pool-nacos-system {
id 7
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule ec-data-pool-nacos-system {
id 8
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule replicated-metadata-pool-middle-system {
id 9
type replicated
step take default
step chooseleaf firstn 0 type osd
step emit
}
rule ec-data-pool-middle-system {
id 10
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule replicated-metadata-pool-middle-core {
id 11
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule ec-data-pool-middle-core {
id 12
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule ssd-rule {
id 13
type replicated
step take default class ssd
step chooseleaf firstn 0 type host
step emit
}
rule hdd-rule {
id 14
type replicated
step take default class hdd
step chooseleaf firstn 0 type host
step emit
}
rule ssd-replicated-metadata-pool-middle-core {
id 15
type replicated
step take default class ssd
step chooseleaf firstn 0 type host
step emit
}
rule ssd-ec-data-pool-middle-core {
id 16
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class ssd
step chooseleaf indep 0 type osd
step emit
}
rule replicated-metadata-pool-sso-system {
id 17
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule ec-data-pool-sso-system {
id 18
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
# end crush map
@subhamkrai
I've fixed the overlapping roots problem. The cause was that at least one pool was still assigned the default "replicated_rule", which does not restrict placement to a single device class, so the pool spanned both the hdd and ssd shadow roots. After I changed that pool's crush_rule, pg_autoscaler works. @subhamkrai
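For reference, a hedged sketch of that kind of fix: point the affected pool at one of the device-class-specific rules already present in the map above (hdd-rule / ssd-rule), or create one if it does not exist. The pool name below is a placeholder:
# create a replicated rule restricted to a single device class (only needed if hdd-rule did not already exist)
ceph osd crush rule create-replicated hdd-rule default host hdd
# switch the pool to the class-specific rule so it no longer spans both shadow roots
ceph osd pool set <pool-name> crush_rule hdd-rule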
Great to know, @wanghui-devops. Are we good to close the issue?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
I have 6 SSD OSDs, but only 3 are being used; osd.12, osd.10, and osd.11 are hardly used.
rule:
One more question: why is the pool only 3.5T?
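A sketch of commands that could help narrow down both questions (which OSDs the SSD pool actually uses, and where the 3.5T figure comes from); the pool name is a placeholder:
# per-OSD utilization grouped by the CRUSH tree, including device class
ceph osd df tree
# raw vs. per-pool capacity; a pool's MAX AVAIL depends on its crush_rule, device class and replica size
ceph df detail
# PG-to-OSD mapping for the pool in question
ceph pg ls-by-pool <pool-name>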