vexxhost / ansible-collection-ceph

Ansible collection for deploying Ceph

Hard-coding `fsid`, `config` and `image` for speedups. #28

Closed: mnaser closed this issue 1 month ago

mnaser commented 6 months ago
```
root@instance:~# time cephadm shell ceph -s
Inferring fsid 3ed20d13-abf7-55f5-a31b-720e50e123fb
Inferring config /var/lib/ceph/3ed20d13-abf7-55f5-a31b-720e50e123fb/mon.instance/config
Using ceph image with id '5be31c24972a' and tag 'v18.2.1' created on 2024-02-22 16:12:19 +0000 UTC
quay.io/ceph/ceph@sha256:9f35728f6070a596500c0804814a12ab6b98e05067316dc64876fb4b28d04af3
  cluster:
    id:     3ed20d13-abf7-55f5-a31b-720e50e123fb
    health: HEALTH_WARN
            1 stray daemon(s) not managed by cephadm
            3 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum instance (age 29m)
    mgr: instance.zsmnlu(active, since 29m)
    osd: 3 osds: 3 up (since 29m), 3 in (since 9h)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    pools:   12 pools, 185 pgs
    objects: 1.37k objects, 7.5 GiB
    usage:   7.7 GiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     185 active+clean

real    0m14.520s
user    0m0.435s
sys     0m0.298s
```

vs

```
root@instance:~# time cephadm --image quay.io/ceph/ceph:v18.2.1 shell --fsid 3ed20d13-abf7-55f5-a31b-720e50e123fb --config /var/lib/ceph/3ed20d13-abf7-55f5-a31b-720e50e123fb/mon.instance/config ceph -s
  cluster:
    id:     3ed20d13-abf7-55f5-a31b-720e50e123fb
    health: HEALTH_WARN
            1 stray daemon(s) not managed by cephadm
            3 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum instance (age 28m)
    mgr: instance.zsmnlu(active, since 28m)
    osd: 3 osds: 3 up (since 28m), 3 in (since 9h)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    pools:   12 pools, 185 pgs
    objects: 1.37k objects, 7.5 GiB
    usage:   7.7 GiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     185 active+clean

  io:
    client:   1.7 KiB/s rd, 426 B/s wr, 1 op/s rd, 1 op/s wr

real    0m1.741s
user    0m0.331s
sys     0m0.158s
```

That's over 8x faster (14.5s down to 1.7s). We should do this; it will speed up the CI tremendously.
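The difference above comes from skipping cephadm's inference of `fsid`, `config`, and `image` by passing them explicitly. A minimal sketch of how the fast-path command line could be composed from the values in the output above (how the collection templates this into its tasks is an assumption):

```shell
# Values taken from the run above; in the collection these would come
# from inventory variables rather than being literal strings.
FSID="3ed20d13-abf7-55f5-a31b-720e50e123fb"
IMAGE="quay.io/ceph/ceph:v18.2.1"
CONFIG="/var/lib/ceph/${FSID}/mon.instance/config"

# Passing --image, --fsid and --config up front skips the per-invocation
# inference step that accounts for most of the ~14s runtime.
CEPH_CMD="cephadm --image ${IMAGE} shell --fsid ${FSID} --config ${CONFIG}"
echo "${CEPH_CMD} ceph -s"
```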

mnaser commented 6 months ago

FTR, right now it takes around 12 minutes for cephadm to deploy a cluster. With the speedups shown above, that could potentially drop to a few minutes (like it was before?), and the HA deployment, which takes almost 50 minutes, should be cut down significantly.

KeplerSysAdmin commented 4 months ago

Hello @mnaser, we are seeing "X stray daemon(s) not managed by cephadm" on our sandbox env. Do you have any insights about it? We know the issue is superficial, but it looks like a bug.

mnaser commented 4 months ago

> Hello @mnaser
>
> X stray daemon(s) not managed by cephadm
>
> We are having this on our sandbox env, do you have any insights about it?
>
> We know the issue/bug is superficial and it looks like a bug

This is not a bug but a result of how things run inside Atmosphere.

KeplerSysAdmin commented 4 months ago

> This is not a bug but a result of how things run inside Atmosphere.

Do you have an open issue about it, or a way to solve this?

mnaser commented 4 months ago

> Do you have an open issue about it, or a way to solve this?

It's not a bug, so there's no point in solving it. It's silenced by our deployment tooling.

Please let's keep this issue on topic.

Thanks
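For reference, upstream cephadm exposes a manager option to silence this specific health warning. Whether Atmosphere's tooling uses exactly this mechanism is an assumption; this is only the generic knob:

```shell
# Disable the "stray daemon(s) not managed by cephadm" HEALTH_WARN.
# This is the upstream cephadm mgr option; run against a live cluster.
ceph config set mgr mgr/cephadm/warn_on_stray_daemons false
```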

ricolin commented 1 month ago

This is addressed by https://github.com/vexxhost/ansible-collection-ceph/commit/32bb44d26570eedc11ea5adca09791deff312bff, so let's close this issue.