sepich / ceph-swarm

CephFS as Swarm stack

Single-host setup #2

Open scnerd opened 6 years ago

scnerd commented 6 years ago

I'm trying to use this in a very minimal setup with just one or two physical hosts. I can set up the configs and secrets, but once the stack is launched, the ceph cluster doesn't seem to initialize.

Here's my ceph.conf:

[global]
fsid = 227bf44e-87de-4820-b0c6-89c1b76ee3b0
mon initial members = node1_hostname
mon host = node1_hostname
public network = 192.168.0.0/24
osd journal size = 100
log file = /dev/null
mon cluster log file = /var/lib/ceph/mon/$cluster-$id/$channel.log

When I open the monitor container, ceph -s hangs, and I eventually get "error connecting to the cluster" (errno 110, connection timed out). I can see that the Docker swarm successfully deployed 2 mds nodes, 1 mgr node, 1 mon node, and 1 osd (I get the same behavior when I hook up a second computer and have 2 osd nodes).

Is this setup too minimal? Will the cluster fail to initialize unless I have at least 3 mons or 3 OSDs? Could it be because I gave only a local hostname for "mon host" instead of a fully qualified domain name? (I don't run my own DNS, so I'm not sure what FQDN I'd use.)
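One thing I could try is replacing the hostname with the node's IP, since as far as I can tell "mon host" accepts a plain IP address, which would take DNS out of the picture entirely. Something like this (the IP below is just a placeholder for node1's address on the public network):

```ini
[global]
fsid = 227bf44e-87de-4820-b0c6-89c1b76ee3b0
mon initial members = node1_hostname
# Placeholder IP: node1's actual address on the 192.168.0.0/24 network
mon host = 192.168.0.10
public network = 192.168.0.0/24
osd journal size = 100
log file = /dev/null
mon cluster log file = /var/lib/ceph/mon/$cluster-$id/$channel.log
```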

djbingham commented 6 years ago

I've had exactly the same issue. I've spent all of today battling with it and gotten nowhere, but I noticed this towards the end of the monitor container's logs:

 mon.development-01 does not exist in monmap, will attempt to join an existing cluster

That seems a bit odd. Since I'm only starting a single monitor, I would expect it to create a new cluster rather than try to join an existing one. Perhaps that's why ceph -s hangs: the monitor never starts because it's trying to join a cluster that was never created?
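If I'm reading that message right, the monitor registers under an id taken from the container's hostname (development-01 in my case), and it only bootstraps a fresh monmap when that exact name appears in "mon initial members". So a sketch of what I think the config needs to look like, assuming the mon container's hostname really is development-01:

```ini
[global]
# The name here must match the mon id the daemon starts with
# (i.e. the container's hostname), or it will try to join
# an existing cluster instead of creating one.
mon initial members = development-01
mon host = development-01
```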

I can run a single ceph monitor in isolation using docker run, as per the beginning of http://www.sebastien-han.fr/blog/2015/06/23/bootstrap-your-ceph-cluster-in-docker. I haven't tried adding the rest of the stack yet, but in any case I really want everything running as a single swarm stack with shared configs, etc.
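For reference, the command I used was roughly the following, adapted from that post (the IP and network are placeholders for my host's address on the public network, and I'm assuming the ceph/daemon image's MON_IP / CEPH_PUBLIC_NETWORK environment variables, which is what the post uses):

```shell
# Standalone monitor via the ceph/daemon image; this worked for me
# outside of swarm. Placeholder address/network below.
docker run -d --net=host \
  -v /etc/ceph:/etc/ceph \
  -v /var/lib/ceph:/var/lib/ceph \
  -e MON_IP=192.168.0.20 \
  -e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
  ceph/daemon mon
```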

scnerd commented 6 years ago

Not sure if it's relevant, but the MDS containers seem to be restarting. Their only logs are:

ceph_mds.2.zy70i20ggnf2@maxrack01    | Reading Swarm secrets
ceph_mds.2.zy70i20ggnf2@maxrack01    | 2018-06-18 16:58:02  /entrypoint.sh: static: does not generate config

After a little while, each one crashes silently and gets re-launched. Could that be causing the rest of the issues?
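To dig into why they die, I'm going to try pulling the task history and full logs out of swarm (service name assumed from the ceph_mds prefix in the log lines above):

```shell
# Task history for the MDS service, untruncated, including exit
# codes and restart reasons for each dead task.
docker service ps --no-trunc ceph_mds

# Full service logs with timestamps, to see what happens right
# before each crash.
docker service logs --timestamps ceph_mds
```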