kvaps / kube-linstor

Containerized LINSTOR SDS for Kubernetes, ready for production use.
Apache License 2.0
130 stars 25 forks source link

Add code to ensure start order between controller/satellite/csi-node #33

Closed t3hmrman closed 3 years ago

t3hmrman commented 3 years ago

I found that after trying to set up kube-linstor start order (especially after restart) was causing the node to not get registered. As far as I can tell, the order must be -- controller>satellite>csi-node.

If they start in the wrong order, the storage pool (which is currently implemented as a initContainer for me, see https://github.com/kvaps/kube-linstor/pull/31) does not get registered properly. Short of building in a CRD/control loop I don't think there's much you can do about this -- I couldn't find a way to pre-configure the storage pool layout of linstor with a file (is this possible? surely it is?)

Signed-off-by: vados vados@vadosware.io

kvaps commented 3 years ago

Hi @t3hmrman, thank you for your contribution!

I suppose that we need to implement better error handling mechanism for #31 instead implementing wait-for patternб which contradicts to 12-factor app best practices, and still not handling the errors ;-)

t3hmrman commented 3 years ago

Yeah I'd agree -- I mean ideally I think going the operator route is worth doing here -- enshrining this kind of operational knowledge is what operators were built for, so that's the optimal solution, but it requires more complexity. I put this up just so anyone who tries to get started today knows what it might look like, just like #31 tipped me off to a way to make storage pools.

The benchmark I was working on is up now so I don't know when I'll get around to doing another round of testing, but I don't think this code changed very much after I figured it out. The code for that is also available.

@kvaps if you want to close this and figure out a different way to do it I'm all for it, but IMO it's worth at least documenting that this ordering issue exists. I didn't end up trying piraeus-operator so I'm not sure how well it would have handled it.

kvaps commented 3 years ago

Wow, such a nice work! I will publish it wherever possible 🙂

t3hmrman commented 3 years ago

Oh please no rush! please let me know if there's something I can do to make this PR better -- the tests are definitely not perfect. If I can find more time, I'll try and write a minimal controller you might like (don't hold me to this). I've been itching for a reason to use kube to try writing a controller :)

kvaps commented 3 years ago

@t3hmrman I already decided to keep this project such simplest as possible. However people are really needed to upload some basic configuration to linstor-controller. So it would allow you to generate the yaml maifests and just apply them into cluster without any additional logic.

Automate the node joining process and storage-pools configuration should be enough for now. Currently I preparing simple bash script for that, which will work as a sidecar, and will not affect the current workflow.

If you'd like the controller way, I think you can help @WanzenBug to develop official https://github.com/piraeusdatastore/piraeus-operator. Or start the own project to build opeator for configure linstor-controller entities via CRD. Of course if it will be good enough we can include it into this chart.

To be honest I was already thinking to use aggregation apiserver feature to just represent all the linstor resources as simple Kubernetes entities. See https://github.com/LINBIT/linstor-server/issues/81 for more details.

t3hmrman commented 3 years ago

Ah thanks for sharing, yeah that sounds good -- I'm definitely a fan of keeping it minimal! :)

kvaps commented 3 years ago

Closed in favor #35