piraeusdatastore / piraeus

High Available Datastore for Kubernetes
https://piraeus.io/
Apache License 2.0

PVC in Unused state when losing one node of two. #191

Open S3LL1G28 opened 2 weeks ago

S3LL1G28 commented 2 weeks ago

Hi, I have three control planes and two workers.

I used a placement count of two for replication.

If I cordon and drain a node, and then uncordon it, all my PVCs end up in the Unused state.

An excerpt:

kubectl -n piraeus exec -it deployment/linstor-controller -- linstor volume list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node              ┊ Resource                                 ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated  ┊ InUse  ┊ State    ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ k8sw1.gil-yan.net ┊ pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585 ┊ lvm-thin    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊   3.94 GiB ┊ Unused ┊ UpToDate ┊
┊ k8sw2.gil-yan.net ┊ pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585 ┊ lvm-thin    ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊   3.94 GiB ┊ Unused ┊ UpToDate ┊
┊ k8sw1.gil-yan.net ┊ pvc-430f8dc8-aca4-4b3e-8b5c-7a90bf281123 ┊ lvm-thin    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 173.70 MiB ┊ Unused ┊ UpToDate ┊
┊ k8sw2.gil-yan.net ┊ pvc-430f8dc8-aca4-4b3e-8b5c-7a90bf281123 ┊ lvm-thin    ┊     0 ┊    1007 ┊ /dev/drbd1007 ┊ 218.28 MiB ┊

kubectl -n piraeus exec -it deployment/linstor-controller -- drbdsetup status

pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585 role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-430f8dc8-aca4-4b3e-8b5c-7a90bf281123 role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-43ab85c9-ff82-4352-821a-2139bba8ab8c role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-49a29ef4-eb2e-421c-866c-12201ed245e9 role:Secondary

I'm sorry for this noob question, but how do I correct this 'issue'?

thx.

S3LL1G28 commented 2 weeks ago

Just one more thing: if I delete a pod and it gets rescheduled, it can use the PVC for a while, and then the controller detaches the volume.

Started PV processing "pvc-5ba1ee30-a79a-4e08-be5e-f832bf25b3a1"
csi-attacher I0830 08:35:24.264892 1 csi_handler.go:623] CSIHandler: processing PV "pvc-5ba1ee30-a79a-4e08-be5e-f832bf25b3a1"
csi-attacher I0830 08:35:24.264896 1 util.go:106] Finalizer removed from "csi-3f18a26eb10abb70922f8ef5c5dc26f05380e055cbd3b21a4a13e524304bc068"
csi-attacher I0830 08:35:24.264903 1 csi_handler.go:645] CSIHandler: processing PV "pvc-5ba1ee30-a79a-4e08-be5e-f832bf25b3a1": no deletion timestamp, ignoring
csi-attacher I0830 08:35:24.264910 1 csi_handler.go:295] Fully detached "csi-3f18a26eb10abb70922f8ef5c5dc26f05380e055cbd3b21a4a13e524304bc068"
csi-attacher I0830 08:35:24.264917 1 csi_handler.go:240] CSIHandler: finished processing "csi-3f18a26eb10abb70922f8ef5c5dc26f05380e055cbd3b21a4a13e524304bc068"
csi-attacher I0830 08:35:24.264930 1 controller.go:210] Started VA processing "csi-3f18a26eb10abb70922f8ef5c5dc26f05380e055cbd3b21a4a13e524304bc068"
csi-attacher I0830 08:35:24.264941 1 controller.go:217] VA "csi-3f18a26eb10abb70922f8ef5c5dc26f05380e055cbd3b21a4a13e524304bc068" deleted, ignoring
linstor-csi time="2024-08-30T08:35:24Z" level=info msg="resource not temporary (not created by Attach) not deleting" linstorCSIComponent=client targetNode=k8sw2.gil-yan.net volume=pvc-848c5e94-13cb-4ec8-aa5a-c
csi-attacher I0830 08:35:24.366769 1 connection.go:200] GRPC response: {}
csi-attacher I0830 08:35:24.366861 1 connection.go:201] GRPC error: <nil>
csi-attacher I0830 08:35:24.366936 1 csi_handler.go:581] Detached "csi-6b6fa604013a2ea34f4bd414ad7ba60b1faa4bbccacce30b12cb2f5d49db3db2"
csi-attacher I0830 08:35:24.367039 1 util.go:80] Marking as detached "csi-6b6fa604013a2ea34f4bd414ad7ba60b1faa4bbccacce30b12cb2f5d49db3db2"
csi-attacher I0830 08:35:24.404487 1 controller.go:261] Started PV processing "pvc-848c5e94-13cb-4ec8-aa5a-cb8789235c6c"
csi-attacher I0830 08:35:24.404510 1 csi_handler.go:623] CSIHandler: processing PV "pvc-848c5e94-13cb-4ec8-aa5a-cb8789235c6c"
csi-attacher I0830 08:35:24.404519 1 csi_handler.go:645] CSIHandler: processing PV "pvc-848c5e94-13cb-4ec8-aa5a-cb8789235c6c": no deletion timestamp, ignoring
csi-attacher I0830 08:35:24.405196 1 util.go:106] Finalizer removed from "csi-6b6fa604013a2ea34f4bd414ad7ba60b1faa4bbccacce30b12cb2f5d49db3db2"
csi-attacher I0830 08:35:24.405312 1 csi_handler.go:295] Fully detached "csi-6b6fa604013a2ea34f4bd414ad7ba60b1faa4bbccacce30b12cb2f5d49db3db2"
csi-attacher I0830 08:35:24.405399 1 csi_handler.go:240] CSIHandler: finished processing "csi-6b6fa604013a2ea34f4bd414ad7ba60b1faa4bbccacce30b12cb2f5d49db3db2"
csi-attacher I0830 08:35:24.405521 1 controller.go:210] Started VA processing "csi-6b6fa604013a2ea34f4bd414ad7ba60b1faa4bbccacce30b12cb2f5d49db3db2"
csi-attacher I0830 08:35:24.405638 1 controller.go:217] VA "csi-6b6fa604013a2ea34f4bd414ad7ba60b1faa4bbccacce30b12cb2f5d49db3db2" deleted, ignoring
linstor-csi time="2024-08-30T08:35:24Z" level=info msg="resource not temporary (not created by Attach) not deleting" linstorCSIComponent=client targetNode=k8sw2.gi

WanzenBug commented 2 weeks ago

Perhaps this is related to the HA Controller. Check the logs of the ha-controller containers to see if it tries to evict some pods.
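For example (a sketch, assuming the HA Controller runs as a DaemonSet named ha-controller in the piraeus namespace; adjust the names to your deployment):

kubectl -n piraeus get pods -o wide | grep ha-controller
kubectl -n piraeus logs daemonset/ha-controller | grep -i evict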

It looks like DRBD thinks it is in a split-brain situation, indicated by the StandAlone connection. This is because you only have 2 nodes with no third "witness node". So whenever one node disconnects, both nodes will go into standalone mode.

To get out of this state, see https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-resolve-split-brain

And I recommend having a third node to create a tie-breaker resource. This makes split-brains impossible.
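As a sketch (assuming a recent LINSTOR, where the controller adds the diskless tie-breaker automatically once a third node is available), you can make sure the feature is enabled and then look for the tie-breaker in the resource list:

kubectl -n piraeus exec deployment/linstor-controller -- linstor controller set-property DrbdOptions/auto-add-quorum-tiebreaker true
kubectl -n piraeus exec deployment/linstor-controller -- linstor resource list    # look for a diskless tie-breaker resource on the third node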

S3LL1G28 commented 2 weeks ago

Hi @WanzenBug ,

Thank you for your quick answer. I have checked my HA Controller log, and indeed it continuously evicts pods.

Before, I had three nodes, but my third node is "sick". I was thinking that two nodes would be OK.

OK, so I will repair it. But one last question.

Say I lose my third node, and then one of the two remaining nodes is evicted, so finally only one node is left, and then my second node comes back.

Is there something I can do on the command line to force PVC mounting? This is in case I lose a machine for a day and want to keep the PVCs working.

Again thank you for your support.

S3LL1G28 commented 2 weeks ago

I was thinking that by recovering my third node everything would be OK, but the HA Controller continues to evict.

kubectl linstor node list
╭────────────────────────────────────────────────────────────────────╮
┊ Node              ┊ NodeType  ┊ Addresses                 ┊ State  ┊
╞════════════════════════════════════════════════════════════════════╡
┊ k8sw1.gil-yan.net ┊ SATELLITE ┊ 10.244.3.188:3366 (PLAIN) ┊ Online ┊
┊ k8sw2.gil-yan.net ┊ SATELLITE ┊ 10.244.4.23:3366 (PLAIN)  ┊ Online ┊
┊ k8sw3.gil-yan.net ┊ SATELLITE ┊ 10.244.5.160:3366 (PLAIN) ┊ Online ┊

Connections stay in StandAlone, and sometimes I see a PVC go Primary and then back to Secondary.

S3LL1G28 commented 2 weeks ago

pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585 role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-430f8dc8-aca4-4b3e-8b5c-7a90bf281123 role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-43ab85c9-ff82-4352-821a-2139bba8ab8c role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-49a29ef4-eb2e-421c-866c-12201ed245e9 role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-5ba1ee30-a79a-4e08-be5e-f832bf25b3a1 role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-848c5e94-13cb-4ec8-aa5a-cb8789235c6c role:Secondary
  disk:UpToDate
  k8sw1.gil-yan.net connection:StandAlone

pvc-8df9269b-3378-42ec-a42d-753bba870b7d role:Primary

WanzenBug commented 2 weeks ago

Yes, because of the split brain I mentioned earlier. Check https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-resolve-split-brain for how to get out of it.

Basically, you need to decide which node has the most up-to-date data and mark all other nodes as outdated.
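For example, a rough sketch of the manual recovery described in that guide, using one of the resources from your listing (double-check first which node really has the newest data):

# on the node whose data you want to discard (the split-brain "victim"):
drbdadm disconnect pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585
drbdadm secondary pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585
drbdadm connect --discard-my-data pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585

# on the node that keeps its data, if it is also StandAlone:
drbdadm disconnect pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585
drbdadm connect pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585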

S3LL1G28 commented 2 weeks ago

Sorry, I had missed your link. I went there and tried to work it out, but I don't manage to execute the commands.

May I ask you to show an example?

If I want to run drbdadm disconnect pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585, it tells me "no resources defined!"

WanzenBug commented 2 weeks ago

If I want to run drbdadm disconnect pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585, it tells me "no resources defined!"

You need to connect to the linstor-satellite.... pod and execute those commands there.
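For example (a sketch; <linstor-satellite-pod-on-k8sw1> is a placeholder, pick the satellite pod running on the node you want to fix):

kubectl -n piraeus get pods -o wide | grep satellite
kubectl -n piraeus exec -it <linstor-satellite-pod-on-k8sw1> -- drbdadm disconnect pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585
kubectl -n piraeus exec -it <linstor-satellite-pod-on-k8sw1> -- drbdadm connect --discard-my-data pvc-1f60b906-60b4-418d-bbc6-d55c8edbd585    # --discard-my-data only on the node whose data you discard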

S3LL1G28 commented 2 weeks ago

Again sorry for my "noobery".

It's OK, I've got all my PVCs running.

To sum up for other people: you have to determine which node should be primary for the resource, assign it the primary role, and then go to the other nodes and follow the instructions from the URL you gave me.

After that, everything is OK.

A big thank you.

S3LL1G28 commented 2 weeks ago

One last thing I don't quite get: having only 2 replicas is a bad thing with three nodes.

So it's better to have 3 replicas. But that will cost three times the disk space, no?