rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

xfs still dangerous due to deadlock? #14562

Open KlavsKlavsen opened 3 months ago

KlavsKlavsen commented 3 months ago

We are hitting inode limits on small PVs whose workloads create a lot of small temporary files :(

We wanted to switch to XFS - but then saw this: https://github.com/rook/rook/blob/fd0c687d6ebb01afc12bc61fc70f03241953ae72/deploy/examples/csi/rbd/storageclass.yaml#L83

Is this still the case? It would be nice if that comment linked to the relevant issue, so we could see the latest status and possible workarounds (and better understand the problem). Can anyone share a link to an issue explaining it? I'll gladly submit a PR to improve the example, to help others who hit this in the future.
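For anyone wanting to confirm that inodes (rather than bytes) are what is running out on a mounted PV, here is a minimal sketch using Python's standard `os.statvfs`. The function name `inode_usage` and the example mount path are illustrative assumptions, not anything from Rook or Ceph.

```python
import os


def inode_usage(path):
    """Return (used, total, fraction_used) inodes for the filesystem holding path.

    fraction_used is None when the filesystem reports no fixed inode total.
    """
    st = os.statvfs(path)
    total = st.f_files            # total inodes on the filesystem
    if total == 0:
        # Some filesystems allocate inodes dynamically and report 0 here.
        return 0, 0, None
    used = total - st.f_ffree     # f_ffree is the number of free inodes
    return used, total, used / total


if __name__ == "__main__":
    # "/data" is a hypothetical mount point; use the path where the PV is mounted.
    print(inode_usage("."))
```

Running `df -i` inside the pod reports the same counters if Python is not available in the image.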

travisn commented 3 months ago

For background on the issue with xfs, see #3132. The Ceph tracker issue linked at the end of that issue does not appear to be resolved.

KlavsKlavsen commented 3 months ago

@travisn The issue says it should be resolved on the kernel side with the 5.8 release, after which a fix in Ceph to use the new kernel feature will resolve it. That Ceph fix has not happened yet. https://tracker.ceph.com/issues/43910

satoru-takeuchi commented 3 months ago

There was a discussion on the same topic in the Rook Slack. Raphaël Ducom asked about the progress of this issue on ceph-users.

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/XJT4CBZSNFRDA2ZEVJCV4W2NAK3LPZF5/

donkeyDau commented 1 month ago

We're also still using ext4 instead of XFS due to this. We're deploying MongoDB in a hyperconverged scenario (at least that is how I interpret it: the MongoDB pods run on the same nodes as the OSDs).

If we move the pods to other nodes, would that let us use XFS (as MongoDB recommends)? Or would we still be in "danger" of running into this problem?

travisn commented 1 month ago

Correct. Until this issue is fixed, you could safely use XFS as long as your app pods (MongoDB) are running on different nodes from the OSDs.
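To check whether that condition holds in a given cluster, one option is to look for nodes that host both an OSD pod and an app pod. The sketch below works on the JSON produced by `kubectl get pods -A -o json`. It assumes OSD pods carry the `app=rook-ceph-osd` label; the `app=mongodb` selector and the function name `colocated_nodes` are hypothetical and should be adapted to your own labels.

```python
def colocated_nodes(pod_list, app_selector, osd_selector=None):
    """Return sorted node names that host both an OSD pod and an app pod.

    pod_list is a dict shaped like `kubectl get pods -A -o json` output.
    Selectors are dicts of label key/value pairs that must all match.
    """
    if osd_selector is None:
        # Assumed label for Rook OSD pods; adjust if your cluster differs.
        osd_selector = {"app": "rook-ceph-osd"}

    def nodes_matching(selector):
        nodes = set()
        for pod in pod_list.get("items", []):
            labels = pod.get("metadata", {}).get("labels") or {}
            node = pod.get("spec", {}).get("nodeName")
            # Unscheduled pods have no nodeName and are skipped.
            if node and all(labels.get(k) == v for k, v in selector.items()):
                nodes.add(node)
        return nodes

    return sorted(nodes_matching(osd_selector) & nodes_matching(app_selector))


if __name__ == "__main__":
    import json
    import sys

    # Usage: kubectl get pods -A -o json | python check_colocation.py
    pods = json.load(sys.stdin)
    print(colocated_nodes(pods, {"app": "mongodb"}))  # hypothetical app label
```

An empty result only means the pods are separated right now. Unless scheduling rules such as pod anti-affinity or node selectors keep them apart, a reschedule could place them on the same node again.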