rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

MDS behind on trimming every 4-5 weeks causing issues for Ceph filesystem #14220

Open akash123-eng opened 2 weeks ago

akash123-eng commented 2 weeks ago

Hi,

We are using rook-ceph with operator 1.10.8 and Ceph 17.2.5. The Ceph filesystem is served by 4 MDS daemons (2 active and 2 standby). Every 3-4 weeks the filesystem runs into trouble, and ceph status shows the following warnings:

2 MDS reports slow requests 
2 MDS Behind on Trimming
mds.myfs-a(mds.1) : behind on trimming (6378/128) max_segments:128, num_segments: 6378
mds.myfs-c(mds.1):  behind on trimming (6560/128) max_segments:128, num_segments: 6560

To fix it, we have to restart all the MDS pods one by one. This is happening every 4-5 weeks.
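
For reference, the restart is along the lines of the sketch below (assuming the default Rook MDS deployment names, rook-ceph-mds-myfs-a through -d, in the rook-ceph namespace; deleting the pods directly works the same way):

# restart one MDS deployment and wait for it to come back before moving on
kubectl -n rook-ceph rollout restart deployment rook-ceph-mds-myfs-a
kubectl -n rook-ceph rollout status deployment rook-ceph-mds-myfs-a
# then repeat for rook-ceph-mds-myfs-b, -c and -d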

We have seen many related issues on the Ceph tracker, and many people suggest increasing mds_cache_memory_limit. For our cluster, mds_cache_memory_limit is set to the default 4GB and mds_log_max_segments is set to the default 128. Should we increase mds_cache_memory_limit from the default 4GB to 8GB, or is there another solution that fixes this issue permanently?
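
If increasing the limit is the way to go, is something like the sketch below the right way to apply it? (Assuming the Rook toolbox is deployed as rook-ceph-tools; the 8 GiB value is just the proposed new limit, expressed in bytes.)

# run from the Rook toolbox, e.g.:
# kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

# check the current values (defaults: 4 GiB and 128)
ceph config get mds mds_cache_memory_limit
ceph config get mds mds_log_max_segments

# proposed change: raise the MDS cache limit from 4 GiB to 8 GiB (value in bytes)
ceph config set mds mds_cache_memory_limit 8589934592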

Environment: Kubernetes

akash123-eng commented 2 weeks ago

@Rakshith-R @Madhu-1 can you please help with the above?

Rakshith-R commented 2 weeks ago

Neither Madhu nor I am familiar with such core Ceph problems.

You should reach out on the Ceph Slack or their mailing list for core Ceph issues: https://ceph.io/en/community/connect/