kiwix / operations

Kiwix Kubernetes Cluster
http://charts.k8s.kiwix.org/

Do we have an IO issue on storage node? #170

Closed benoit74 closed 1 month ago

benoit74 commented 8 months ago

Here is a global overview of the situation. This is only food for thought for now.

image

image

dev-library consumes about 100 to 180 IOPS (Read+Write) and 10 to 25 MB/s (Read+Write)

image

dev-library-generator is quite fast (2-3 mins) but consumes even more.

image

As a comparison, each library-data (prod, serving ZIMs) consumes 3-4 IOPS on average (Read+Write, with some peaks at 10 to 30) and about 1 MB/s (Read+Write, with some peaks at 4).

image

But rsyncd is even more I/O-intensive.

image

image
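The thread does not say how the figures above were collected. As a rough sketch only, comparable Read+Write IOPS and MB/s numbers can be derived on Linux from two snapshots of `/proc/diskstats` taken a known number of seconds apart. The helper names (`DiskSample`, `parse_diskstats`, `io_rates`) and the device name `sda` are illustrative assumptions, not something used on the storage node.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


@dataclass
class DiskSample:
    reads: int            # reads completed
    writes: int           # writes completed
    sectors_read: int
    sectors_written: int


def parse_diskstats(text: str, device: str) -> DiskSample:
    """Extract the counters of one device from /proc/diskstats content."""
    for line in text.splitlines():
        fields = line.split()
        # Layout: major minor name reads merged sectors ms writes merged sectors ...
        if len(fields) >= 10 and fields[2] == device:
            return DiskSample(
                reads=int(fields[3]),
                sectors_read=int(fields[5]),
                writes=int(fields[7]),
                sectors_written=int(fields[9]),
            )
    raise ValueError(f"device {device!r} not found")


def io_rates(before: DiskSample, after: DiskSample, seconds: float):
    """Return (IOPS, MB/s), Read+Write combined, between two samples."""
    ops = (after.reads - before.reads) + (after.writes - before.writes)
    sectors = (after.sectors_read - before.sectors_read) + (
        after.sectors_written - before.sectors_written
    )
    return ops / seconds, sectors * SECTOR_BYTES / 1e6 / seconds


if __name__ == "__main__":
    import time

    device = "sda"  # hypothetical device name, adjust to the actual disk
    with open("/proc/diskstats") as fh:
        first = parse_diskstats(fh.read(), device)
    time.sleep(10)
    with open("/proc/diskstats") as fh:
        second = parse_diskstats(fh.read(), device)
    iops, mbps = io_rates(first, second, 10)
    print(f"{device}: {iops:.0f} IOPS, {mbps:.1f} MB/s (Read+Write)")
```

These are per-device counters, so they would show the total load on a disk rather than the share of a single workload such as dev-library or rsyncd.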

One idea from @rgaudin: should we move prod library (most time sensitive application on this server) to a new server, with prod ZIMs mirrored from storage, where the service could be more quiet? (It would only need about 4G of ZIMs: no need for the double copy, no need for dev ZIMs, nightlies, ...)

kelson42 commented 8 months ago

@benoit74 I guess you mean 4TB of ZIMs for the prod library?

rgaudin commented 8 months ago

Current prod library is 4.23TiB

kelson42 commented 8 months ago

> One idea from @rgaudin: should we move prod library (most time sensitive application on this server) to a new server, with prod ZIMs mirrored from storage, where the service could be more quiet?

If there is no obvious technical optimisation in view, this looks like the logical approach. But we should have more buffer and probably plan for around 8TB in at least RAID 5. How much would that cost?
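For the sizing part of that question (not the pricing), here is a small arithmetic sketch. It assumes the usual RAID 5 rule that one disk's worth of capacity goes to parity, ignores filesystem overhead, and uses a 4TB disk size that is purely hypothetical. It also converts the 4.23TiB quoted above into decimal TB, since disks are usually sold in TB.

```python
import math

TIB = 2**40   # bytes in one tebibyte
TB = 10**12   # bytes in one (decimal) terabyte


def tib_to_tb(tib: float) -> float:
    """Convert tebibytes to decimal terabytes."""
    return tib * TIB / TB


def raid5_usable_tb(disk_count: int, disk_tb: float) -> float:
    """Usable RAID 5 capacity: one disk's worth of space holds parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disk_count - 1) * disk_tb


def raid5_disks_needed(target_tb: float, disk_tb: float) -> int:
    """Smallest RAID 5 disk count offering at least target_tb usable."""
    return max(3, math.ceil(target_tb / disk_tb) + 1)


if __name__ == "__main__":
    library_tb = tib_to_tb(4.23)           # current prod library size
    disks = raid5_disks_needed(8, 4)       # 8TB target, hypothetical 4TB disks
    usable = raid5_usable_tb(disks, 4)
    print(f"library: {library_tb:.2f} TB, {disks} x 4TB disks -> {usable} TB usable")
```

Under those assumptions, the current library is roughly 4.65TB, and three 4TB disks would give 8TB usable, leaving a bit over 3TB of headroom.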

benoit74 commented 8 months ago

> If there is no obvious technical optimisation in view, this looks like the logical approach.

As I said, this is only food for thought for now. Having thought a bit (I just had a shower 🤣) I think we have other tracks to follow:

> But we should have more buffer and probably count with around 8TB in at least Raid5. How much would that cost?

benoit74 commented 1 month ago

Duplicate of https://github.com/kiwix/operations/issues/227, solved by moving the workload to another cloud provider.

rgaudin commented 1 month ago

Ah, I wanted to comment that we haven't enabled SSD, but there's #246 just for that :)