[TEST] Test long-running Longhorn installation

What's the test to develop? Please describe

There are issues that can only appear after running Longhorn for a long time (for example, high CPU/RAM usage due to a connection leak, or too many failed backups). @ejweber and I think that we could catch these problems before the release if we had a long-running Longhorn setup.

Test setup:

Create a long-lived cluster
Deploy Longhorn master-head
Deploy a workload with a liveness probe to detect when there is an issue with the Longhorn volume. The workload should also generate some IO load
Deploy Rancher monitoring to track events such as high CPU and RAM usage of Longhorn pods
Also monitor Longhorn metrics to detect the number of backup errors: https://longhorn.io/docs/1.5.1/monitoring/metrics/
We can add more topics to monitor if needed. Any ideas, @longhorn/dev?
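One possible shape for the workload's liveness check is a small write/fsync/read-back probe against the Longhorn volume. This is only a sketch under assumptions: the scratch file name `.liveness-probe`, the helper names `volume_is_healthy` and `probe_exit_code`, and the idea of running it as an exec probe are all hypothetical, not part of Longhorn.

```python
import hashlib
import os

# Hypothetical name of the scratch file written on the Longhorn-backed volume.
PROBE_FILE_NAME = ".liveness-probe"


def volume_is_healthy(mount_path: str) -> bool:
    """Write a random payload, fsync it, read it back and compare SHA-256 digests.

    Any OSError (e.g. read-only remount, missing mount, I/O error) or a digest
    mismatch is reported as unhealthy.
    """
    payload = os.urandom(4096)
    expected = hashlib.sha256(payload).hexdigest()
    probe_path = os.path.join(mount_path, PROBE_FILE_NAME)
    try:
        with open(probe_path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to flush the write to the device
        with open(probe_path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
    except OSError:
        return False
    return actual == expected


def probe_exit_code(mount_path: str) -> int:
    """Exit status for an exec probe: Kubernetes treats 0 as success."""
    return 0 if volume_is_healthy(mount_path) else 1
```

Saved as a hypothetical `probe.py` in the container's working directory (and assuming the image ships `python3`), the probe command could be `["python3", "-c", "import sys, probe; sys.exit(probe.probe_exit_code('/data'))"]`, where `/data` is a hypothetical mount path. The read-back may be served from the page cache, so this mainly catches failed writes or fsyncs (such as a read-only remount) rather than on-disk corruption; a hung volume would instead surface through the probe's `timeoutSeconds`.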
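To count backup errors from the metrics, a minimal sketch could scan the Prometheus text output for `longhorn_backup_state` samples in the Error state. It assumes, per the linked metrics page, that this gauge encodes Error as `4` and carries `backup` and `volume` labels; verify both against the deployed version. The function name `failed_backups` is hypothetical, and the label parser is deliberately simple (it does not handle escaped quotes).

```python
import re
from typing import Dict, List

# Assumption: per the Longhorn metrics docs, longhorn_backup_state uses 4 = Error.
BACKUP_STATE_METRIC = "longhorn_backup_state"
BACKUP_STATE_ERROR = 4.0

# Matches a sample line such as: metric_name{a="x",b="y"} 4
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def failed_backups(exposition: str) -> List[Dict[str, str]]:
    """Return the label sets of all backup-state samples whose value is Error."""
    failed = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if not match or match.group(1) != BACKUP_STATE_METRIC:
            continue  # some other metric
        if float(match.group(3)) == BACKUP_STATE_ERROR:
            failed.append(dict(_LABEL_RE.findall(match.group(2) or "")))
    return failed
```

With Rancher monitoring already scraping Longhorn, the same check would more naturally live in a Prometheus alert rule such as `count(longhorn_backup_state == 4) > 0`; the script form is mainly useful for ad-hoc checks against a raw `/metrics` dump.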

Upgrade strategy

Perform regular Kubernetes version upgrades (maybe once per month?)
Perform OS upgrades when needed
Perform Longhorn upgrades:
1. Frequency: maybe upgrade to the latest Longhorn master-head once per week to pick up newer fixes?
2. Engine: enable automatic engine upgrade
3. Instance manager: scale the workload down and back up to move the engine/replica processes to the newer instance-manager version. This will remove the old instance manager
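After each scale down/up cycle, it may help to confirm that no engine or replica process is still running on an old instance manager before expecting it to be removed. The sketch below is a hypothetical helper working on a simplified, hand-built view of the instance managers (`InstanceManager` and `blocking_old_instance_managers` are illustrative names); mapping the real `instancemanagers.longhorn.io` resources into it is left out, since the relevant status fields may differ between Longhorn versions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InstanceManager:
    """Hypothetical, simplified view of one Longhorn instance manager."""
    name: str
    image: str
    process_count: int  # engine + replica processes still running on it


def blocking_old_instance_managers(
    managers: Iterable[InstanceManager], current_image: str
) -> List[str]:
    """Names of instance managers on an older image that still host processes.

    An empty result means the scale down/up cycle moved every engine and
    replica process to the current instance-manager image.
    """
    return sorted(
        im.name
        for im in managers
        if im.image != current_image and im.process_count > 0
    )
```

A non-empty result after the cycle would point at workloads that were not actually restarted, or at an instance manager that failed to drain.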
