provider-terraform thrashes disk computing checksums with plugin cache disabled

What happened?

I found a nice footgun, and proceeded to use it :)

We have the plugin cache disabled, we're using both the google and google-beta providers, and we managed to cause a provider version upgrade. Initially, we didn't have -upgrade=true in our initArgs, so the provider was repeatedly failing to reconcile. Once we added that in, provider-terraform was able to upgrade the providers and reconcile the workspaces.

At this point, provider-terraform started running very slowly, to the point that it was consistently at its --max-reconcile-rate limit, and workspaces were appearing back on the queue after the poll interval before the queue emptied.

During this time, execing into the pod and running top and iostat showed low CPU/RAM usage, but a very large number of Disk operations. This was reflected in the GCP monitoring for the GKE node the pod was running on.

My hypothesis is that computing the md5sums over the TF provider binaries caused too much disk usage, leading to throttling. After restarting the pod (which threw away the old binaries), provider-terraform was able to clear the queue.

I know there are a couple of options for mitigation that we can try to do:

Fix the workspaces and providerconfig so that they only install one provider instead of two
Try to re-enable the plugin cache (we disabled it because it didn't seem to play nicely with concurrent long-running apply operations)
Switch our GKE nodes to use SSD instead of HDD to get better disk performance

However, I think there a couple of possible improvements to provider-terraform:

Exclude providers when computing the workspace checksum (aiui, it'd be excluded if using the plugin cache?). If this is a reasonable course of action, I'm happy to contribute it.
Allow configuring provider-terraform with a PersistentVolumeClaim (which could be SSD without making all of the nodes be SSD). This would in theory allow the workspaces to be persistent, and not need to be re-initialised after provider-terraform restarts, which would probably be beneficial.

How can we reproduce it?

Roughly 90 workspaces provisioning DNS records (all using the google provider)
Roughly 10 workspaces provisioning GKE clusters (using the google-beta provider for the cluster, and google for the node pools)
ProviderConfig references both the google and google-beta provider (which causes both providers to be installed in every workspace)
No initArgs: ['upgrade=true'] in workspaces
Set a version constraint for an old version as below, and get all of the workspaces to show ready
Update the provider version, let everything start failing to reconcile
Add initArgs: ['upgrade=true'] in workspaces
Watch everything grind to a halt

What environment did it happen in?

Crossplane Version: v1.12.2
Provider Version: v0.10.0
Kubernetes Version: v1.24.14-gke.2700
Kubernetes Distribution: GKE

upbound / provider-terraform