Open klahnakoski opened 5 years ago
I suspect that the effective network is only throttled when the physical network is saturated. To maximize value, we want a statistical measure of effective network capacity so we can find the machines that perform best. We also want to use the is_throttled state to help decide whether shard movement should continue.
The first step is detecting if and when the network is throttled. This may require a low-intensity network ping between multiple nodes to determine whether throttling is happening. We may also measure throttling by tracking shard transfer times.
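A rough sketch of the shard-transfer-time approach, assuming we can record the byte count and duration of each shard move. The function names, the `ratio` threshold, and `min_samples` are hypothetical and not part of any existing code; the idea is only to compare a node's recent throughput against its own longer history.

```python
from statistics import median


def throughput_mbps(bytes_moved, seconds):
    """Effective throughput of one shard transfer, in megabits per second."""
    return bytes_moved * 8 / seconds / 1_000_000


def is_throttled(recent_mbps, history_mbps, ratio=0.5, min_samples=3):
    """
    Guess whether a node is throttled: True when the median of its recent
    transfers has dropped below `ratio` times the median of its history.
    `ratio` and `min_samples` are assumed values that would need tuning.
    """
    if len(recent_mbps) < min_samples or len(history_mbps) < min_samples:
        return False  # not enough evidence either way
    return median(recent_mbps) < ratio * median(history_mbps)
```

Using medians keeps a single slow transfer from flipping the state, which matters if the result gates whether shard movement continues.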
My latest theory is that the baseline network transfer limits on the spot nodes are too low: ingestion and queries slow to a crawl while file transfers consume all available network bandwidth. The SpotManager should consider minimum network speed when bidding on nodes.
Bigger nodes mean bigger drives, which means more network usage to fill those drives. Bigger nodes may not be a solution.
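To make the drive-size versus network trade-off concrete, here is a minimal sketch of how candidate node types could be filtered before bidding. The `NodeType` class, its fields, and both thresholds are assumptions for illustration; this is not the SpotManager's actual configuration or API, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str
    drive_gb: float       # total local drive capacity, in gigabytes
    baseline_mbps: float  # sustained (not burst) network rate, in megabits/s


def hours_to_fill(node):
    """Hours needed to fill the node's drives at its baseline network rate."""
    bits = node.drive_gb * 8e9
    seconds = bits / (node.baseline_mbps * 1e6)
    return seconds / 3600


def acceptable(node_types, min_mbps, max_fill_hours):
    """Keep node types that meet a minimum speed and can fill in reasonable time."""
    return [
        n for n in node_types
        if n.baseline_mbps >= min_mbps and hours_to_fill(n) <= max_fill_hours
    ]
```

For example, a hypothetical 450 GB node at 1000 Mbps fills in about one hour, while an 1800 GB node at the same baseline takes about four. The bigger node only helps if its baseline network speed scales with its drives.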