Make the metrics feature non-optional: a lot of upcoming code exposes metrics, and maintaining the ability to opt out of compiling metrics support at the Cargo feature level would become painful and pointless.
Add a special `MemoryPool` that wraps the default DataFusion `MemoryPool` (that setting defaults to `GreedyMemoryPool`, so construct that explicitly) and records the total bytes allocated and released by each DataFusion query tree node (the `consumer` label in the sample output).
Also replace all uses of `RwLock<HashMap>` with `DashMap`, as @gruuya proposed.
Sample output:
# HELP seafowl_datafusion_memory_pool_reserved_bytes_current Current memory reserved in DataFusion's managed memory pool
# TYPE seafowl_datafusion_memory_pool_reserved_bytes_current gauge
seafowl_datafusion_memory_pool_reserved_bytes_current 0
# HELP seafowl_datafusion_memory_pool_freed_bytes_total Memory freed in DataFusion's managed memory pool
# TYPE seafowl_datafusion_memory_pool_freed_bytes_total counter
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="RepartitionExec"} 2033677121
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="NestedLoopJoinLoad"} 15072
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="TopK"} 5951438
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="GroupedHashAggregateStream"} 89032440
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="CrossJoinExec"} 21184
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="HashJoinStream"} 18346
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="AggregateStream"} 3230
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="SortPreservingMergeExec"} 2932891
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="ExternalSorter"} 3268145
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="HashJoinInput"} 112437164
seafowl_datafusion_memory_pool_freed_bytes_total{consumer="ExternalSorterMerge"} 120259084288
# HELP seafowl_datafusion_memory_pool_allocated_bytes_total Memory allocated in DataFusion's managed memory pool
# TYPE seafowl_datafusion_memory_pool_allocated_bytes_total counter
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="TopK",result="success"} 5951438
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="SortPreservingMergeExec",result="success"} 2932891
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="ExternalSorter",result="success"} 3268145
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="HashJoinStream",result="success"} 18346
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="ExternalSorterMerge",result="success"} 120259084288
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="HashJoinInput",result="success"} 112437164
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="NestedLoopJoinLoad",result="success"} 15072
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="AggregateStream",result="success"} 3230
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="GroupedHashAggregateStream",result="success"} 89032440
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="RepartitionExec",result="success"} 2033677121
seafowl_datafusion_memory_pool_allocated_bytes_total{consumer="CrossJoinExec",result="success"} 21184
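
For readers unfamiliar with the wrapping approach, here is a minimal, dependency-free sketch of the idea, not the code in this PR. The `Pool` trait is a hypothetical, simplified stand-in for DataFusion's `MemoryPool` (the real trait works with reservation objects rather than consumer names), `GreedyPool` and `MeteredPool` are illustrative names, and a `Mutex<HashMap>` from the standard library stands in for `DashMap`. The `"failure"` value of the `result` label is also an assumption, since the sample output only shows `result="success"`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for DataFusion's `MemoryPool` trait.
trait Pool {
    fn try_grow(&self, consumer: &str, additional: usize) -> Result<(), String>;
    fn shrink(&self, consumer: &str, amount: usize);
    fn reserved(&self) -> usize;
}

/// First-come-first-served pool with a fixed byte limit.
struct GreedyPool {
    limit: usize,
    used: AtomicUsize,
}

impl GreedyPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }
}

impl Pool for GreedyPool {
    fn try_grow(&self, _consumer: &str, additional: usize) -> Result<(), String> {
        // Atomically add `additional` unless that would exceed the limit.
        self.used
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |used| {
                let new = used.checked_add(additional)?;
                (new <= self.limit).then_some(new)
            })
            .map(|_| ())
            .map_err(|used| {
                format!("cannot reserve {additional} bytes: {used} of {} in use", self.limit)
            })
    }

    fn shrink(&self, _consumer: &str, amount: usize) {
        self.used.fetch_sub(amount, Ordering::Relaxed);
    }

    fn reserved(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }
}

/// Per-consumer byte counters; the allocation key is (consumer, result label).
#[derive(Default)]
struct Counters {
    allocated: HashMap<(String, &'static str), u64>,
    freed: HashMap<String, u64>,
}

/// Wraps any pool, delegating every call and counting bytes per consumer.
struct MeteredPool<P: Pool> {
    inner: P,
    counters: Mutex<Counters>,
}

impl<P: Pool> MeteredPool<P> {
    fn new(inner: P) -> Self {
        Self { inner, counters: Mutex::new(Counters::default()) }
    }

    /// Total bytes requested by `consumer` that ended with `result`.
    fn allocated_bytes(&self, consumer: &str, result: &'static str) -> u64 {
        let counters = self.counters.lock().unwrap();
        counters.allocated.get(&(consumer.to_string(), result)).copied().unwrap_or(0)
    }

    /// Total bytes released by `consumer`.
    fn freed_bytes(&self, consumer: &str) -> u64 {
        self.counters.lock().unwrap().freed.get(consumer).copied().unwrap_or(0)
    }
}

impl<P: Pool> Pool for MeteredPool<P> {
    fn try_grow(&self, consumer: &str, additional: usize) -> Result<(), String> {
        let result = self.inner.try_grow(consumer, additional);
        // "failure" is an assumed label value; the sample output shows only "success".
        let label = if result.is_ok() { "success" } else { "failure" };
        let mut counters = self.counters.lock().unwrap();
        *counters.allocated.entry((consumer.to_string(), label)).or_insert(0) += additional as u64;
        result
    }

    fn shrink(&self, consumer: &str, amount: usize) {
        self.inner.shrink(consumer, amount);
        let mut counters = self.counters.lock().unwrap();
        *counters.freed.entry(consumer.to_string()).or_insert(0) += amount as u64;
    }

    fn reserved(&self) -> usize {
        // Corresponds to the `reserved_bytes_current` gauge.
        self.inner.reserved()
    }
}
```

In this sketch the wrapper never makes allocation decisions itself: it forwards to the inner pool and only observes the outcome, which is why the inner pool has to be constructed explicitly rather than left to the default.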