imaihal closed this 1 month ago.
What is the relationship between this PR and #PR2917? Should this PR go first? Or, for onnx-mlir passes, should we always use DisposableElementsAttr to reclaim the space when constants are translated into krnl.Global or LLVM constants?
This is a different issue and is unrelated to PR2917. Currently, when this pass converts a DisposableElementsAttr to a DenseElementsAttr, the DisposableElementsAttr is kept alive even after the conversion.
@sorenlassen I think you wrote the comments about the lock in this code. The comments are a bit old, but do you remember anything about the lock?
I updated the code to take the lock when deleting DisposableElementsAttr. They are deleted once per batch rather than one at a time, because I'm concerned about performance degradation from frequent locking. Could you review again?
Jenkins Linux amd64 Build #15901 [push] Remove a spike of memory... started at 09:26
Jenkins Linux s390x Build #15904 [push] Remove a spike of memory... started at 10:26
Jenkins Linux ppc64le Build #14931 [push] Remove a spike of memory... started at 10:39
Jenkins Linux s390x Build #15904 [push] Remove a spike of memory... passed after 2 hr 3 min
Jenkins Linux amd64 Build #15901 [push] Remove a spike of memory... passed after 2 hr 14 min
Jenkins Linux ppc64le Build #14931 [push] Remove a spike of memory... passed after 3 hr 30 min
This PR removes a spike in memory usage. When checking memory usage on an LLM (Mistral-7b), ScrubDisposablePass briefly consumes a large amount of memory. This is likely because both the DenseElementsAttr and the DisposableElementsAttr are held during conversion. For the Mistral-7b model, the spike is about 23GB, and total memory usage including the spike is about 50GB. This PR removes the spike, reducing total memory usage to 27GB.