onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Apache License 2.0

Remove a spike of memory usage in ScrubDisposablePass. #2978

Closed imaihal closed 1 month ago

imaihal commented 1 month ago

This PR removes a spike in memory usage. When checking memory usage with an LLM (Mistral-7b), ScrubDisposablePass briefly consumes a large amount of memory (peaking at about 50GB). This is likely because both the denseElementsAttr and the disposableElementsAttr are held during conversion. For the Mistral-7b model, the spike is about 23GB, and total memory usage including the spike is about 50GB. This PR removes the spike, reducing total memory usage to 27GB.
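To illustrate why holding both representations raises the peak, here is a minimal toy model. It is a sketch under stated assumptions, not the pass's actual code: the function name `peak_bytes` and the idea of tracking "live bytes" per buffer are hypothetical, and real constants are not plain byte counts.

```python
def peak_bytes(sizes, release_after_convert):
    """Return the peak number of live bytes while converting every buffer.

    sizes: byte size of each source buffer (hypothetical stand-ins for constants).
    release_after_convert: if True, drop each source right after its copy is made.
    """
    live = sum(sizes)  # all source buffers are live before conversion starts
    peak = live
    for size in sizes:
        live += size  # the converted copy now exists alongside its source
        peak = max(peak, live)
        if release_after_convert:
            live -= size  # source released; only the converted copy remains
    return peak


# Four 4-byte buffers: keeping every source doubles the footprint,
# releasing each one bounds the overshoot to a single buffer.
keep_all = peak_bytes([4, 4, 4, 4], release_after_convert=False)
release_each = peak_bytes([4, 4, 4, 4], release_after_convert=True)
```

In this model `keep_all` is 32 and `release_each` is 20: releasing sources as conversion proceeds limits the overshoot to the size of the largest buffer instead of the whole set.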

chentong319 commented 1 month ago

What is the relationship between this PR and PR #2917? Should this PR go first? Or, for onnx-mlir passes, should we always use disposableElementsAttr to reclaim the space when constants are translated into krnl.Global or an LLVM constant?

imaihal commented 1 month ago

This is a different issue and is unrelated to PR #2917. Currently, when this pass converts a disposableElementsAttr to a denseElementsAttr, the disposableElementsAttr is kept even after the conversion.

imaihal commented 1 month ago

@sorenlassen I think you wrote the comments about the lock in this code. The comments are a bit old, but do you remember anything about the lock?

imaihal commented 1 month ago

I updated the code to use the lock when deleting disposableElementsAttr. They are deleted once per batch because I'm concerned about performance degradation from frequent locking. Could you review it again?
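The trade-off described here, releasing per batch rather than per item so the lock is taken less often, can be sketched as follows. This is a hypothetical illustration only: `BufferPool`, `convert_in_batches`, and `release_calls` are invented names, and the sketch does not reflect the actual onnx-mlir data structures or locking scheme.

```python
import threading


class BufferPool:
    """Hypothetical stand-in for a shared pool of source buffers guarded by a lock."""

    def __init__(self, buffers):
        self._lock = threading.Lock()
        self._buffers = dict(buffers)  # buffer id -> bytes
        self.release_calls = 0  # how many times release() took the lock

    def get(self, buf_id):
        with self._lock:
            return self._buffers[buf_id]

    def release(self, ids):
        """Drop several buffers while acquiring the lock only once."""
        with self._lock:
            self.release_calls += 1
            for buf_id in ids:
                self._buffers.pop(buf_id, None)

    def live_count(self):
        with self._lock:
            return len(self._buffers)


def convert_in_batches(pool, ids, batch_size):
    """Copy each buffer, then release the sources one batch at a time."""
    converted = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        for buf_id in batch:
            # bytearray() makes a real copy, standing in for the converted form
            converted[buf_id] = bytearray(pool.get(buf_id))
        pool.release(batch)  # one lock acquisition per batch, not per buffer
    return converted
```

With ten buffers and a batch size of four, `release` runs three times instead of ten, while at most one batch of sources stays alive alongside its copies. A larger batch means fewer lock acquisitions but a higher transient footprint.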

jenkins-droid commented 1 month ago

Jenkins Linux amd64 Build #15901 [push] Remove a spike of memory... started at 09:26

jenkins-droid commented 1 month ago

Jenkins Linux s390x Build #15904 [push] Remove a spike of memory... started at 10:26

jenkins-droid commented 1 month ago

Jenkins Linux ppc64le Build #14931 [push] Remove a spike of memory... started at 10:39

jenkins-droid commented 1 month ago

Jenkins Linux s390x Build #15904 [push] Remove a spike of memory... passed after 2 hr 3 min

jenkins-droid commented 1 month ago

Jenkins Linux amd64 Build #15901 [push] Remove a spike of memory... passed after 2 hr 14 min

jenkins-droid commented 1 month ago

Jenkins Linux ppc64le Build #14931 [push] Remove a spike of memory... passed after 3 hr 30 min