Closed isaac-vidas closed 8 months ago
This PR fixes a potential OOM issue when searching AWQ quantization scales for LLaVA-family models (#144). Many thanks to @isaac-vidas and @casper-hansen!
Could you please review this PR and merge it to the main branch? Thanks @kentang-mit , @tonylins , @Sakits .
I don't have permissions to merge this PR. @ys-2020 I think you can probably do it 😄
Following up on #144.
@casper-hansen's suggestion worked, so I added `use_cache=False` when the model is created. The change is in the `entry.py` code that loads the LLaVA model, and it avoids the memory issue described in the issue. After applying this change in my environment and running the command again, it worked without any issues:
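For anyone following along, the change can be sketched roughly like this. The snippet below is only an illustration: it builds a tiny stand-in `LlamaConfig` rather than loading an actual LLaVA checkpoint (which the real `entry.py` does), but the key part is the same, namely setting `use_cache=False` at model creation so past key/value tensors are not retained across the scale-search forward passes:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny illustrative config; the real code in entry.py loads a full LLaVA
# checkpoint. The relevant change is use_cache=False, which prevents the
# model from keeping KV-cache tensors during AWQ scale search and thereby
# avoids the OOM described in #144.
cfg = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
    vocab_size=256,
    use_cache=False,  # disable KV cache at model creation
)
model = LlamaForCausalLM(cfg)
print(model.config.use_cache)
```

The same effect can be had on a pretrained checkpoint by loading the config first, flipping `use_cache` to `False`, and passing it to `from_pretrained`.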