predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0

Fix quant cache OOM #494

Closed · flozi00 closed this 4 months ago

flozi00 commented 4 months ago

What does this PR do?

Sometimes an OOM occurs during warmup when using quantized models. This PR attempts to patch it by using a larger dtype when calculating free blocks, so more free VRAM is kept available.
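For context, a minimal sketch of the idea as I understand it (names like `estimate_free_blocks` and the parameter list are illustrative, not the actual LoRAX API): when the model is quantized, size KV-cache blocks using a larger dtype than the quantized storage dtype, so each block appears more expensive, fewer blocks are allocated, and more VRAM headroom remains after warmup.

```python
import torch

def estimate_free_blocks(
    device: torch.device,
    num_layers: int,
    num_heads: int,
    head_dim: int,
    block_size: int,
    dtype: torch.dtype,
    quantized: bool,
) -> int:
    # Hypothetical helper, not the LoRAX implementation.
    # If the model is quantized, deliberately compute per-block cost with a
    # larger dtype (float16) instead of the quantized storage dtype. This
    # over-estimates the bytes per block, so fewer blocks are reserved and
    # more free VRAM is left as a safety margin against warmup OOM.
    sizing_dtype = torch.float16 if quantized else dtype
    dtype_bytes = torch.tensor([], dtype=sizing_dtype).element_size()

    # One KV-cache block holds key + value tensors for every layer.
    block_bytes = 2 * num_layers * num_heads * head_dim * block_size * dtype_bytes

    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return int(free_bytes // block_bytes)
```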

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.