Out of memory in some settings even when there should be plenty

Hardware: 4 x 24 GB Titan RTX
Environment: CUDA_VISIBLE_DEVICES=2,3
Model: facebook/opt-13b
device_map: balanced_low_0
Prompt: Hello,

It looks like the code tried to allocate memory on cuda:1 at inference time, even though cuda:1 was virtually maxed out (and cuda:0 was empty) after loading the model. Maybe a little extra space was needed (and unavailable) on cuda:1 even though most inference-time data was going to cuda:0?
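One possible workaround, sketched here as an untested assumption rather than a confirmed fix, is to cap per-GPU memory at load time so that each device keeps some headroom for inference-time allocations. The helper name `build_max_memory`, the 4 GiB headroom, and the use of float16 are all hypothetical choices for illustration. Note that with CUDA_VISIBLE_DEVICES=2,3 the two visible cards are renumbered as cuda:0 and cuda:1, so the dictionary keys are 0 and 1.

```python
def build_max_memory(num_gpus: int, total_gib: int, headroom_gib: int) -> dict:
    """Return a per-device cap that leaves `headroom_gib` free on every GPU.

    Hypothetical helper: the headroom value is a guess, not a measured need.
    """
    if not 0 <= headroom_gib < total_gib:
        raise ValueError("headroom_gib must be non-negative and below total_gib")
    cap = f"{total_gib - headroom_gib}GiB"
    return {device: cap for device in range(num_gpus)}


def load_model(max_memory: dict):
    """Load the model with capped per-GPU memory (not called in this sketch)."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        "facebook/opt-13b",
        device_map="balanced_low_0",
        max_memory=max_memory,      # weights are placed within these caps
        torch_dtype=torch.float16,  # assumption: half-precision weights
    )


# Two visible 24 GiB cards, keeping 4 GiB free on each (assumed value).
max_memory = build_max_memory(num_gpus=2, total_gib=24, headroom_gib=4)
print(max_memory)
```

If the caps are too tight for the weights to fit on the GPUs, the remainder may be offloaded to CPU or disk, so the headroom would need tuning against the actual model size.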

https://ccmaymay.sentry.io/issues/3989060942/?project=6619116&query=is%3Aunresolved&referrer=issue-stream

hltcoe/sandle issue #85