hltcoe / sandle

Run a large language modeling SANDbox in your Local Environment

Out of memory in some settings even when there should be plenty #85

Open ccmaymay opened 1 year ago

ccmaymay commented 1 year ago

It looked like the code was trying to put things on cuda:1 at inference time even though cuda:1 was virtually maxed out after loading the model (and cuda:0 was empty). Maybe a little extra space was needed on cuda:1, and was unavailable, even though most inference-time data was going on cuda:0?

https://ccmaymay.sentry.io/issues/3989060942/?project=6619116&query=is%3Aunresolved&referrer=issue-stream
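
If the hypothesis above is right, one possible mitigation is to cap how much of each GPU the model weights may occupy at load time, so that every device keeps some free space for inference-time allocations. The sketch below only computes such per-device caps. It assumes the model is loaded through Hugging Face Transformers with `device_map="auto"`, which accepts a `max_memory` dictionary; the issue does not confirm that this is how sandle loads models. The helper name `build_max_memory` and the 10% default headroom are hypothetical.

```python
from typing import Dict

GIB = 1024 ** 3


def build_max_memory(
    total_bytes_per_gpu: Dict[int, int],
    headroom_fraction: float = 0.1,
) -> Dict[int, int]:
    """Return a per-device byte budget that leaves headroom on every GPU.

    Hypothetical helper: total_bytes_per_gpu maps a CUDA device index to
    its total memory in bytes, and headroom_fraction is the share of each
    device to keep free for inference-time allocations.
    """
    if not 0.0 <= headroom_fraction < 1.0:
        raise ValueError("headroom_fraction must be in [0, 1)")
    return {
        device_index: int(total_bytes * (1.0 - headroom_fraction))
        for device_index, total_bytes in sorted(total_bytes_per_gpu.items())
    }


# Example with two 16 GiB devices, keeping 25% of each one free.
budget = build_max_memory({0: 16 * GIB, 1: 16 * GIB}, headroom_fraction=0.25)
print(budget)

# Assumed usage (not taken from the issue; requires torch and transformers):
#   totals = {i: torch.cuda.get_device_properties(i).total_memory
#             for i in range(torch.cuda.device_count())}
#   model = AutoModelForCausalLM.from_pretrained(
#       model_name, device_map="auto", max_memory=build_max_memory(totals))
```

Whether this helps depends on where the failing allocation actually happens; the Sentry trace linked above would be the place to confirm which device and which call ran out of memory.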