turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

ExLlamaV2Embedding can't be unloaded if it failed to load #316

Closed: bjj closed this 2 weeks ago

bjj commented 5 months ago
    def unload(self):

        del self.embedding
        self.embedding = None

This means the model also can't be unloaded if it (partially) failed to load, for example because it ran out of memory.

turboderp commented 5 months ago

I'm a little confused. Would unload() raise an exception?

bjj commented 5 months ago

Turns out del self.embedding raises an AttributeError if the attribute doesn't exist. As far as I understand, the del isn't even necessary: setting the attribute to None is just as good.

(That attribute has no initializer; it only has a type annotation.)
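The failure described above can be reproduced in isolation. The following is a minimal sketch, not the exllamav2 code: the class `Embedding`, its `load(fail=...)` parameter, and the two unload variants are hypothetical stand-ins. It assumes only what the thread states, namely that the attribute is declared with a type but never initialized, and that loading can fail before the attribute is assigned.

```python
class Embedding:
    # Annotation only: this records a type hint but creates no attribute.
    # (Hypothetical stand-in for the class discussed in this issue.)
    embedding: object

    def load(self, fail=False):
        if fail:
            # Simulate running out of memory before the attribute is assigned.
            raise MemoryError("simulated out-of-memory during load")
        self.embedding = object()

    def unload_with_del(self):
        # Raises AttributeError if load() never assigned self.embedding.
        del self.embedding
        self.embedding = None

    def unload_safe(self):
        # Plain assignment works whether or not load() succeeded.
        self.embedding = None


def try_unload(method_name):
    """Fail the load on purpose, then attempt the named unload method."""
    module = Embedding()
    try:
        module.load(fail=True)
    except MemoryError:
        pass
    try:
        getattr(module, method_name)()
        return "ok"
    except AttributeError:
        return "AttributeError"


results = {name: try_unload(name) for name in ("unload_with_del", "unload_safe")}
print(results)
```

Initializing the attribute to None in the constructor would make both variants safe, since `del` would then always find something to delete.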

turboderp commented 5 months ago

Oh yeah, I misread it. The attribute was supposed to be initialized to None. Fixed now.

You're probably right that del is redundant, though people seem to disagree on what it does exactly. I'll leave it as is and investigate later.
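On the question of what del does: in CPython, `del obj.attr` removes the attribute binding, and the referenced object is freed once no other references remain. Assigning None drops the old reference in the same way but leaves the attribute in place. The sketch below checks both with a weak reference. `Tensor` and `Holder` are hypothetical stand-ins, and the result says nothing about when a GPU allocator actually returns memory, which is a separate matter.

```python
import gc
import weakref


class Tensor:
    """Stand-in for a large buffer (hypothetical; not an exllamav2 class)."""


class Holder:
    def __init__(self):
        self.embedding = Tensor()


def drop_with_del(holder):
    del holder.embedding        # removes the attribute entirely


def drop_with_none(holder):
    holder.embedding = None     # rebinds the attribute, dropping the old object


def check(action):
    """Return (object_was_freed, attribute_still_exists) after the action."""
    holder = Holder()
    ref = weakref.ref(holder.embedding)  # does not keep the object alive
    action(holder)
    gc.collect()                          # not needed in CPython, harmless elsewhere
    return ref() is None, hasattr(holder, "embedding")


del_result = check(drop_with_del)
none_result = check(drop_with_none)
print("del: ", del_result)
print("None:", none_result)
```

Both approaches release the object; the only observable difference here is whether the attribute still exists afterwards, which is why a `del` followed by an assignment to None does nothing the assignment alone would not.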