turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Changing hyper-parameters after initialization without reloading weights from disk #299

Open kmccleary3301 opened 9 months ago

kmccleary3301 commented 9 months ago

I'm writing a production server to handle requests from a large, rotating pool of clients. I have a custom manager class that handles everything, but I'm hoping to keep the models persistent in memory between requests. I'm trying to build it so that requests can specify hyper-parameters such as max_seq_len, temperature, etc. I'd prefer to do this as efficiently as possible, swapping in custom parameters for each client request rather than fully reloading the model from disk on every call that uses unique parameters.

Is there a way I can do this with the current code? If not, what would I need to refactor? I am working with jllllll's Python package fork of this repo, but the changes there are minimal, so I figured it appropriate to ask the question here.
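
For concreteness, here is roughly what I'm hoping the per-request path could look like. This is only a sketch based on my reading of the code: the paths are placeholders, and I'm assuming the sampling knobs on `generator.settings` (temperature, top_p, top_k) can be changed freely between calls, while `max_seq_len` appears to be baked into the config and cache at load time.

```python
# Sketch only: paths are placeholders, and I'm guessing at which attributes are
# safe to mutate between requests without reloading weights.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Loaded once at server start-up and kept resident in memory.
config = ExLlamaConfig("/path/to/config.json")
config.model_path = "/path/to/model.safetensors"
config.max_seq_len = 4096          # fixed here; the part I don't know how to change later

model = ExLlama(config)
tokenizer = ExLlamaTokenizer("/path/to/tokenizer.model")
cache = ExLlamaCache(model)        # cache size presumably depends on max_seq_len
generator = ExLlamaGenerator(model, tokenizer, cache)

def handle_request(prompt: str, params: dict) -> str:
    # Per-request sampling settings: these look like plain attributes on
    # generator.settings, so I assume they can be swapped freely per call.
    generator.settings.temperature = params.get("temperature", 0.7)
    generator.settings.top_p = params.get("top_p", 0.9)
    generator.settings.top_k = params.get("top_k", 40)

    # generate_simple appears to re-tokenize the prompt and reset the cache on
    # each call, so state shouldn't leak between clients.
    return generator.generate_simple(prompt, max_new_tokens=params.get("max_new_tokens", 256))
```

If swapping the sampler settings like this is already safe, then max_seq_len is the only remaining question, since it seems to determine the cache allocation.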