maj160 opened 2 months ago
Here's a quick example of the kind of snippet that would fix this (stable_diffusion.cpp:654):

```cpp
for (auto& kv : curr_lora_state) {
    const std::string& lora_name = kv.first;
    float curr_multiplier        = kv.second;
    // If the lora is no longer in the prompt
    if (lora_state.find(lora_name) == lora_state.end()) {
        float multiplier_diff = -curr_multiplier;
        if (multiplier_diff != 0.f) {
            lora_state_diff[lora_name] = multiplier_diff;
        }
    }
}
```
There is probably a nicer way to do this, though
I worry that repeated addition and removal of LoRAs in that method could introduce a kind of compounding precision error, although perhaps I am being overly cautious. For those desperate never to have to reload the base model file, one option would be to store an unaltered set of tensors from the base model and then apply LoRAs to a copy of those tensors whenever the set of LoRAs in use, or their respective application strengths, changes. Naturally, this would double the amount of space the base model takes up while the program is running.
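For what it's worth, a minimal sketch of that stored-copy idea might look like the following. The struct and function names here are hypothetical, not the actual stable-diffusion.cpp data structures:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of the "keep a pristine copy" idea: `pristine` holds the
// base-model tensors exactly as loaded from disk and is never modified, while
// `working` is what inference actually reads.
struct TensorStore {
    std::map<std::string, std::vector<float>> pristine;  // untouched base weights
    std::map<std::string, std::vector<float>> working;   // weights used for inference
};

// Stub standing in for whatever actually merges a LoRA's delta into the
// working weights (in sd.cpp this would be the existing LoRA-apply path).
void merge_lora(TensorStore& store, const std::string& lora_name, float multiplier) {
    (void)store;
    (void)lora_name;
    (void)multiplier;
}

// Whenever the set of LoRAs or their strengths changes, throw away the working
// copy, reset it from the pristine weights and re-apply everything from
// scratch. This costs twice the base model's memory, but the result never
// depends on which LoRAs happened to be applied previously, so there is no
// compounding error across prompts.
void rebuild_with_loras(TensorStore& store,
                        const std::map<std::string, float>& lora_state) {
    store.working = store.pristine;
    for (const auto& [name, multiplier] : lora_state) {
        merge_lora(store, name, multiplier);
    }
}
```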
Thanks - I agree that even under the existing apply_loras mechanism there is potential for precision errors, but that's a much harder problem and not really the focus of this bug report.
This report is about buggy behaviour in the current implementation with a (hopefully!) simple fix; I'd rather it didn't get bogged down in discussion about how LoRAs are handled in general. That feels like a topic for another issue.
The issue is that the two are related. This isn't a problem for the CLI program, because neither the model nor the LoRAs change from one batch image to the next, and even if there are precision errors in the LoRA tensor loading, they are reproducible.
However, for people who want to run sdcpp as a server, it's important to ensure that any solution does not lose precision over time. If it does, the same prompt, with the same settings and the same seed, could produce a different result purely due to precision decay if many LoRAs have been loaded and unloaded in the intervening time.
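To make the precision concern concrete, here is a tiny stand-alone example (the numbers are arbitrary, not taken from a real model) showing that applying and then removing the same delta in float does not always restore the original value:

```cpp
#include <cstdio>

// Adding a LoRA delta to a weight and then subtracting the same delta does
// not necessarily restore the original bits, because the intermediate sum
// is rounded.
int main() {
    float weight     = 0.1f;  // original base-model weight
    float lora_delta = 0.3f;  // multiplier * LoRA contribution for this weight

    float patched  = weight + lora_delta;   // LoRA applied
    float restored = patched - lora_delta;  // LoRA "removed" again

    // Prints: original=0.100000001 restored=0.099999994 equal=0
    std::printf("original=%.9f restored=%.9f equal=%d\n",
                weight, restored, weight == restored);
    return 0;
}
```

The error here is only one ulp, but it is enough to break the bit-for-bit reproducibility described above once many such cycles have happened.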
The simplest solution would be that, when running in this kind of server mode, any change to the LoRA settings triggers a reload of the model, either from a file or from a stored copy of the base model tensors; otherwise it is difficult to guarantee reliable and reproducible behavior.
Reloading the base model is too slow.
This only matters when the same sd_ctx is used for multiple prompts: LoRAs that have been applied for a previous prompt but don't appear in the current prompt are not unapplied.
Steps to reproduce:
1. Create an sd_ctx and generate an image with a prompt that includes a LoRA.
2. Using the same sd_ctx, generate a second image with a prompt that does not include that LoRA.
3. The second image is still affected by the LoRA, because it was never unapplied.
Why this happens: curr_lora_state contains the loras that are currently applied and their weights. However, inside apply_loras, we only handle loras in the current prompt, and we don't consider any other loras that are in curr_lora_state and might need to be removed.
Potential fix: In apply_loras(), loop over curr_lora_state and unapply any LoRAs that do not appear in lora_state (that is, add them to lora_state_diff with a negative multiplier).
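A minimal sketch of what the combined diff computation could look like, assuming both lora_state and curr_lora_state map LoRA names to multipliers (the real apply_loras has more going on; this only illustrates the two passes):

```cpp
#include <string>
#include <unordered_map>

std::unordered_map<std::string, float> compute_lora_state_diff(
        const std::unordered_map<std::string, float>& curr_lora_state,  // currently applied
        const std::unordered_map<std::string, float>& lora_state) {     // wanted by this prompt
    std::unordered_map<std::string, float> lora_state_diff;

    // Pass 1: LoRAs requested by the current prompt (new, or with a changed multiplier).
    for (const auto& [name, wanted] : lora_state) {
        auto it       = curr_lora_state.find(name);
        float applied = (it == curr_lora_state.end()) ? 0.f : it->second;
        float diff    = wanted - applied;
        if (diff != 0.f) {
            lora_state_diff[name] = diff;
        }
    }

    // Pass 2 (the proposed fix): LoRAs that were applied before but no longer
    // appear in the prompt get scheduled for removal with a negative multiplier.
    for (const auto& [name, applied] : curr_lora_state) {
        if (lora_state.find(name) == lora_state.end() && applied != 0.f) {
            lora_state_diff[name] = -applied;
        }
    }

    return lora_state_diff;
}
```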