invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[bug]: for a diffusers model, selecting sampler `k_dpmpp_2_a` has no effect #2463

Closed damian0815 closed 1 year ago

damian0815 commented 1 year ago

Looks like diffusers doesn't support it under the hood - needs to be disabled, perhaps?

In any case, a failed sampler selection shouldn't leave the web UI in a state where it shows that I've selected k_dpmpp_2_a when in fact PNDM is running instead, at least according to the log message:

Unsupported Sampler: k_dpmpp_2_a Defaulting to PNDMScheduler
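For illustration, one way to avoid this silent fallback is to have the resolution step report the substitution explicitly so the UI can react. This is only a sketch with hypothetical names (the actual backend is Python, and the supported-sampler list here is invented for the example):

```typescript
// Hypothetical: samplers the diffusers code path can actually run.
// This list is illustrative only, not the real set.
const DIFFUSERS_SUPPORTED_SAMPLERS = new Set([
  'ddim', 'plms', 'k_lms', 'k_euler', 'k_euler_a', 'k_dpmpp_2',
]);

type SamplerResolution = { sampler: string; substituted: boolean };

// Resolve a requested sampler; if unsupported, fall back but tell the
// caller so the frontend can update its displayed selection.
function resolveSampler(requested: string): SamplerResolution {
  if (DIFFUSERS_SUPPORTED_SAMPLERS.has(requested)) {
    return { sampler: requested, substituted: false };
  }
  return { sampler: 'plms', substituted: true };
}
```

The key difference from the current behavior is that `substituted: true` travels back to the client instead of only appearing in a server-side log line.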
mickr777 commented 1 year ago

Yeah, it looks like it defaults back to the last supported sampler that was used before switching to k_dpmpp_2_a, but doesn't update the web UI.

lstein commented 1 year ago

@psychedelicious @keturn What's the short-term solution for this? The quick fix would be to take k_dpmpp_2_a off the list of supported schedulers. The main disadvantage is that this disables the scheduler for ckpt models as well as diffusers. Another fix would be for the web GUI to hide/disable the menu item when a diffusers model is selected, but I think this would require changes to the communication between the backend and the web frontend to indicate that some schedulers only work with ckpt models.
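The second option above amounts to filtering the sampler list by model type on the client. A minimal sketch, assuming a (hypothetical) set of ckpt-only samplers is known to the frontend:

```typescript
type ModelType = 'ckpt' | 'diffusers';

// Assumption for illustration: which samplers are ckpt-only would need to
// come from the backend; this hard-coded set is a placeholder.
const CKPT_ONLY_SAMPLERS = new Set(['k_dpmpp_2_a', 'k_dpm_2_a']);

// Return only the samplers valid for the currently selected model type.
function samplersForModel(all: string[], modelType: ModelType): string[] {
  return modelType === 'diffusers'
    ? all.filter((s) => !CKPT_ONLY_SAMPLERS.has(s))
    : all;
}
```

Hard-coding the set client-side is fragile, which is why the backend-reported capabilities discussed below this comment are the better long-term answer.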

@keturn What's the prospect of this scheduler getting added to the upstream diffusers package?

keturn commented 1 year ago

I'm sure diffusers will accept a PR for it, it's just a question of who. Do we have an InvokeAI contributor who could take a shot at it?

Otherwise we can ask if @LuChengTHU is interested in adding ancestral sampling to the code contributed to diffusers, or if @patrickvonplaten's team can make time for it.

Personally I'm still wrapping my head around how ancestral samplers work, and in particular what it means in relation to a non-stochastic sampler like DPM Solver++. Fortunately I was able to get a bit of clarification from Katherine:

k-diffusion is responsible for popularizing the term for diffusion sampling but i got it from Yang Song's paper https://arxiv.org/abs/2011.13456 [Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021]

Euler Ancestral is the algorithm from Appendix F, "ANCESTRAL SAMPLING FOR SMLD MODELS", equation 47.

and the other samplers i call "ancestral" are ones where I have substituted in a higher-order ODE step for the (sigma_i^2 - sigma_{i-1}^2) s_theta(x_i, i) term of equation 47, leaving the noise addition term alone.

And regarding sample_dpmpp_2s_ancestral specifically:

yes, if you set eta=0 you recover the original DPM++ 2S.

if you set eta=1 you get a sampler where the score term of eq 47 is replaced with a DPM++ 2S step, which is my own idea.

[transcript lightly edited for punctuation]
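Katherine's description can be sketched concretely. In k-diffusion, an ancestral step splits the move from one noise level down to the next into a deterministic part (to `sigmaDown`) plus freshly injected noise of scale `sigmaUp`, with `eta` controlling the split; the sketch below mirrors that variance-preserving decomposition (TypeScript here just for illustration; the original is Python):

```typescript
// Split a step from sigmaFrom down to sigmaTo into a deterministic
// target (sigmaDown) plus added noise (sigmaUp), as in ancestral sampling.
// eta = 0 gives sigmaUp = 0: no noise is re-added, recovering the
// deterministic sampler (e.g. plain DPM++ 2S), matching the quote above.
function getAncestralStep(
  sigmaFrom: number,
  sigmaTo: number,
  eta = 1.0
): { sigmaDown: number; sigmaUp: number } {
  const sigmaUp = Math.min(
    sigmaTo,
    eta * Math.sqrt((sigmaTo ** 2 * (sigmaFrom ** 2 - sigmaTo ** 2)) / sigmaFrom ** 2)
  );
  // Choose sigmaDown so that sigmaDown^2 + sigmaUp^2 = sigmaTo^2:
  // the total variance at the new step is preserved.
  const sigmaDown = Math.sqrt(sigmaTo ** 2 - sigmaUp ** 2);
  return { sigmaDown, sigmaUp };
}
```

The higher-order variants (like sample_dpmpp_2s_ancestral) then use a DPM++ 2S step to reach `sigmaDown` before adding the `sigmaUp`-scaled noise.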

psychedelicious commented 1 year ago

I agree more communication between backend and client is needed.

I think a request_capabilities method that returns exactly what is supported in a given configuration is needed (it's something I've brought up before, but we just haven't gotten to it).

So, for example, depending on the models currently active, whether xformers is active, whether GFPGAN is active, and so on, Generate.request_capabilities() returns an object describing the whole system's capabilities at that moment.

It needs to be callable at any time as the user changes models etc., and include data like which samplers are supported at that moment.

lstein commented 1 year ago

@psychedelicious Let me know what info you want returned by request_capabilities() and I'll happily implement it. I think you might have documented this in the past, but I can't find it just now.

psychedelicious commented 1 year ago

@lstein Sure, here is a TypeScript type describing what I am thinking (I hope the TypeScript syntax is understandable).

type Postprocessor = {
  id: string;
  name: string; // e.g. GFPGAN, ESRGAN, CodeFormer etc
  version: string;
  path: string; // the root dir for the postprocessor
  category: 'faceRestoration' | 'upscale'; // extend as needed
};

type Model = {
  id: string;
  name: string; // human-readable
  version: string; // unsure if this is needed
  hash: string;
  path: string;
  description: string;
  type: 'ckpt' | 'diffusers'; // 'safetensors'? dunno what else
  vaePath: string;
  isDefault: boolean; // eventually this can be handled elsewhere
  isMergeOf: Model[];
};

type Sampler = {
  id: string; // the standard abbreviations we've been using
  name: string; // human-readable
  aliases: string[]; // noticed some samplers kinda have different names...
};

type Embedding = {}; // dunno what this should look like

type InfillMethod = {
  id: string;
  name: string; // human-readable
};

type SystemActivity = {
  isProcessing: boolean;
  currentActivity: string; // e.g. 'Generating', 'Preparing', 'Saving Image' etc
  currentStep: number; // sampling step or any other step, -1 if current activity is indeterminate without steps
  totalSteps: number;
  phaseName: string; // e.g. 'Infill', 'Inpaint', 'Seam Correction' etc
  currentPhase: number;
  totalPhases: number;
  vramUsage: number;
  ramUsage: number;
};

type SystemStatus = {
  appVersion: string;
  startTime: string; // when the app started running - a parseable datetime
  currentModelId: string;
  availableModels: Record<string, Model>; // dict of models, keyed by id
  currentEmbeddingId: string; // or can we have multiples active?
  availableEmbeddings: Record<string, Embedding>; // dict of embeddings, keyed by id
  availablePostprocessors: Record<string, Postprocessor>; // dict of postprocessors, keyed by id
  systemActivity: SystemActivity;
  availableSamplers: Record<string, Sampler>; // depends on current model selection, maybe OS, etc
  availableInfillMethods: Record<string, InfillMethod>;
  noiseDevice: 'cpu' | 'cuda'; // maybe there are others? and 'noiseDevice' probably isn't the right term :P
};
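As a usage sketch, the frontend could consume such a payload to populate the sampler dropdown directly from what the backend reports, so the UI can never display an unsupported choice (a minimal, self-contained subset of the types above):

```typescript
type Sampler = { id: string; name: string; aliases: string[] };

// Only the fields this sketch needs from the proposed status payload.
type SystemStatusSubset = {
  currentModelId: string;
  availableSamplers: Record<string, Sampler>;
};

// Build dropdown options from whatever is available right now.
function samplerOptions(
  status: SystemStatusSubset
): { value: string; label: string }[] {
  return Object.values(status.availableSamplers).map((s) => ({
    value: s.id,
    label: s.name,
  }));
}
```

Since `availableSamplers` is recomputed by the backend whenever the model changes, the k_dpmpp_2_a entry would simply disappear from the dropdown for diffusers models rather than failing silently.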
github-actions[bot] commented 1 year ago

There has been no activity in this issue for 14 days. If this issue is still being experienced, please reply with an updated confirmation that the issue is still being experienced with the latest release.