[enhancement]: A few ideas to improve the user experience

VeyDlin commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues

Contact Details

No response

What should this feature add?

I have been working VERY hard with Stable Diffusion for the last month, and I would like to share some things I found along the way.

My biggest pain is switching between models, LoRAs, and prompts. I have hundreds of them, and I keep notes on them in Notion with example images, trigger words, descriptions, and typical prompts. Finding the model that will best generate my idea and is compatible with a LoRA turned out to take longer than the image generation itself. You need to find the model in Notion, find a prompt for the style, find the LoRA in the InvokeAI list (where there is no search! In a list of more than 300 LoRAs!), then find the old LoRA in the textbox and delete it. (And then repeat all of this if you want to bring back a LoRA that was previously in the textbox! Ctrl+Z does not help, because I have already changed something in the prompt and I will get confused if I go back.)

In addition, LoRA names cannot contain spaces! I also had to go back to earlier, more successful generations, but I had no idea which model had produced them, because it is not recorded anywhere. I had to reuse the same seed and generate images one by one until I found the right model.

I opened a lot of InvokeAI tabs and selected a reference image in each one when I wanted to keep a result in mind. I also noticed that InvokeAI does not let me browse during generation: I cannot select images and look at their prompts to decide what to do next while an image is generating. This further distanced me from the creative process and dragged me into routine.

My suggestions are as follows:

1) Add a gallery of favourite generated images, where you can see the models and prompts used and add your own text to the description
2) Select models using preview images
3) Select LoRAs using preview images as well, and give each LoRA a small page where you can see which model it was generated on, its trigger words, and a place to write notes on how to use it (many LoRAs need an individual approach). It would be even better to have a separate page for this that can be opened in another tab, where you select the desired LoRA and send it to the prompt on other pages, but even just a separate page for LoRAs would be great
4) Below the textbox with prompts, add a list of all the LoRAs currently in use, where each LoRA can be turned on or off with a checkbox and its weight can be adjusted
5) Add a button to refresh the LoRA list, or refresh it automatically
6) Add a "use model" button when viewing the image history, as well as a textbox with the name of the model used
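
As an illustration of suggestion 4, here is a minimal sketch of a LoRA list in which each entry has an on/off flag and a weight, and only the enabled entries are written into the prompt. `LoraEntry`, `render_lora_suffix`, and the LoRA names are hypothetical, and the `withLora(name,weight)` text form is an assumption used for illustration; the syntax InvokeAI actually expects may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoraEntry:
    """One row of the hypothetical 'LoRAs in use' list."""
    name: str
    weight: float = 1.0
    enabled: bool = True  # state of the checkbox


def render_lora_suffix(entries: List[LoraEntry]) -> str:
    """Build the prompt fragment for all enabled LoRAs, skipping disabled ones."""
    # ASSUMPTION: the "withLora(name,weight)" form is illustrative only.
    return " ".join(
        f"withLora({entry.name},{entry.weight:g})"
        for entry in entries
        if entry.enabled
    )


loras = [
    LoraEntry("inkSketch", 0.8),
    LoraEntry("filmGrain", 0.5, enabled=False),  # unchecked, but kept in the list
    LoraEntry("softLight"),
]
print(render_lora_suffix(loras))
```

Because a disabled LoRA stays in the list instead of being deleted from the textbox, turning it back on later restores it with its previous weight.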

Sometimes I rented a powerful server with an A100 to test my hypotheses and generate a LOT of images, but I would rather not have my images stored on the server, if only because you have to pay extra for storage there. A good feature would be the ability to host an InvokeAI node remotely so that it only generates images and sends them to my local PC.

I have several artist friends with weak video cards, and I give them access to make their work easier. But the fact that anyone can connect sucks, as does the fact that everyone shares one workspace and sees what the others generate. It would be nice to be able to choose the list of models and LoRAs available to users and user groups, to have everyone see only their own generated images, and to add a waiting queue. You will also need to solve the problem that too many models cannot fit in memory at once, so model changes must also go through the queue.

I also used kohya_ss to create my LoRA and birme net for cropping images. Preparing the dataset turned out to be quite laborious as well, and I have a few ideas on how this could be improved, since you are going to add a training function in the future:
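
The queue idea above can be sketched as a single FIFO queue in which a model change happens only when the next job needs a different model, and each user's results are stored separately. This is a minimal illustration, not InvokeAI's design: `Job`, `GenerationQueue`, and every method name are hypothetical, and model loading and image generation are replaced by placeholders.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class Job:
    user: str
    model: str
    prompt: str


class GenerationQueue:
    """FIFO queue; a model switch is performed as part of running a job."""

    def __init__(self) -> None:
        self._jobs: Deque[Job] = deque()
        self._results: Dict[str, List[str]] = {}
        self.loaded_model: Optional[str] = None
        self.model_loads = 0  # how many times a model had to be (re)loaded

    def submit(self, job: Job) -> int:
        """Add a job and return its position in the queue (1 = next to run)."""
        self._jobs.append(job)
        return len(self._jobs)

    def pending(self) -> int:
        return len(self._jobs)

    def run_next(self) -> None:
        job = self._jobs.popleft()
        if job.model != self.loaded_model:
            # Placeholder for unloading the old model and loading the new one.
            self.loaded_model = job.model
            self.model_loads += 1
        # Placeholder for the actual image generation.
        self._results.setdefault(job.user, []).append(f"{job.model}: {job.prompt}")

    def images_for(self, user: str) -> List[str]:
        """Each user sees only their own results."""
        return list(self._results.get(user, []))


queue = GenerationQueue()
queue.submit(Job("alice", "model-a", "red car"))
queue.submit(Job("bob", "model-a", "blue house"))
queue.submit(Job("alice", "model-b", "portrait"))
while queue.pending():
    queue.run_next()
print(queue.images_for("bob"), queue.model_loads)
```

Only one model is held at a time in this sketch, which sidesteps the memory limit at the cost of reloading whenever consecutive jobs use different models.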

1) When creating a new dataset, it would be convenient to choose from preset parameters at the very beginning, for example for a character or a style
2) After selecting a folder with images, you get a list of all the images with a textbox next to each one, so you can quickly edit the descriptions
3) Images can be cropped right in this list, just as in birme net, but birme net has no way to shrink the square proportionally, for example so that only the face fits inside it
4) Clone images. I used this when one image can give you both the character's body, if you capture it completely, and the face, if you zoom in and crop
5) A built-in image upscaler
6) Built-in BLIP/GIT/WD14 captioning. If it were possible to generate a description with all of them at once and then choose the best one for each image individually, that would be amazing
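
The proportional square crop from item 3 comes down to a little arithmetic. The sketch below is one possible way to compute it, under my own assumptions: `square_crop_box` and its parameters are hypothetical names, `scale=1.0` means the largest square that fits in the image, and smaller values shrink the square around the chosen center.

```python
from typing import Tuple


def square_crop_box(
    width: int,
    height: int,
    center_x: int,
    center_y: int,
    scale: float = 1.0,
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a square crop inside the image.

    scale=1.0 gives the largest square that fits; smaller values give a
    proportionally smaller square, e.g. one that frames only a face.
    """
    if not 0 < scale <= 1:
        raise ValueError("scale must be in the range (0, 1]")
    side = max(1, round(min(width, height) * scale))
    # Center the square on the requested point, then clamp it to the image.
    left = min(max(center_x - side // 2, 0), width - side)
    top = min(max(center_y - side // 2, 0), height - side)
    return (left, top, left + side, top + side)


# Full-body crop and a tighter face crop taken from the same 1000x600 image.
print(square_crop_box(1000, 600, 500, 300))
print(square_crop_box(1000, 600, 900, 100, scale=0.25))
```

The returned tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the two boxes above could produce the "cloned" body and face images from item 4.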

Finally, build a sufficiently powerful plugin system and community so that all of the above can be implemented as plugins. A flexible plugin architecture is quite difficult to make, and it needs to be planned for in the early stages of application development. I have not looked deeply into the source code, so I am not sure whether that opportunity has already been lost. Letting the community develop the project through plugins is a strong advantage for a product's popularity. Visual Studio Code is an example of this.

I wrote about everything I found over the last month, and I hope some of it will be useful. I also tried to remove from my list the overly specific cases that most users are unlikely to encounter.

I'm sorry, it's difficult for me to write so much text in English, so I resorted to a translator.

Alternatives

No response

Additional Content

No response

hipsterusername commented 1 year ago

Hey VeyDlin -

Thank you for the feedback! We're hard at work building something that will solve the "plugin system" opportunity.

We also plan to offer training capabilities in the future.

Our team will be building a hosted offering that solves some of the other problems you called out - The OSS implementation of Invoke is intended to be the "single player" experience.

Once our 3.0 version comes out, we encourage you to stay involved as we build the solution, as we expect more people to be able to develop plugins and extensions then :)

VeyDlin commented 1 year ago

Oh, thank you!

A few more thoughts

A button that changes the width and height of the canvas would be really convenient. If you try different concepts, you have to switch them too often. Besides, it shouldn't be difficult to add, right?

Many people, as they use the tool, build up their own "favorite" prompts for quality. I'm talking about the text where you describe quality, lighting, and so on. These differ between art and photorealism, and the negative prompts differ too: for comics, characters, actions, interiors, surroundings.

What I mean is that aliases for prompts would be really convenient. They could work by simple replacement.

For example, instead of:

Red car on the street

RAW photo, cinematic masterpiece, majestic epic composition, (movie scene), full shot, epic dynamic frame, (cinematic look)++, 24mm, 4k textures, masterpiece, best quality, official art, extremely detailed CG unity 8k wallpaper, ultra high res, professional photography, sharp focus, HDR, 8K resolution, intricate detail, sophisticated detail, depth of field, analogue RAW DSLR, photorealistic

I would write something like this:

Red car on the street @photorealism-1
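
The simple-replacement idea can be sketched in a few lines. This is only an illustration under my own assumptions: `expand_aliases`, the `ALIASES` table, and the rule that an alias is `@` followed by letters, digits, underscores, or hyphens are all hypothetical, and the stored text is shortened from the example above.

```python
import re
from typing import Dict

# Hypothetical alias table; in practice it would hold the user's saved prompts.
ALIASES: Dict[str, str] = {
    "photorealism-1": "RAW photo, sharp focus, HDR, photorealistic",
}

# ASSUMPTION: an alias is "@" followed by letters, digits, "_" or "-".
ALIAS_PATTERN = re.compile(r"@([A-Za-z0-9_-]+)")


def expand_aliases(prompt: str, aliases: Dict[str, str]) -> str:
    """Replace each known @alias with its stored text; leave unknown ones as-is."""

    def substitute(match: "re.Match[str]") -> str:
        return aliases.get(match.group(1), match.group(0))

    return ALIAS_PATTERN.sub(substitute, prompt)


print(expand_aliases("Red car on the street @photorealism-1", ALIASES))
```

Unknown aliases are left in the prompt unchanged, so a typo in an alias name stays visible instead of silently disappearing.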

Nate82 commented 1 year ago

Not sure if you mentioned these things but just a couple of suggestions off the top of my head while I'm working in Invoke.

A scrollbar on the side for images (middle finger gets sore having to mouse wheel through 100's of images)
Maybe the ability to create folders for images so you can drag and drop. There needs to be some way to organize images better. I make images for different projects and they just get all mixed up with other project images so it's really hard to stay organized.
An option to select multiple images to delete.

allendgithub commented 1 year ago

It'd be nice to have some kind of bell or alert sound when an image or series of images are finished processing, or some kind of alert so one doesn't have to keep checking to see if it's finished if they're doing something else on another tab. It can be optional if some people don't want one, but having it seems like a simple enough thing to add and would be a huge quality of life improvement.

dkhold commented 1 year ago

That sounds exactly like what the notification API is meant for.

allendgithub commented 1 year ago

Or they could, you know, implement some kind of alert system into the program directly and automatically as I suggested.

dkhold commented 1 year ago

I don't understand your snarky tone. What you want is literally the purpose of a native browser notification, for which I provided a documentation link. It is the obvious (and correct) way for web apps to notify the user of an event when their execution context (the tab) is not in focus. It still needs to be implemented "in the program".

invoke-ai / InvokeAI