lllyasviel / style2paints

sketch + style = paints :art: (TOG2018/SIGGRAPH2018ASIA)
Apache License 2.0

Model Release Discussions #205

Closed. lllyasviel closed this issue 1 year ago

lllyasviel commented 1 year ago

This issue is for discussion of the release of the Style2Paints V5 large model.

Ihateyoudattebayo commented 1 year ago

So SEPA is an image diffusion model, but the V4 remaster will be a better version of V4.5?

marcussacana commented 1 year ago

What are the minimum hardware requirements to run the AI? Current Stable Diffusion projects offer a low-VRAM setting that can run on 4 GB VRAM GPUs, even if slowly, but what should we expect from S2P V5?

ShinkoNet commented 1 year ago

Here's my opinion as an artist: I think this large model should be released as a separate program with a different name from style2paints, because the functionality is too different. I liked style2paints v4 as a pipeline in the colouring process for a finished lineart: the program did not alter the lineart in any way, and it allowed me to pick which specific colours I wanted in each area. With this diffusion model, although the outputs are impressive, there is far less control in general. I also cannot do the lighting myself, since the colouring, shading and lighting steps of v4 have become a single uncontrollable step in v5. V5, given the technology, does not seem able to export a separate colour layer for the lineart (which v4 can, and which is very useful for art project files in image-editing software).

This large model does much better with loose sketch art, which gives it a different purpose, more in line with brainstorming ideas, costumes and poses for an art piece, and it fits better into a different part of an art project pipeline than where I would place v4.

Anyway, I do agree that this model should be released for public use, possibly under the RAIL license. Its power is on the same level as Stable Diffusion's img2img mode, which is already public.

The advantage this model seems to offer, compared to img2img, is that you don't need to colour in a sketch yourself to see shaded variations, and you can control how closely the model adheres to the sketch better than img2img can.

But given my concerns above, it should be released as different software rather than as an upgrade to an existing one.

Yumeo0 commented 1 year ago

I completely agree with @ShinkoNet on this one. Image generation and lineart painting are two very different things. Lineart was the main focus of the software, and suddenly calling an image-generation AI an upgrade makes no sense to me.

lllyasviel commented 1 year ago

Regarding the previous coloring mode, see also the updated section "Workflow-Oriented Assistant". Note that the images in that section are not finished and are for preview only.

drbobo0 commented 1 year ago

Thanks lllyasviel and the S2P team for the update. I also agree with @ShinkoNet. lllyasviel, we need more info on this, so please keep us updated.

lllyasviel commented 1 year ago

Update:

After receiving an incredible amount of feedback, we have decided to run a few extra technical preview rounds before we finalize V5.

A ‘legendary’ coloring mode will be given higher priority. This mode will produce results that preserve all details of the input line drawings and sketches, while at the same time using an ‘inner’ generated diffusion reference image when making automatic color compositions. Please see the newly added section [Workflow-Oriented Assistant] for a preview of this pipeline. Note that the images in that section are not finished and are for preview only.

We underestimated the demand for a better ‘plain’ coloring mode. Fortunately, we have noticed that now, thanks to the support and feedback from all of you.

Note that the diffusion method can be used in many ways to improve other components as well, because it can generate automatic reference images for almost every pipeline.

Note that this will delay the project again. Nevertheless, and fortunately, we are now much clearer about what to do and what should be done.

S2PR was and will always be a team focused on human-centric techniques, rather than AI-centric techniques.

Best, S2PR

lllyasviel commented 1 year ago

If you have considerations other than the ‘plain’ coloring mode, please also let us know by mentioning them here.

Ihateyoudattebayo commented 1 year ago

> If you have considerations other than the ‘plain’ coloring mode, please also let us know by mentioning them here.

I know this would be hard to integrate, but a mode similar to PaintingLight, which can add lighting and textures, would be great as well.

Ihateyoudattebayo commented 1 year ago

> Update: After receiving an incredible amount of feedback, we have decided to run a few extra technical preview rounds before we finalize V5. A ‘legendary’ coloring mode will be given higher priority. […]

I really appreciate the fact that S2PR is very open and uses the community's feedback. Also, what happened to the SEPA UI?

lllyasviel commented 1 year ago

The SEPA UI is also under heated discussion within S2PR. Some people nowadays prefer Gradio over a traditional desktop app.

Ihateyoudattebayo commented 1 year ago

> The SEPA UI is also under heated discussion within S2PR. Some people nowadays prefer Gradio over a traditional desktop app.

In my opinion, a desktop app feels a lot more high-quality and studio-level than a Gradio web link.

ShinkoNet commented 1 year ago

Thank you for listening to feedback. I look forward to the colouring mode improvements. I understand that the project hasn't shifted focus (looking at https://github.com/lllyasviel/style2paints/tree/master/V3#next-step) with regard to improving images from rough sketches, so there is no problem with the diffusion model being added to the program, as long as the assisted colouring mode is preserved.

Regarding the UI for SEPA, I do think it should be kept as a standalone program instead of Gradio. Gradio's library and functionality for image manipulation are quite limited; the web UI for hlky's Stable Diffusion fork had trouble with Gradio when implementing img2img source-photo manipulation, for instance. It would be best if we had enough visual tools to do a sketch, layer it, then do lineart, then a colour/lighting layer, all inside the same program, and Gradio won't be good for that.
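For context (and purely as my own assumption about how such a UI would be wired up, not anything confirmed by S2PR), a Gradio front end is essentially one function wrapped in an input/output form, as in the minimal sketch below with a hypothetical `colorize` stub. Anything richer than this single image-in/image-out pattern, such as layers or per-region colour hints, quickly becomes awkward to express in it.

```python
import gradio as gr
import numpy as np


def colorize(sketch: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder: a real deployment would run the model here.
    return sketch


# One input image, one output image: this is roughly the ceiling of what a
# plain gr.Interface gives you without custom components.
demo = gr.Interface(
    fn=colorize,
    inputs=gr.Image(label="Sketch"),
    outputs=gr.Image(label="Result"),
)

if __name__ == "__main__":
    demo.launch()
```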

Possibly, for v5, could we have a workflow where you input a messy sketch, use the diffusion model to turn it into lineart (using lineart-related prompts), pick the best lineart, and then feed that into the colouring mode? That would be pretty cool.

TheZoroark007 commented 1 year ago

> Update: After receiving an incredible amount of feedback, we have decided to run a few extra technical preview rounds before we finalize V5. A ‘legendary’ coloring mode will be given higher priority. […]

I have a suggestion, but I'm not sure it really fits into SEPA: would a mode that automatically shades lineart in a manga/doujin style, like your "MangaFilter", instead of colouring it be possible? (So things like cross-hatching shadows, etc.)

Ihateyoudattebayo commented 1 year ago

Didn't they say that the idea was to use the diffusion-generated image and feed it into the 'legendary' colouring mode?

Ihateyoudattebayo commented 1 year ago

> If you have considerations other than the ‘plain’ coloring mode, please also let us know by mentioning them here.

I think that a project as large as S2P NEEDS a Discord in order to grow.

bropines commented 1 year ago

I did not see when this neural network model will be released. Also, for people who do not have a PC, or not a powerful one, it would be possible to make a Colab that exposes the local server to the network via bore.pub. You could use https://github.com/carefree0910/carefree-creator as a web UI.

bropines commented 1 year ago

Also, why not use GitHub Discussions?

lllyasviel commented 1 year ago

The Alice and Dorothy methods have been released. For future threads, please comment based on the newer versions.

lllyasviel commented 1 year ago

> I did not see when this neural network model will be released. Also, for people who do not have a PC, or not a powerful one, it would be possible to make a Colab that exposes the local server to the network via bore.pub. You could use https://github.com/carefree0910/carefree-creator as a web UI.

This is a great project, but it is run by a company and would add the commercial factors that we want to avoid.

p1atdev commented 1 year ago

(I use DeepL because I am not very good at English.)

Style2Paints V5 seems to be using Anything V3.0, which is not desirable since Anything V3.0 uses a model leaked from NovelAI. It is fine for those who are willing to use models that contain leaked ones, but it seems to me that it would be better to avoid this becoming a bad point for a good project like Style2Paints. Would it be possible to instead use a model without such problems, such as Waifu Diffusion?

lllyasviel commented 1 year ago

> Style2Paints V5 seems to be using Anything V3.0, which is not desirable since Anything V3.0 uses a model leaked from NovelAI. It is fine for those who are willing to use models that contain leaked ones, but it seems to me that it would be better to avoid this becoming a bad point for a good project like Style2Paints. Would it be possible to instead use a model without such problems, such as Waifu Diffusion?

We cited Anything V3 only because we admire that model very much, not because that model is indispensable. Removing that dependency from the model initialization is not difficult and seems to have nearly no influence on the final model. We do not even need Waifu Diffusion; the alternative could even be standard SD, since it is only used for training initialization.

And note that the model architecture is not SD, though many layer weights are reused during initialization.
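As a general illustration of what "weights reused only for initialization" usually means in practice (a generic PyTorch sketch with made-up layer names and shapes, not the actual Style2Paints V5 code), tensors from a base checkpoint whose names and shapes match can be copied into a different architecture and everything else ignored:

```python
import torch
import torch.nn as nn

# Toy stand-in for a new architecture; layer names and shapes are hypothetical.
new_model = nn.Sequential(
    nn.Conv2d(4, 320, 3, padding=1),   # layer "0": shape matches the base weights
    nn.SiLU(),
    nn.Conv2d(320, 4, 3, padding=1),   # layer "2": new, left at random init
)

# Pretend these tensors come from a base model (e.g. standard SD weights).
base_weights = {"0.weight": torch.randn(320, 4, 3, 3), "0.bias": torch.randn(320)}

# strict=False copies every tensor whose name and shape match and skips the
# rest, so the base model only supplies an initialization, not the architecture.
missing, unexpected = new_model.load_state_dict(base_weights, strict=False)
print(missing)  # the "2.*" parameters were not found in the base checkpoint
```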

ajundo commented 1 year ago

Great job, and thanks very much for sharing.

I think the model works in a very similar way to https://huggingface.co/stabilityai/stable-diffusion-2-depth, which takes an extra depth map as a conditional input. That depth model is fine-tuned from the original SD without complex model modification (just an extra input channel).

[1] The extra parameters are mainly in the encoding layers of the input sketch.

I see you changed the model structure a lot to handle the sketch input. Could you briefly explain the benefits?
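For readers unfamiliar with the depth-conditioned variant mentioned above, the "extra input channel" idea can be sketched roughly as follows. The shapes and the zero-initialization trick here are my own assumptions for illustration, not code taken from either project:

```python
import torch
import torch.nn as nn

# Rough sketch of conditioning via an extra input channel: widen the U-Net's
# first convolution so the depth map can be concatenated to the noisy latent.
latent = torch.randn(1, 4, 64, 64)   # noisy image latent (hypothetical shape)
depth = torch.randn(1, 1, 64, 64)    # depth map resized to latent resolution

conv_in = nn.Conv2d(4 + 1, 320, kernel_size=3, padding=1)
with torch.no_grad():
    conv_in.weight[:, 4:].zero_()    # start as if the extra channel were absent

features = conv_in(torch.cat([latent, depth], dim=1))  # -> (1, 320, 64, 64)
```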

lllyasviel commented 1 year ago

> […] I see you changed the model structure a lot to handle the sketch input. Could you briefly explain the benefits?

A sketch has many sharp, sparse lines that tend to destroy the SD model in direct fine-tuning. Although SD is powerful, it is still a U-Net: the skip connections of U-Nets will simply overfit, producing unwanted edges to minimize the training loss. When those 64×64 layers overfit, the 8×8, 16×16, and 32×32 layers will not learn anything. We need extra encoders to handle that.
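To illustrate the general idea of an extra sketch encoder (this is a generic PyTorch sketch with made-up layer sizes, not the actual Style2Paints V5 architecture), the sparse line art can first be turned into dense multi-scale feature maps, which are then injected into the U-Net instead of feeding raw edges to it directly:

```python
import torch
import torch.nn as nn


class SketchEncoder(nn.Module):
    """Hypothetical multi-scale sketch encoder, for illustration only.

    Sparse line art is mapped into dense feature maps at several resolutions;
    these can then be added to the corresponding U-Net encoder activations
    rather than passing the raw edges through the skip connections.
    """

    def __init__(self, base: int = 64):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, base, 3, padding=1), nn.SiLU())
        self.down1 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.SiLU())
        self.down2 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1), nn.SiLU())

    def forward(self, sketch: torch.Tensor):
        f1 = self.stem(sketch)   # full resolution
        f2 = self.down1(f1)      # 1/2 resolution
        f3 = self.down2(f2)      # 1/4 resolution
        return f1, f2, f3        # injected into the matching U-Net scales


enc = SketchEncoder()
features = enc(torch.randn(1, 1, 64, 64))  # e.g. a latent-resolution sketch
```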

Note that depth maps are dense-featured images and do not have this problem; this is perhaps why Stability used depth rather than image edges.

If you directly fine-tune an SD model to learn edges, you will get something similar to this:

[image: example result of directly fine-tuning SD on edge inputs]

Note that if you use depth maps instead, you will probably get this:

[image: example result when conditioning on depth maps]

We can see that depth is much easier to handle than edges.

See also https://arxiv.org/abs/2012.09841.

Also, we can clearly see that the Alice model can recognize body parts from badly drawn scribbles. This is also not possible without the extra architecture.

Note that this is only my personal understanding. I am not sure whether or not I am correct.

Eric07110904 commented 1 year ago

Awesome work!!! How much VRAM do I need for training or fine-tuning this model? I want to know the minimum GPU requirement.

ajundo commented 1 year ago

I see: you use a special encoder for the sketch because a sketch carries complicated semantic information, which does not need to be encoded for a depth-map feature. So, if I understand correctly, the way the conditional sketch is fed in is similar to https://arxiv.org/pdf/1903.07291.pdf? Thanks for your explanation.
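For readers unfamiliar with that reference, SPADE conditions a generator by predicting per-pixel scale and shift for normalized feature maps from the conditioning image. Below is a minimal, generic PyTorch sketch of such a block, purely to illustrate the comparison being made; it is not the Style2Paints V5 implementation, and the layer sizes are made up:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SPADE(nn.Module):
    """Spatially-adaptive normalization block (illustration only)."""

    def __init__(self, feat_channels: int, cond_channels: int = 1, hidden: int = 128):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(cond_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Resize the conditioning map (e.g. a sketch) to the feature resolution,
        # then predict per-pixel scale and shift for the normalized activations.
        cond = F.interpolate(cond, size=x.shape[-2:], mode="nearest")
        h = self.shared(cond)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)


x = torch.randn(1, 320, 32, 32)       # U-Net feature map (hypothetical shape)
sketch = torch.randn(1, 1, 256, 256)  # conditioning sketch
out = SPADE(320)(x, sketch)
```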

lllyasviel commented 1 year ago

> I see: you use a special encoder for the sketch because a sketch carries complicated semantic information, which does not need to be encoded for a depth-map feature. So, if I understand correctly, the way the conditional sketch is fed in is similar to https://arxiv.org/pdf/1903.07291.pdf? Thanks for your explanation.

We will release a tech report soon. Just stay tuned!

lllyasviel commented 1 year ago

Update:

We removed Anything V3 from the README. This is not because S2PR has taken any action; it is because some people related to Anything V3 contacted us and do not want to claim authorship of that model, since A3 is itself a very wild mixture of many models and DreamBooths.

Ihateyoudattebayo commented 1 year ago

> Update: We removed Anything V3 from the README. This is not because S2PR has taken any action; it is because some people related to Anything V3 contacted us and do not want to claim authorship of that model, since A3 is itself a very wild mixture of many models and DreamBooths.

When is the colour scribble section going to be made?

SwayStar123 commented 1 year ago

This is amazing.

Will there be model distillation done to reduce inference times?

Is there any work being done on lineart -> lineart? Lineart would be easier to edit and would be a good middle step before going straight to colorization (for the Alice method, lineart -> lineart would provide greater control while still requiring much less detailed original input).

Ihateyoudattebayo commented 1 year ago

https://mobile.twitter.com/lvminzhang/status/1392143022221975554 @lllyasviel What happened to this version of SEPA? Also, what is the node system for?

My12123 commented 1 year ago

Will the program be available in only one language? I would like Style2Paints V5 to be available in many international languages, such as English, Spanish, Arabic, Russian, French, and Chinese. I am personally interested in Russian. If so, which ones will be supported?

AltoCrat commented 1 year ago

Hi, could I enquire about the acquisition of the dataset? "The training data comes from two domain: 50% are Gwern’s Danbooru dataset (washed by many metrics), and 50% are research materials from Style2Paints Research (collected in 7 years from 2016 to 2023)." I understand that the first dataset, from Danbooru, comes from Gwern, who appears to have permission to capture the data from the site itself for ML purposes. But I am curious about how the research materials from Style2Paints Research were obtained, as I am interested in using the program but would like to avoid using data that could have been obtained without permission.

On another note, how feasible would it be to train the model using my own drawings as data? Would one really need as many as 1-7M unique pieces of data to train it, or could it be trained with less? And what are the hardware requirements if so? Any information is appreciated, as I am not very confident in my knowledge of AI models.

lllyasviel commented 1 year ago

Because the previous thread is out of date and many new previews have been released, the new thread starts here.