sshh12 / terrain-diffusion


Question about this training script #3

Open Dchenlittle opened 7 months ago

Dchenlittle commented 7 months ago

Can you tell me whether the training input for this training file consists of the original image, the mask image, and the masked image? If so, is the mask image provided by the dataset labels, or is it generated automatically by the training code? I'd appreciate your help with this question.

sshh12 commented 7 months ago

Hey! The masks are dynamically generated. The only inputs are the images/captions.
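For context, this is the usual pattern for inpainting training: all three inputs are derived on the fly from the dataset image alone. The sketch below is a minimal illustration, not the repo's actual code; the fixed center-square mask stands in for the script's real random generator.

```python
import torch

# Batch of images as a dataloader might yield them, scaled to [-1, 1];
# captions are handled separately by the text encoder.
images = torch.rand(4, 3, 512, 512) * 2 - 1

# A fresh mask is sampled every training step. Here a fixed center square
# stands in for the script's real random generator (1 = inpaint, 0 = keep).
masks = torch.zeros(4, 1, 512, 512)
masks[:, :, 128:384, 128:384] = 1.0

# The third input is derived, not loaded: the original with the hole zeroed.
masked_images = images * (1 - masks)
```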

Dchenlittle commented 7 months ago

> The masks are dynamically generated. The only inputs are the images/captions.

Thank you very much for your answer. Can this training script be applied to other datasets with the masks still generated dynamically? If I apply it to another dataset, do I need to either provide labelled masks or modify the dynamic mask generation code for that dataset?

sshh12 commented 7 months ago

You can specify the mask using `--mask_mode` (one of the modes defined in `MASK_MODES`), which defaults to `512train-large`, the masks I typically use. If you want to customize the masks further, you'll need to modify `generate_mask(x, mask_mode)`.
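For a rough sense of the pattern, here is a hypothetical sketch of a `MASK_MODES` registry plus `generate_mask` dispatch. Only the names `MASK_MODES`, `generate_mask`, `--mask_mode`, and `512train-large` come from the thread; the mask body itself is invented and the real implementation lives in the training script.

```python
import torch

def _large_box(x):
    """One large random rectangle per image (1 = inpaint, 0 = keep).
    This body is a guess; only the name/flow mirror the thread."""
    b, _, h, w = x.shape
    mask = torch.zeros(b, 1, h, w)
    for i in range(b):
        bh = torch.randint(h // 2, h, (1,)).item()
        bw = torch.randint(w // 2, w, (1,)).item()
        top = torch.randint(0, h - bh + 1, (1,)).item()
        left = torch.randint(0, w - bw + 1, (1,)).item()
        mask[i, :, top:top + bh, left:left + bw] = 1.0
    return mask

# Registry keyed by the --mask_mode CLI flag; "512train-large" is the
# default named above.
MASK_MODES = {"512train-large": _large_box}

def generate_mask(x, mask_mode):
    return MASK_MODES[mask_mode](x)

masks = generate_mask(torch.rand(4, 3, 512, 512), "512train-large")
```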

Dchenlittle commented 7 months ago

> You can specify the mask using `--mask_mode` (one of the modes defined in `MASK_MODES`), which defaults to `512train-large`, the masks I typically use. If you want to customize the masks further, you'll need to modify `generate_mask(x, mask_mode)`.

Now, if I want to train from scratch using your script and your dataset: I first need to download the dataset you linked, then run `build_text2rgb_captions.py` to generate the caption data, then run `build_text2rgb_dataset.py` to produce the `metadata.jsonl` file, and finally run `train_text_to_image_lora_sd2_inpaint.py`. Is this process correct? Have I misunderstood anything?
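For reference, `metadata.jsonl` in Hugging Face's `imagefolder` convention (which the diffusers training scripts consume) is one JSON object per line with the image file name and its caption. The sketch below is hypothetical; the caption values and the intermediate caption-file format are invented, not taken from the repo.

```python
import json
from pathlib import Path

# Hypothetical caption mapping as build_text2rgb_captions.py might emit it;
# the real intermediate format is not shown in this thread.
captions = {
    "tile_0001.png": "satellite view of a river delta",
    "tile_0002.png": "snow-capped mountains above a green valley",
}

# metadata.jsonl in the Hugging Face `imagefolder` convention: one JSON
# object per line pairing the image file name with its caption text.
out_dir = Path("data")
out_dir.mkdir(exist_ok=True)
with open(out_dir / "metadata.jsonl", "w") as f:
    for file_name, text in captions.items():
        f.write(json.dumps({"file_name": file_name, "text": text}) + "\n")
```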

Dchenlittle commented 7 months ago

So far, I've downloaded the sentinel-2-rgb-captioned dataset, the stable-diffusion-2-inpainting weights file, and this training code. I would like to ask about `train_text_to_image_lora_sd2_inpaint.py`: what input parameters need to be filled in? Can you tell me more about it?

sshh12 commented 7 months ago

In the file https://github.com/sshh12/terrain-diffusion/blob/main/scripts/train_text_to_image_lora_sd2_inpaint.py you can find the "Example Usage" section.

Dchenlittle commented 7 months ago

> In the file https://github.com/sshh12/terrain-diffusion/blob/main/scripts/train_text_to_image_lora_sd2_inpaint.py you can find the "Example Usage" section.

Thank you for your help. I have successfully run the training code. I have one more question: does the training process need to use the official v1-inpainting-inference.yaml file?

sshh12 commented 7 months ago

No yamls needed. This all goes through the diffusers library, which does not use those configs.
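For illustration, loading the same checkpoint through diffusers looks like this; the model resolves its components from the JSON configs shipped inside the model repo itself, so the CompVis-style yaml never enters the picture. The Hub id is the hosted version of the weights mentioned above.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# diffusers reads the JSON configs inside the model repo, so no
# v1-inpainting-inference.yaml is needed for training or inference.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
)
```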