hyao1 / GLAD

The official code of "GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection"

GPU memory #3

Open LeeHaoRanRan opened 2 weeks ago

LeeHaoRanRan commented 2 weeks ago

Hello, I would like to know how big each image is in your dataset, because I want to apply the method to a medical dataset.

hyao1 commented 2 weeks ago

Hello, we use 512x512 images in our experiments. Even for 256x256 input images, we upsample them to 512x512 to be consistent with the pretrained Stable Diffusion.

If your GPU memory is limited, please reduce the batch size to 16; the generated images are still satisfactory.

In addition, you can try training the model with mixed precision, which will further reduce the required memory. I will try to implement it soon.
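For reference, a minimal sketch of the upsampling step described above, assuming torchvision transforms are used (the repository's actual preprocessing may differ):

```python
# Hedged sketch: resize any input (e.g. 256x256) to 512x512 so it matches the
# resolution the pretrained Stable Diffusion expects; pixels scaled to [-1, 1].
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((512, 512), interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
```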

LeeHaoRanRan commented 2 weeks ago

Thank you very much for your answer

LeeHaoRanRan commented 2 weeks ago

Hello, I'm still wondering how the pre-trained model is used. Is it used directly in the provided GitHub code? Do I need to retrain the pretrained model if I use a dataset that is not in the paper, or can I use it directly?

hyao1 commented 2 weeks ago

> Hello, I'm still wondering how the pre-trained model is used. Is it used directly in the provided GitHub code? Do I need to retrain the pretrained model if I use a dataset that is not in the paper, or can I use it directly?

You should retrain the pretrained UNet of Stable Diffusion on your dataset (just modify the data path and hyperparameters). Fine-tuning the VAE and DINO is not necessary; it depends on the difference between the pre-training dataset and yours. You can decide whether to fine-tune the VAE by comparing your images before and after they are processed by the VAE.
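A rough sketch of such a check, assuming the diffusers AutoencoderKL API and a placeholder image path (not the repository's actual evaluation code):

```python
# Hedged sketch: round-trip an image through the pretrained VAE and inspect the error.
import torch
from diffusers import AutoencoderKL
from torchvision import transforms
from PIL import Image

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae").eval()

to_tensor = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
x = to_tensor(Image.open("sample.png").convert("RGB")).unsqueeze(0)  # placeholder path

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()
    recon = vae.decode(latents).sample

print("reconstruction MSE:", torch.mean((recon - x) ** 2).item())
```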

LeeHaoRanRan commented 2 weeks ago

> You should retrain the pretrained UNet of Stable Diffusion on your dataset (just modify the data path and hyperparameters). Fine-tuning the VAE and DINO is not necessary; it depends on the difference between the pre-training dataset and yours.

Is the stable_diffusion pre-trained model you provide trained on the datasets of the paper? Would it be inappropriate if I switched to a medical dataset and then retrained the UNet starting from the stable_diffusion pre-trained model you provided?

hyao1 commented 2 weeks ago

> Is the stable_diffusion pre-trained model you provide trained on the datasets of the paper? Would it be inappropriate if I switched to a medical dataset and then retrained the UNet starting from the stable_diffusion pre-trained model you provided?

I provide the pretrained Stable Diffusion in OneDrive (downloaded directly from Hugging Face, without any training by me), in the folder named "CompVis-stable-diffusion-v1-4 (pretrained stable diffusion)". You can train on your dataset based on this pre-trained model. The other folders in OneDrive contain the models trained by me; for example, the folder named "MVTec-AD" is the model I trained on the MVTec-AD dataset. You don't need to use those models.
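As a hedged illustration, loading that checkpoint as the starting point for fine-tuning could look like this with diffusers (the hub id is standard; the local OneDrive folder layout is not verified here):

```python
# Sketch only: load the pretrained Stable Diffusion v1-4 checkpoint, either from
# the Hugging Face hub or from the local copy downloaded from OneDrive.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
unet, vae = pipe.unet, pipe.vae  # the UNet is the part fine-tuned on the new dataset
```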

LeeHaoRanRan commented 2 weeks ago

Thanks again for your reply

LeeHaoRanRan commented 1 week ago

Hello, I would like to ask what causes this problem: [image]

hyao1 commented 1 week ago

> Hello, I would like to ask what causes this problem: [image]

The original Stable Diffusion pipeline code includes a StableDiffusionSafetyChecker module that checks whether the generated image contains sensitive content. We don't care about this, and it has no influence on the experiments, so I deleted the related code for faster execution. You don't have to worry about it.
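For readers using the standard diffusers pipeline instead of the modified one, the checker can also simply be disabled at load time; a small sketch, not the way GLAD's code removes it:

```python
# Sketch: skip the StableDiffusionSafetyChecker when building the pipeline.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    safety_checker=None,            # drop the checker module entirely
    requires_safety_checker=False,  # silence the warning about the missing checker
)
```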

LeeHaoRanRan commented 1 week ago

Thanks a lot

hyao1 commented 1 week ago

> In addition, you can try training the model with mixed precision, which will further reduce the required memory. I will try to implement it soon.

I have now implemented mixed-precision training. With batch_size 32, the memory requirement drops from 39 GB to 27 GB.
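A minimal, self-contained mixed-precision sketch with torch.cuda.amp; the model, data, and loss below are toy placeholders, not the repository's actual training loop:

```python
# Hedged sketch of mixed-precision training; the Conv2d stands in for the UNet.
import torch
from torch import nn

model = nn.Conv2d(4, 4, 3, padding=1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(32, 4, 64, 64, device="cuda")  # fake latent batch
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                # fp16 forward pass
        loss = ((model(x) - x) ** 2).mean()
    scaler.scale(loss).backward()                  # scaled backward avoids fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```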

LeeHaoRanRan commented 1 week ago

> I have now implemented mixed-precision training. With batch_size 32, the memory requirement drops from 39 GB to 27 GB.

OK, thanks for sharing. I also have a problem: the industrial datasets use RGB three-channel images, but for my medical dataset I need four different modalities as a four-channel image, so the following problem arises. Do you know what the solution is? All four of my modalities are grayscale images. [image]

hyao1 commented 1 week ago

> OK, thanks for sharing. I also have a problem: the industrial datasets use RGB three-channel images, but for my medical dataset I need four different modalities as a four-channel image, so the following problem arises. Do you know what the solution is? All four of my modalities are grayscale images. [image]

This is a bit troublesome, because the pretrained Stable Diffusion and DINO are both built for 3-channel inputs. Especially for DINO, we cannot use the strong prior knowledge of the pretrained parameters. If you want to train all the models from scratch, you can try training a four-channel VAE by changing the original VAE input channels to 4 (refer to DiffAD), and then fine-tune the UNet with my code. I can give you the VAE training code; if you need it, please give me your email.
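A hedged sketch of the channel change only, assuming the diffusers AutoencoderKL class; the remaining configuration and the DiffAD-style training loop are not shown and would still need to be set up:

```python
# Sketch: instantiate a VAE that takes four stacked grayscale modalities as input.
from diffusers import AutoencoderKL

vae_4ch = AutoencoderKL(
    in_channels=4,     # four modalities stacked as channels instead of RGB
    out_channels=4,
    latent_channels=4,
    sample_size=512,
)
```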

LeeHaoRanRan commented 1 week ago

My email address is: lihaoran0008@gmail.com

LeeHaoRanRan commented 1 week ago

If I have a grayscale single-modality (single-channel) image, can I convert it to an RGB three-channel image and continue using your network?

hyao1 commented 1 week ago

> If I have a grayscale single-modality (single-channel) image, can I convert it to an RGB three-channel image and continue using your network?

Theoretically, yes: you only need to copy the single channel into three channels, then see how the generated results look first.
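A tiny sketch of that conversion ("scan.png" is a placeholder path):

```python
# Sketch: replicate the grayscale channel so the pretrained 3-channel models accept it.
from PIL import Image

rgb = Image.open("scan.png").convert("L").convert("RGB")  # gray -> identical R, G, B
```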

LeeHaoRanRan commented 1 week ago

I think I get it, I'll verify it

LeeHaoRanRan commented 6 days ago

Before testing and evaluating the model, I fine-tuned DINO. The data size was 256x256 and the batch size was set to 1. I tried fine-tuning on a 2080 Ti and a 4090 respectively, but the memory was not enough. The image below is the result on the 4090. [image]

hyao1 commented 6 days ago

> Before testing and evaluating the model, I fine-tuned DINO. The data size was 256x256 and the batch size was set to 1. I tried fine-tuning on a 2080 Ti and a 4090 respectively, but the memory was not enough. The image below is the result on the 4090. [image]

Can you show your training hyperparameters? They should be printed in the command line.

LeeHaoRanRan commented 6 days ago

> Can you show your training hyperparameters? They should be printed in the command line.

Sorry, I forgot to fix it in the main function. Thanks again for your reply. By the way, how long did it take you to fine-tune the DINO model? [image] Is this time normal?