How long does this model take to train?

sjtuplayer / anomalydiffusion

[AAAI 2024] AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model

MIT License

How long does this model take to train? #27

Open JYing-where opened 4 months ago

JYing-where commented 4 months ago

How long does it take to train the anomaly generation model and the mask generation model, each on a single GPU?

sjtuplayer commented 4 months ago

On a single 4090, the anomaly generation model takes roughly 3-5 days, and the mask generation model takes roughly 3-5 hours per defect category.
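For planning purposes, these figures can be turned into a back-of-the-envelope estimate. The following is a minimal sketch, not part of the repository: the function name is made up, and it assumes the anomaly generation model is trained first and the mask models are then divided evenly across the available GPUs.

```python
import math

# Rough figures quoted in this thread for a single 4090.
ANOMALY_MODEL_DAYS = (3, 5)  # anomaly generation model, (low, high) in days
MASK_MODEL_HOURS = (3, 5)    # mask generation model, (low, high) hours per defect category


def estimate_training_hours(num_defect_categories, num_gpus=1):
    """Return (low, high) wall-clock hours for one anomaly model plus all mask models.

    Assumption (hypothetical setup): the anomaly model is trained first on one
    GPU, then the mask models run one after another on each GPU, with the
    defect categories split evenly across GPUs.
    """
    if num_defect_categories < 0 or num_gpus < 1:
        raise ValueError("need num_defect_categories >= 0 and num_gpus >= 1")
    # Number of sequential mask trainings on the busiest GPU.
    rounds = math.ceil(num_defect_categories / num_gpus)
    low = ANOMALY_MODEL_DAYS[0] * 24 + rounds * MASK_MODEL_HOURS[0]
    high = ANOMALY_MODEL_DAYS[1] * 24 + rounds * MASK_MODEL_HOURS[1]
    return low, high


print(estimate_training_hours(10))     # (102, 170)
print(estimate_training_hours(10, 2))  # (87, 145)
```

The numbers are only as good as the quoted ranges, so treat the result as an order-of-magnitude guide.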

JYing-where commented 4 months ago

Can training be resumed after it is interrupted? I used --resume, but I cannot tell whether it actually continues from where it stopped.

sjtuplayer commented 4 months ago

You may need to manually load spatial_encoder.pt and embedding.pt.
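Before loading anything by hand, it helps to confirm that both files exist from the interrupted run. The sketch below only locates them. The helper name and the idea of searching a run directory recursively are assumptions, since the thread does not say where the repository writes these files.

```python
from pathlib import Path

# File names mentioned in this thread; their location on disk is an assumption.
REQUIRED_FILES = ("spatial_encoder.pt", "embedding.pt")


def find_resume_files(run_dir):
    """Return {file name: newest matching Path} for each required file under run_dir.

    Hypothetical helper: searches run_dir recursively and raises
    FileNotFoundError if a required file is missing.
    """
    found = {}
    for name in REQUIRED_FILES:
        # Sort by modification time so the most recently written copy comes last.
        candidates = sorted(Path(run_dir).rglob(name), key=lambda p: p.stat().st_mtime)
        if not candidates:
            raise FileNotFoundError(f"{name} not found under {run_dir}")
        found[name] = candidates[-1]
    return found
```

The returned paths can then be read with `torch.load(path, map_location="cpu")`. How the loaded objects are attached to the model depends on the repository's training code and is not shown here.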

bat1115 commented 4 months ago

Hi, is multi-GPU training supported, and if so, how do I run it? Training keeps hanging whenever I use more than one GPU. Here is the command I run: CUDA_VISIBLE_DEVICES=0,1 python main.py --spatial_encoder_embedding --data_enhance --base configs/latent-diffusion/txt2img-1p4B-finetune-encoder+embedding.yaml -t --actual_resume models/ldm/text2img-large/model.ckpt -n test --gpus 0,1 --init_word anomaly --mvtec_path=$path_to_mvtec_dataset

sjtuplayer commented 4 months ago

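Hi, multi-GPU training may not work. The core of the code comes from textual inversion, which is normally run on a single GPU, so single-GPU training is recommended. For training and generating multiple anomaly masks, you can run different categories on different GPUs.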
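One way to follow this advice is to pin each job to a single card through CUDA_VISIBLE_DEVICES and spread the categories across cards. The sketch below is an illustration only: `assign_round_robin`, `run_plan`, and the `make_command` callback are hypothetical helpers, and the actual training command (script and flags) must be supplied by the caller from the repository's README.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def assign_round_robin(categories, gpu_ids):
    """Distribute categories over GPUs in round-robin order."""
    if not gpu_ids:
        raise ValueError("gpu_ids must not be empty")
    plan = {gpu: [] for gpu in gpu_ids}
    for i, category in enumerate(categories):
        plan[gpu_ids[i % len(gpu_ids)]].append(category)
    return plan


def run_plan(plan, make_command):
    """Run each GPU's jobs sequentially, with the GPUs working in parallel.

    make_command(category) must return the argument list for one single-GPU
    job; it is supplied by the caller and is not defined here.
    """
    def worker(gpu_id, categories):
        # Each child process sees exactly one GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        for category in categories:
            subprocess.run(make_command(category), env=env, check=True)

    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        futures = [pool.submit(worker, gpu, cats) for gpu, cats in plan.items()]
        for future in futures:
            future.result()  # re-raise any failure from a worker


# Placeholder category names, two GPUs.
print(assign_round_robin(["cat_a", "cat_b", "cat_c"], [0, 1]))
# {0: ['cat_a', 'cat_c'], 1: ['cat_b']}
```

Because every job sees only one device, this avoids the multi-GPU code path entirely.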

JYing-where commented 4 months ago

I ran into a problem while running train_mask.py. The config file sets trainer: max_steps: 5000, but at run time the step count clearly goes past 5000. What is going on?

sjtuplayer commented 3 months ago

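The standard schedule for the mask checkpoints is 30,000 iterations. This was set manually in the training code (max_step=30000), so the value in the config file is not the one that takes effect.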
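The behaviour described here is an ordinary precedence rule: a value set in code wins over the one read from the config. The sketch below illustrates that rule in isolation. It does not reproduce train_mask.py, and `effective_max_steps` is a hypothetical name.

```python
def effective_max_steps(config, hardcoded=None):
    """Return the step limit that actually applies.

    A value hard-coded in the training script (if any) takes precedence over
    trainer.max_steps from the config. Illustrative only.
    """
    configured = config.get("trainer", {}).get("max_steps")
    if hardcoded is not None:
        return hardcoded
    return configured


config = {"trainer": {"max_steps": 5000}}  # value from the config in the question
print(effective_max_steps(config, hardcoded=30000))  # 30000
print(effective_max_steps(config))                   # 5000
```

To stop at a different step count, the value in the training code is the one to change.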