yelusaleng / RRU-Net

Official repository for "RRU-Net: The Ringed Residual U-Net for Image Splicing Forgery Detection" (CVPRW 2019)

I trained the model with CASIA 2.0, but the result is strange, looks like overfitting? Could you please give me some advice? #21

Closed yannier912 closed 2 years ago

yannier912 commented 2 years ago

Dataset: CASIA 2.0, 4,883 images in total. batch_size=32; lr = 0.1 for epochs 0-49 and lr = 0.01 for epochs 50-99; image size 384(w)×256(h). I also calculate the validation loss with cross-entropy, the same as the training loss. The train loss, validation loss, and validation Dice look like this:

[figure: training curves for the lr = 0.1 schedule]

I have tried predicting on test data, and the result is bad. Could you please give me some advice? Looking forward to your reply!
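For reference, a minimal PyTorch sketch of the step schedule described above; the SGD optimizer, the momentum value, and the stand-in model are assumptions, not details taken from the repo:

```python
import torch
from torch import optim

# Hypothetical stand-in for the RRU-Net model.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

# lr = 0.1 for epochs 0-49, then lr = 0.01 for epochs 50-99,
# matching the schedule described in the comment above.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50], gamma=0.1)

for epoch in range(100):
    # ... run one training epoch over the 4,883 CASIA 2.0 images ...
    scheduler.step()
```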

yelusaleng commented 2 years ago

This is caused by overfitting; you should expand your training set.

yannier912 commented 2 years ago

> This is caused by overfitting; you should expand your training set.

Thank you for the reply! I will try data augmentation using the script from issue #9. Is that enough for CASIA 2.0?
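(For readers without the script from issue #9: a minimal sketch of the kind of image/mask augmentation being discussed. The flip transform and file naming here are assumptions, not the actual script.)

```python
from PIL import Image, ImageOps

def augment_pair(img_path: str, mask_path: str, out_prefix: str) -> None:
    """Save horizontally flipped copies of a tampered image and its mask.

    The forgery mask must receive exactly the same geometric transform
    as the image, or the pixel labels stop lining up.
    """
    img = Image.open(img_path)
    mask = Image.open(mask_path)
    ImageOps.mirror(img).save(f"{out_prefix}_flip.png")
    ImageOps.mirror(mask).save(f"{out_prefix}_flip_mask.png")
```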

yelusaleng commented 2 years ago

I think data augmentation alone is not enough; you still need to collect more data (probably 40k images at least).

yelusaleng commented 2 years ago

But augmentation will indeed bring some improvement.

yannier912 commented 2 years ago

> But augmentation will indeed bring some improvement.

I will try to collect more data, thank you for your advice!

yannier912 commented 2 years ago

> I think data augmentation alone is not enough; you still need to collect more data (probably 40k images at least).

How long did your training take? I use about 55k images, and one epoch takes 3.5 hours. Could you please share the number of epochs, the training time, and the amount of data in your training? Thank you!

yelusaleng commented 2 years ago

Are you Chinese? If so, let's just talk in Chinese. Your training time is definitely wrong: even with 55k images, one epoch cannot take 3.5 hours. Check whether your data loading is correct; parameters like batch_size and num_workers need tuning. As for my training time, I don't remember exactly, but I'd guess one epoch took under half an hour, and at most no more than an hour.

yannier912 commented 2 years ago

> Are you Chinese? If so, let's just talk in Chinese. Your training time is definitely wrong: even with 55k images, one epoch cannot take 3.5 hours. Check whether your data loading is correct; parameters like batch_size and num_workers need tuning. As for my training time, I don't remember exactly, but I'd guess one epoch took under half an hour, and at most no more than an hour.

Yes, I am! batch_size=32, image size 384×256. Is num_workers a DataLoader setting? I load data directly with your code, which doesn't use torch's DataLoader. Where should num_workers be set, and what value is appropriate? Do you have any other suggestions? Thank you so much! Also, I added 20k untampered images to the dataset, added a pixel-to-image-level computation to the model, and included an image loss with weight 0.2 in the total loss.

yelusaleng commented 2 years ago

For batch size, go as large as your GPU memory allows (but not above 256). The code I wrote back then probably didn't use a DataLoader, but a DataLoader is generally better optimized; num_workers is, as far as I remember, a DataLoader setting, and 4 usually works well.

As for the image loss you mention, did you borrow the idea from MVSS-Net? I'm not sure how well it works; you can try it.
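A minimal sketch of the suggested DataLoader setup; the Dataset class and the dummy tensors are hypothetical stand-ins for the repo's loading code:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SplicingDataset(Dataset):
    """Hypothetical stand-in for the repo's image/mask loading code."""

    def __init__(self, pairs):
        self.pairs = pairs  # list of (image_tensor, mask_tensor) tuples

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        return self.pairs[idx]

if __name__ == "__main__":
    # Dummy 384(w)x256(h) images and binary masks, just so the sketch runs.
    pairs = [(torch.rand(3, 256, 384),
              torch.randint(0, 2, (1, 256, 384)).float()) for _ in range(64)]

    # num_workers=4 as recommended; pin_memory speeds up host-to-GPU copies.
    loader = DataLoader(SplicingDataset(pairs), batch_size=32,
                        shuffle=True, num_workers=4, pin_memory=True)
    images, masks = next(iter(loader))
    print(images.shape, masks.shape)  # [32, 3, 256, 384], [32, 1, 256, 384]
```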

yannier912 commented 2 years ago

> For batch size, go as large as your GPU memory allows (but not above 256). The code I wrote back then probably didn't use a DataLoader, but a DataLoader is generally better optimized; num_workers is, as far as I remember, a DataLoader setting, and 4 usually works well.
>
> As for the image loss you mention, did you borrow the idea from MVSS-Net? I'm not sure how well it works; you can try it.

Yes, the image loss is borrowed from MVSS-Net. I'll switch the data loading to a DataLoader and give it a try, thanks!

yannier912 commented 2 years ago

Hello, one more thing I'd like to confirm. During data loading, the image is divided by 255 in `imgs_normalized = map(normalize, imgs_switched)`, but the mask is not divided by 255. So the mask has already been preprocessed to take values 0 and 1 rather than 0 and 255, right?

yelusaleng commented 2 years ago

The mask is divided by 255 at line 66 of train.py.
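In code, the two scaling steps under discussion amount to something like the following sketch (names here are illustrative; in the repo the image scaling lives in the loading code and the mask scaling at line 66 of train.py):

```python
import numpy as np

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale image pixel values from [0, 255] down to [0, 1]."""
    return img / 255.0

# Masks are stored with values {0, 255}; dividing by 255 turns
# them into the {0, 1} labels that the pixel loss expects.
mask = np.array([[0, 255], [255, 0]], dtype=np.float32)
mask = mask / 255.0  # now {0.0, 1.0}
```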

yannier912 commented 2 years ago

> The mask is divided by 255 at line 66 of train.py.

Oh, I see it now! Thanks!

yannier912 commented 2 years ago

Hello, I'm now training on about 58k tampered images plus 20k untampered ones, doubled with your left/right crop augmentation. In theory that amount should be enough, yet the model still appears to overfit. From pixel_loss: 0.8289, image_loss: 0.6931 at step 1 of epoch 1, by the end of epoch 1 the average training Pixel Loss is 0.0060 and Image Loss 0.6933, with Validation Dice Coeff: 0.9391 and Pixel Loss: 0.0048. After that the loss barely drops: at epoch 15 the average training Pixel Loss is 0.0061 and Image Loss 0.5186, with Validation Dice Coeff: 0.9391 and Pixel Loss: 0.0063.

batch_size is still 32; the initial lr is 0.01 with exponential decay (with 0.1 the predictions contain NaNs and the loss cannot be computed); loss = 0.8 × pixel loss + 0.2 × image loss.

I ran prediction with the epoch-15 model, and every image is predicted as untampered. Could you give me some advice? Thank you very much!
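For reference, a minimal sketch of the weighted loss described above; the image-level head and both BCE terms are assumptions borrowed from the MVSS-Net idea mentioned earlier, not part of this repo. The image-level label is derived from the mask itself, so no extra annotation is needed.

```python
import torch
import torch.nn.functional as F

def combined_loss(pixel_logits, image_logit, mask):
    """loss = 0.8 * pixel loss + 0.2 * image loss, as described above."""
    # Per-pixel BCE against the {0, 1} forgery mask.
    pixel_loss = F.binary_cross_entropy_with_logits(pixel_logits, mask)
    # Image-level label: 1 if any pixel in the mask is tampered, else 0.
    image_label = (mask.flatten(1).amax(dim=1) > 0).float()
    image_loss = F.binary_cross_entropy_with_logits(image_logit, image_label)
    return 0.8 * pixel_loss + 0.2 * image_loss
```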

yelusaleng commented 2 years ago

If it's convenient, send me your code and your epoch-15 checkpoint and I'll take a look.


yelusaleng commented 2 years ago

By rights, with a validation Dice coefficient that high, the model shouldn't fail to detect anything.


yannier912 commented 2 years ago

> If it's convenient, send me your code and your epoch-15 checkpoint and I'll take a look.

Great, thank you so much! Could you share an email address, or some other way to reach you? I'll package everything and send it to you.

yelusaleng commented 2 years ago

yalesaleng@gmail.com

yannier912 commented 2 years ago

> yalesaleng@gmail.com

Sent, please check. Thank you very much!

yelusaleng commented 2 years ago

I haven't received it.

yelusaleng commented 2 years ago

OK, got it.