RuntimeError: result type Double can't be cast to the desired output type Byte

JiBingdong commented 2 years ago

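Hi! I ran into this error on a rented cloud server (PyTorch 1.7, CUDA 11.0). Searching on Baidu suggests that the two inputs to BCELoss have different types. The odd part is that the error only appears around the 25th batch, so training runs fine at first, and the same code runs without any errors on my local machine (PyTorch 1.6, CUDA 10.2). I can't change the cloud server's system image to match my local environment. I also tried converting labels_v to float, but that raised a different error: ValueError: only one element tensors can be converted to Python scalars. So I'm quite lost and don't know how to debug this. If you have run into a similar problem, I would really appreciate your advice!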

xuebinqin commented 2 years ago

Sorry about the issues. We haven't experienced similar issues before. My guess is that the first error is caused by the data augmentation, since, as you mentioned, it only happens occasionally. The uncertain parts are the random flipping and cropping, which may produce unstable results with low probability. It could also be that the model produces NaN values due to unstable gradient descent. As for the loss, you could convert the types of the relevant tensors inside each of the loss definitions.

-- Xuebin Qin PhD Department of Computing Science University of Alberta, Edmonton, AB, Canada Homepage: https://xuebinqin.github.io/
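
The suggestion above to convert tensor types inside the loss definitions could look roughly like the following. This is only a minimal sketch, not the repository's actual loss code: the helper name safe_bce and the toy tensors are hypothetical. It casts the target to the prediction's dtype and fails early if the prediction contains NaN or Inf, which covers both possibilities raised in the reply.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss(reduction="mean")


def safe_bce(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: BCE with the target cast to the prediction's dtype."""
    # Fail early if the model produced NaN/Inf instead of probabilities.
    if not torch.isfinite(pred).all():
        raise ValueError("prediction contains NaN or Inf")
    # Make the target dtype match the prediction (e.g. uint8/float64 -> float32).
    target = target.to(dtype=pred.dtype)
    return bce(pred, target)


# Toy data (assumed shapes): float32 probabilities and a Byte mask.
pred = torch.full((2, 1, 4, 4), 0.5)
target = torch.zeros((2, 1, 4, 4), dtype=torch.uint8)
loss = safe_bce(pred, target)
print(loss.dtype, loss.item())
```

Casting inside the loss keeps the fix in one place, but it will not help if the mismatch is raised earlier, for instance while a batch is being assembled.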

JiBingdong commented 2 years ago

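Thank you for your reply. After checking the data and the code accordingly, I concluded that the problem was probably not there. Then I remembered that in an earlier issue you told me the source code had been tested on PyTorch 1.8 and PyTorch 1.6, so I switched the cloud server image to PyTorch 1.8, and training now runs normally. So the problem most likely comes from some change between PyTorch versions (although I don't know exactly which one). Also, happy New Year in advance, and best wishes for the Year of the Tiger!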

endh1337 commented 2 years ago

I also ran into this problem with a dataset that contains labels without any mask. For example, it happens if I take the DUTS-TR dataset and replace the masks of objects I don't want recognized in my custom case with a completely black (0) PNG.

I created a custom dataset containing pictures of birds and their ground-truth binary masks (like DUTS-TR), and I got some nice results. What I want to achieve is that only birds are detected and segmented, so I added some random pictures without birds, each with a matching mask file that has no pixel value > 0, and ended up with this "cast to byte" exception.

Is this a problem caused by my dataset, or is your architecture not suited to label files without content? Strangely enough, if I make all mask files entirely black, it works.

Would appreciate an answer :) Thanks for sharing this awesome project!
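
One possible explanation for the empty-mask case, offered as an assumption and not confirmed anywhere in this thread, is a preprocessing step that rescales a mask only when it is non-empty. The sketch below uses a hypothetical normalize_mask function to show how such a step would leave all-black masks as uint8 while other masks become float64, so samples in one batch would disagree on dtype.

```python
import numpy as np
import torch


def normalize_mask(label: np.ndarray) -> np.ndarray:
    """Hypothetical preprocessing: scale to [0, 1] only if the mask is non-empty."""
    if np.max(label) < 1e-6:
        return label               # empty mask keeps its original dtype (uint8 here)
    return label / np.max(label)   # true division promotes the result to float64


empty = np.zeros((4, 4, 1), dtype=np.uint8)       # all-black mask
filled = np.full((4, 4, 1), 255, dtype=np.uint8)  # mask with foreground

# The two samples end up with different tensor dtypes.
dtypes = [torch.from_numpy(normalize_mask(m)).dtype for m in (empty, filled)]
print(dtypes)

# Casting every label to one dtype removes the mismatch before batching.
fixed = [torch.from_numpy(normalize_mask(m)).float() for m in (empty, filled)]
batch = torch.stack(fixed)
print(batch.dtype, batch.shape)
```

If this is what happens, it would also be consistent with the observation that a dataset of only black masks works, since every label would then share the same dtype.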

xu19971109 commented 2 years ago

I know! I tried changing the Torch version, and the problem still exists in 1.11, 1.8.1, and 1.8.0. The problem arises because the float types of the data and the model output do not match: the data from the DataLoader is float64, while the loss is float32. Try modifying the return value types in the __call__ method of the "ToTensorLab" class:

imidx = torch.from_numpy(np.ascontiguousarray(imidx)).int()
image = torch.from_numpy(np.ascontiguousarray(tmpImg)).float()
label = torch.from_numpy(np.ascontiguousarray(tmpLbl)).float()
return {'imidx':imidx, 'image': image, 'label': label}
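
To check that a change like this takes effect, one could inspect the dtypes of a collated batch. The following is a rough, self-contained sketch: ToyDataset and its random arrays are hypothetical stand-ins for the real dataset, and only the three conversion lines mirror the snippet above.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the real dataset; returns synthetic arrays."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        tmpImg = np.random.rand(3, 8, 8)   # NumPy defaults to float64
        tmpLbl = np.zeros((1, 8, 8))       # float64 as well
        imidx = np.array([idx])
        # Same conversions as in the snippet above.
        return {
            'imidx': torch.from_numpy(np.ascontiguousarray(imidx)).int(),
            'image': torch.from_numpy(np.ascontiguousarray(tmpImg)).float(),
            'label': torch.from_numpy(np.ascontiguousarray(tmpLbl)).float(),
        }


loader = DataLoader(ToyDataset(), batch_size=2, num_workers=0)
batch = next(iter(loader))
for key, value in batch.items():
    print(key, value.dtype, tuple(value.shape))
```

Without the .float() calls, torch.from_numpy would keep NumPy's float64, which matches the float64 versus float32 mismatch described in the comment.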

DWCTOD commented 1 year ago

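@xu19971109 Hi, I tried your fix and it does run, but the final output seems to be inverted: black and white are swapped, so the foreground and background are switched.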

xuebinqin / U-2-Net

RuntimeError: result type Double can't be cast to the desired output type Byte #285