koba-jon / pytorch_cpp

Deep Learning sample programs using PyTorch in C++
MIT License

Reading and writing an image / torch::tensor without OpenCV #6

Closed (ghost closed this issue 3 years ago)

ghost commented 3 years ago

Hello, I saw your post on https://qiita.com/koba-jon/items/2b15865f5b4c0c9fbbf7

I am trying to read and write a tensor to and from a PNG using libpng without using OpenCV. See this: https://github.com/prabhuomkar/pytorch-cpp/issues/72

Do you think it's possible?

Thanks

koba-jon commented 3 years ago

Hello. Thank you for raising an interesting discussion. I have read the linked issue.

I have used "libpng" to read and write PNG images myself. To be precise, I used "png++", the C++ interface to "libpng", to implement reading and writing of PNG index images for semantic segmentation. (png++: official, doc)

The source code is here:
https://github.com/koba-jon/pytorch_cpp/blob/master/utils/visualizer.cpp (lines 175-204)
https://github.com/koba-jon/pytorch_cpp/blob/master/utils/datasets.cpp (lines 33-52)

However, this program assumes that the input or output is an index image. Are you assuming a 24-bit RGB or an 8-bit grayscale image? I succeeded in reading and writing a 24-bit RGB image with the following code.

#include <iostream>
#include <torch/torch.h>
#include <png++/png.hpp>

torch::Tensor ConvertRGBintoTensor(png::image<png::rgb_pixel> &image);
png::image<png::rgb_pixel> ConvertTensorintoRGB(torch::Tensor &tensor_);

int main(void){

    // Input PNG-image
    png::image<png::rgb_pixel> imageI("input.png");

    // Convert png::image into torch::Tensor
    torch::Tensor tensor = ConvertRGBintoTensor(imageI);
    std::cout << "C:" << tensor.size(0) << " H:" << tensor.size(1) << " W:" << tensor.size(2) << std::endl;

    // Convert torch::Tensor into png::image
    png::image<png::rgb_pixel> imageO = ConvertTensorintoRGB(tensor);

    // Output PNG-image
    imageO.write("output.png");

    return 0;
}

torch::Tensor ConvertRGBintoTensor(png::image<png::rgb_pixel> &image){
    size_t width = image.get_width();
    size_t height = image.get_height();
    unsigned char *pointer = new unsigned char[width * height * 3];
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            pointer[j * width * 3 + i * 3 + 0] = image[j][i].red;
            pointer[j * width * 3 + i * 3 + 1] = image[j][i].green;
            pointer[j * width * 3 + i * 3 + 2] = image[j][i].blue;
        }
    }
    torch::Tensor tensor = torch::from_blob(pointer, {image.get_height(), image.get_width(), 3}, torch::kUInt8).clone();  // copy
    tensor = tensor.permute({2, 0, 1});  // {H,W,C} ===> {C,H,W}
    delete[] pointer;
    return tensor;
}

png::image<png::rgb_pixel> ConvertTensorintoRGB(torch::Tensor &tensor_){
    torch::Tensor tensor = tensor_.permute({1, 2, 0});  // {C,H,W} ===> {H,W,C}
    size_t width = tensor.size(1);
    size_t height = tensor.size(0);
    unsigned char *pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgb_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 3 + i * 3 + 0];
            image[j][i].green = pointer[j * width * 3 + i * 3 + 1];
            image[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
        }
    }
    return image;
}
ghost commented 3 years ago

Thank you so much for your response! I am going to test this and get back to you, you really saved my day :)

ghost commented 3 years ago

Based on your code, I wrote a small lib for it, you are fully credited of course: https://github.com/QuantScientist/PngTorch

koba-jon commented 3 years ago

I understand the above. I'm glad that I could help you out. I wish good luck to you.

ghost commented 3 years ago

Thank you once again,

Something strange happens when saving the images, which I did not notice at first:

[images: screenshot2, siv3d-kun-output001]

And another one:

[image: siv3d-kun]

[image: siv3d-kun-output001]

I have been breaking my head over this all day. Thanks,

koba-jon commented 3 years ago

Does that mean that if you read the PNG image above, convert it to a tensor, then convert it back to a PNG image and write it out, the image below is the result? That is because the original image is an RGBA image, not an RGB image. It has one transparency (alpha) channel in addition to the three color channels. In the example image, the transparent part corresponds to the periphery of the image.

Therefore, if your software reads this image as an RGB image, the alpha channel of the original image is lost and the transparent parts turn black. I think this can be solved by reading the image as RGBA when using png++.

koba-jon commented 3 years ago

This problem can be solved by simply adding processing for the alpha channel to the original source code. However, I think RGBA images are only used in limited situations.

#include <iostream>
#include <torch/torch.h>
#include <png++/png.hpp>

torch::Tensor ConvertRGBAintoTensor(png::image<png::rgba_pixel> &image);
png::image<png::rgba_pixel> ConvertTensorintoRGBA(torch::Tensor &tensor_);

int main(void){

    // Input PNG-image
    png::image<png::rgba_pixel> imageI("input.png");

    // Convert png::image into torch::Tensor
    torch::Tensor tensor = ConvertRGBAintoTensor(imageI);
    std::cout << "C:" << tensor.size(0) << " H:" << tensor.size(1) << " W:" << tensor.size(2) << std::endl;

    // Convert torch::Tensor into png::image
    png::image<png::rgba_pixel> imageO = ConvertTensorintoRGBA(tensor);

    // Output PNG-image
    imageO.write("output.png");

    return 0;
}

torch::Tensor ConvertRGBAintoTensor(png::image<png::rgba_pixel> &image){
    size_t width = image.get_width();
    size_t height = image.get_height();
    unsigned char *pointer = new unsigned char[width * height * 4];
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            pointer[j * width * 4 + i * 4 + 0] = image[j][i].red;
            pointer[j * width * 4 + i * 4 + 1] = image[j][i].green;
            pointer[j * width * 4 + i * 4 + 2] = image[j][i].blue;
            pointer[j * width * 4 + i * 4 + 3] = image[j][i].alpha;
        }
    }
    torch::Tensor tensor = torch::from_blob(pointer, {image.get_height(), image.get_width(), 4}, torch::kUInt8).clone();  // copy
    tensor = tensor.permute({2, 0, 1});  // {H,W,C} ===> {C,H,W}
    delete[] pointer;
    return tensor;
}

png::image<png::rgba_pixel> ConvertTensorintoRGBA(torch::Tensor &tensor_){
    torch::Tensor tensor = tensor_.permute({1, 2, 0});  // {C,H,W} ===> {H,W,C}
    size_t width = tensor.size(1);
    size_t height = tensor.size(0);
    unsigned char *pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgba_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 4 + i * 4 + 0];
            image[j][i].green = pointer[j * width * 4 + i * 4 + 1];
            image[j][i].blue = pointer[j * width * 4 + i * 4 + 2];
            image[j][i].alpha = pointer[j * width * 4 + i * 4 + 3];
        }
    }
    return image;
}
ghost commented 3 years ago

Once again, thank you so much for taking a look. I updated the repo here: https://github.com/QuantScientist/PngTorch/blob/master/src/example002.cpp

After loading an image (RGB or RGBA), I run the tensor through a trained neural style transfer PyTorch model (see https://github.com/QuantScientist/PngTorch/tree/master/resources for the *.pt files).

However, when saving the returned tensor back to a PNG, nine small squares (a 3x3 grid) of the image appear instead of a single image the size of the original. I thought it had something to do with kUInt8 vs kFloat32, but that is not the issue. For instance, for this input:
https://github.com/QuantScientist/PngTorch/blob/master/resources/windmill.png

This is the resulting output: https://github.com/QuantScientist/PngTorch/blob/master/resources/windmill.png-out.png

I am 100% positive that the model itself is working well since I tested it with OpenCV on a video.

Thanks again,

ghost commented 3 years ago

From my old OpenCV conversion method, I saw that I did this:

out_tensor = out_tensor.mul(255).clamp(0, 255).to(torch::kU8);

I added this to the PNG conversion method:

png::image<png::rgb_pixel> VisionUtils::torchToPng(torch::Tensor &tensor_){

    torch::Tensor tensor = tensor_.squeeze().detach().cpu().permute({1, 2, 0});  // {C,H,W} ===> {H,W,C}
    tensor = tensor.mul(255).clamp(0, 255).to(torch::kU8);

    size_t width = tensor.size(1);
    size_t height = tensor.size(0);
    auto pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgb_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 3 + i * 3 + 0];
            image[j][i].green = pointer[j * width * 3 + i * 3 + 1];
            image[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
        }
    }
    return image;
}

And the result is better, although the colors are completely wrong and over-saturated. I think it has to do with kU8 vs kFloat32 and the 0-255 vs 0-1 value ranges.

[image: windmill png-out]

koba-jon commented 3 years ago

Do you know what range the output of this model is originally supposed to take? If the output of the model is restricted to the range 0 to 1 by a sigmoid function, I don't think that conversion is wrong.

Looking at recent image generation models, I often see the output restricted to the range -1 to 1 by a tanh function.

out_tensor = out_tensor.mul(255).clamp(0, 255).to(torch::kU8);

It's completely a guess on my part, but what if you change the above code to: out_tensor = out_tensor.mul(0.5).add(0.5).mul(255).clamp(0, 255).to(torch::kU8);
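
For illustration only (this is just my assumption about the ranges, and "out_tensor" here is a stand-in name), the three cases would look like this:

#include <torch/torch.h>
#include <iostream>

int main(){
    // Pretend model output; in practice this comes from module.forward(...).
    torch::Tensor out_tensor = torch::rand({3, 4, 4});

    // Case 1: output already lies in [0, 255] (no sigmoid/tanh): clamp and cast only.
    torch::Tensor a = out_tensor.clamp(0, 255).to(torch::kU8);

    // Case 2: output lies in [0, 1] (sigmoid): scale by 255 first.
    torch::Tensor b = out_tensor.mul(255).clamp(0, 255).to(torch::kU8);

    // Case 3: output lies in [-1, 1] (tanh): shift to [0, 1], then scale by 255.
    torch::Tensor c = out_tensor.mul(0.5).add(0.5).mul(255).clamp(0, 255).to(torch::kU8);

    std::cout << a.sizes() << " " << b.sizes() << " " << c.sizes() << std::endl;
    return 0;
}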

ghost commented 3 years ago

That is possible. This is the result, almost fully white: [image: windmill png-out]

ghost commented 3 years ago

This is the original NN:

import torch
class TransformerNet(torch.nn.Module):
    def __init__(self):
        super(TransformerNet, self).__init__()
        # Initial convolution layers
        self.conv1 = ConvLayer(3, 32, kernel_size=9, stride=1)
        self.in1 = torch.nn.InstanceNorm2d(32, affine=True)
        self.conv2 = ConvLayer(32, 64, kernel_size=3, stride=2)
        self.in2 = torch.nn.InstanceNorm2d(64, affine=True)
        self.conv3 = ConvLayer(64, 128, kernel_size=3, stride=2)
        self.in3 = torch.nn.InstanceNorm2d(128, affine=True)
        # Residual layers
        self.res1 = ResidualBlock(128)
        self.res2 = ResidualBlock(128)
        self.res3 = ResidualBlock(128)
        self.res4 = ResidualBlock(128)
        self.res5 = ResidualBlock(128)
        # Upsampling Layers
        self.deconv1 = UpsampleConvLayer(128, 64, kernel_size=3, stride=1, upsample=2)
        self.in4 = torch.nn.InstanceNorm2d(64, affine=True)
        self.deconv2 = UpsampleConvLayer(64, 32, kernel_size=3, stride=1, upsample=2)
        self.in5 = torch.nn.InstanceNorm2d(32, affine=True)
        self.deconv3 = ConvLayer(32, 3, kernel_size=9, stride=1)
        # Non-linearities
        self.relu = torch.nn.ReLU()

    def forward(self, X):
        y = self.relu(self.in1(self.conv1(X)))
        y = self.relu(self.in2(self.conv2(y)))
        y = self.relu(self.in3(self.conv3(y)))
        y = self.res1(y)
        y = self.res2(y)
        y = self.res3(y)
        y = self.res4(y)
        y = self.res5(y)
        y = self.relu(self.in4(self.deconv1(y)))
        y = self.relu(self.in5(self.deconv2(y)))
        y = self.deconv3(y)
        return y

class ConvLayer(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super(ConvLayer, self).__init__()
        reflection_padding = kernel_size // 2
        self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding)
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        out = self.reflection_pad(x)
        out = self.conv2d(out)
        return out

class ResidualBlock(torch.nn.Module):
    """ResidualBlock
    introduced in: https://arxiv.org/abs/1512.03385
    recommended architecture: http://torch.ch/blog/2016/02/04/resnets.html
    """

    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in1 = torch.nn.InstanceNorm2d(channels, affine=True)
        self.conv2 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in2 = torch.nn.InstanceNorm2d(channels, affine=True)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        residual = x
        out = self.relu(self.in1(self.conv1(x)))
        out = self.in2(self.conv2(out))
        out = out + residual
        return out

class UpsampleConvLayer(torch.nn.Module):
    """UpsampleConvLayer
    Upsamples the input and then does a convolution. This method gives better results
    compared to ConvTranspose2d.
    ref: http://distill.pub/2016/deconv-checkerboard/
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride, upsample=None):
        super(UpsampleConvLayer, self).__init__()
        self.upsample = upsample
        reflection_padding = kernel_size // 2
        self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding)
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        x_in = x
        if self.upsample:
            x_in = torch.nn.functional.interpolate(x_in, mode='nearest', scale_factor=self.upsample)
        out = self.reflection_pad(x_in)
        out = self.conv2d(out)
        return out 

Which I exported like so:

import torch
import torchvision
import re
import torch
import os
import random

from models.neural import *

style_model = TransformerNet()
state_dict = torch.load("models/weights/neural/candy.pth")
for k in list(state_dict.keys()):
    if re.search(r'in\d+\.running_(mean|var)$', k):
        del state_dict[k]
style_model.load_state_dict(state_dict)
style_model.eval()

# model = torchvision.models.resnet50(pretrained=True)
# model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(style_model, example)
traced_script_module.save("style_model_cpp.pt") 
ghost commented 3 years ago

And the original results are here: https://github.com/gnsmrky/pytorch-fast-neural-style-for-web

koba-jon commented 3 years ago

I saw the code. There is no trace of the author using a tanh or sigmoid function at the output, so I can somewhat see why the sample images have artifacts.

Also, when I looked at the "save_image" function in the code below, I saw that the output is simply clamped, with no rescaling or normalization beforehand. Therefore, I don't think it is necessary to change the scale by multiplying by 255. https://github.com/gnsmrky/pytorch-fast-neural-style-for-web/blob/master/neural_style/utils.py

Also, I checked the following code again. https://github.com/QuantScientist/PngTorch/blob/master/src/example002.cpp

Regarding line 30, I think that casting to "torch::kU8" before clamping the output image may also be a cause of the artifacts. Could you try changing this part? out_tensor = out_tensor.to(torch::kU8).detach().cpu().squeeze(); //Remove batch dim, must convert back to torch::kU8
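
To show what I mean about the order (a minimal sketch; the sample values are made up), clamping before the cast saturates out-of-range values, while casting first lets them wrap around:

#include <torch/torch.h>
#include <iostream>

int main(){
    // Made-up float values, as if they came out of the model with no range restriction.
    torch::Tensor out_tensor = torch::tensor({-20.0f, 100.0f, 300.0f});

    // Clamp first, then cast: -20 becomes 0 and 300 becomes 255, as intended.
    torch::Tensor good = out_tensor.clamp(0, 255).to(torch::kU8);

    // Cast first, then clamp: the float -> uint8 conversion has already mangled
    // the out-of-range values, so the later clamp cannot repair them.
    torch::Tensor bad = out_tensor.to(torch::kU8).clamp(0, 255);

    std::cout << good << std::endl << bad << std::endl;
    return 0;
}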

ghost commented 3 years ago

Thank you so much for looking again. I also opened a thread here with the full code: https://discuss.pytorch.org/t/help-with-a-tracing-neuralstyle-transfer-model-to-c/97120

Removing the cast as you suggested results in: [image: amber png_candy_cpp pt-out]

koba-jon commented 3 years ago

I can't say that it has been solved yet, but I got an interesting result. I found that the negative-positive converted (color-inverted) version of the model's output image is similar to the original result. I hope this result helps with debugging.

Do you have any idea about this result?

ghost commented 3 years ago

Looks good. By "negative-positive converted", do you mean in Photoshop or in C++?

ghost commented 3 years ago
for (size_t j = 0; j < height; j++){
    for (size_t i = 0; i < width; i++){
        image[j][i].red = 255 - pointer[j * width * 3 + i * 3 + 0];
        image[j][i].green = 255 - pointer[j * width * 3 + i * 3 + 1];
        image[j][i].blue = 255 - pointer[j * width * 3 + i * 3 + 2];
    }
}

But if you zoom in, you will see very small highly-saturated pixels.


koba-jon commented 3 years ago

My source code for the negative-positive conversion is exactly the same as yours.

ghost commented 3 years ago

Now a regular PNG save, without running inference, looks like this:

[image: windmill-out]

This is without the "255 -" I just mentioned; it is the result of my previous changes.

ghost commented 3 years ago

This line: tensor = tensor.mul(255).clamp(0, 255).to(torch::kU8);

turns the PNG into a negative. This happens just by loading a PNG into a tensor and saving it; no CNN is involved.
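
One possible explanation (just an assumption on my part, not verified): if the tensor is still kUInt8 at that point, mul(255) wraps modulo 256, and 255 * x mod 256 equals 256 - x, which is exactly the inverted pixel value. A tiny sketch:

#include <torch/torch.h>
#include <iostream>

int main(){
    // A few sample 8-bit pixel values.
    torch::Tensor px = torch::tensor({0, 1, 100, 200, 255}, torch::kUInt8);

    // uint8 arithmetic wraps modulo 256, so 255 * x == (256 - x) % 256:
    // the multiplication inverts the pixels instead of scaling them.
    std::cout << px.mul(255) << std::endl;  // expected: 0, 255, 156, 56, 1
    return 0;
}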

ghost commented 3 years ago

But without tensor = tensor.mul(255).clamp(0, 255).to(torch::kU8); I get nine cubes ....

[image: amber png_mosaic_cpp pt-out]

koba-jon commented 3 years ago

I have a Windows 10 PC without an NVIDIA GPU. Is it possible for me to reproduce the same situation as you have?

ghost commented 3 years ago

Of course. In the CMake file https://github.com/QuantScientist/PngTorch/blob/master/cmake_find/download_libtorch.cmake (https://github.com/QuantScientist/PngTorch/blob/210ecca72745d816cef5b571d8b95ed96f28ba2c/cmake_find/download_libtorch.cmake#L6), change:

set(CUDA_V "10.2")
set(LIBTORCH_DEVICE "cu102")

To:

set(CUDA_V "cpu")
set(LIBTORCH_DEVICE "cpu")

It will download libtorch automatically for you.

Whenever you see torch::Device device(torch::kCUDA);

change it to: torch::Device device(torch::kCPU);
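
If it helps, here is a small sketch (just a suggestion, not code from the repo) that picks the device at runtime instead of hard-coding it:

#include <torch/torch.h>
#include <iostream>

int main(){
    // Use CUDA when it is available, otherwise fall back to CPU,
    // so the same source builds and runs on both machines.
    torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);
    torch::Tensor t = torch::rand({2, 3}).to(device);
    std::cout << t.device() << std::endl;
    return 0;
}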

Thanks!

ghost commented 3 years ago

I think I got it!

Note the use of torch::kFloat32 here: out_tensor = out_tensor.to(torch::kFloat32).detach().cpu().squeeze(); //Remove batch dim, must convert back to torch::kU8

And here:

png::image<png::rgb_pixel> VisionUtils::torchToPng(torch::Tensor &tensor_){
    torch::Tensor tensor = tensor_.squeeze().detach().cpu().permute({1, 2, 0});  // {C,H,W} ===> {H,W,C}
    tensor = tensor.clamp(0, 255);
    tensor = tensor.to(torch::kU8);
    size_t width = tensor.size(1);
    size_t height = tensor.size(0);
    auto pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgb_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 3 + i * 3 + 0];
            image[j][i].green = pointer[j * width * 3 + i * 3 + 1];
            image[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
        }
    }
    return image;
}
koba-jon commented 3 years ago

That's great to hear!

ghost commented 3 years ago

I have no clue why this works!

koba-jon commented 3 years ago

I found the perfect solution while using Deep Learning programs with OpenCV and LibTorch on Windows 10. We need to call the function "contiguous()" just before converting the "torch::Tensor" into another type. "contiguous()" reorders the data in memory so that it matches the tensor's logical layout.

// "tensor" is assumed here to be a kU8 tensor with shape {H,W,C} (e.g. after permute)
tensor = tensor.contiguous();
size_t height = tensor.size(0);
size_t width = tensor.size(1);

// For OpenCV
cv::Mat imageMat = cv::Mat(cv::Size(tensor.size(1), tensor.size(0)), CV_8UC3, tensor.data_ptr<unsigned char>());

// For PNG++
png::image<png::rgb_pixel> imagePNG(tensor.size(1), tensor.size(0));
auto pointer = tensor.data_ptr<unsigned char>();
for (size_t j = 0; j < height; j++){
    for (size_t i = 0; i < width; i++){
        imagePNG[j][i].red = pointer[j * width * 3 + i * 3 + 0];
        imagePNG[j][i].green = pointer[j * width * 3 + i * 3 + 1];
        imagePNG[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
    }
}
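
For completeness, here is a sketch that puts these points together for the model-output case (the function name is mine, not from any repo): squeeze the batch dimension, clamp, cast to kU8, permute to {H,W,C}, and call "contiguous()" before touching the raw pointer.

#include <torch/torch.h>
#include <png++/png.hpp>

png::image<png::rgb_pixel> TensorToPNG(torch::Tensor tensor_){
    torch::Tensor tensor = tensor_.detach().cpu().squeeze();  // {1,C,H,W} ===> {C,H,W}
    tensor = tensor.clamp(0, 255).to(torch::kU8);              // restrict to [0,255], then cast
    tensor = tensor.permute({1, 2, 0}).contiguous();           // {C,H,W} ===> {H,W,C}, reorder memory
    size_t height = tensor.size(0);
    size_t width = tensor.size(1);
    unsigned char *pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgb_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 3 + i * 3 + 0];
            image[j][i].green = pointer[j * width * 3 + i * 3 + 1];
            image[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
        }
    }
    return image;
}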

Thank you!!