Closed ghost closed 3 years ago
Hello. Thank you for providing an interesting discussion. I confirmed the "issue" of the link destination.
I have used "libpng" to read and write PNG images on my own. To be precise, I used "png++", a C++ interface to "libpng", to implement reading and writing of PNG index images for semantic segmentation. (png++: official, doc)
The source code is here: https://github.com/koba-jon/pytorch_cpp/blob/master/utils/visualizer.cpp (Lines 175-204) https://github.com/koba-jon/pytorch_cpp/blob/master/utils/datasets.cpp (Lines 33-52)
However, this program assumes that the input or output is an index image. Are you assuming a 24-bit RGB or an 8-bit grayscale image? I succeeded in reading and writing a 24-bit RGB image with the following code.
#include <iostream>
#include <torch/torch.h>
#include <png++/png.hpp>

torch::Tensor ConvertRGBintoTensor(png::image<png::rgb_pixel> &image);
png::image<png::rgb_pixel> ConvertTensorintoRGB(torch::Tensor &tensor_);

int main(void){
    // Input PNG-image
    png::image<png::rgb_pixel> imageI("input.png");
    // Convert png::image into torch::Tensor
    torch::Tensor tensor = ConvertRGBintoTensor(imageI);
    std::cout << "C:" << tensor.size(0) << " H:" << tensor.size(1) << " W:" << tensor.size(2) << std::endl;
    // Convert torch::Tensor into png::image
    png::image<png::rgb_pixel> imageO = ConvertTensorintoRGB(tensor);
    // Output PNG-image
    imageO.write("output.png");
    return 0;
}

torch::Tensor ConvertRGBintoTensor(png::image<png::rgb_pixel> &image){
    size_t width = image.get_width();
    size_t height = image.get_height();
    unsigned char *pointer = new unsigned char[width * height * 3];
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            pointer[j * width * 3 + i * 3 + 0] = image[j][i].red;
            pointer[j * width * 3 + i * 3 + 1] = image[j][i].green;
            pointer[j * width * 3 + i * 3 + 2] = image[j][i].blue;
        }
    }
    torch::Tensor tensor = torch::from_blob(pointer, {image.get_height(), image.get_width(), 3}, torch::kUInt8).clone(); // copy
    tensor = tensor.permute({2, 0, 1}); // {H,W,C} ===> {C,H,W}
    delete[] pointer;
    return tensor;
}

png::image<png::rgb_pixel> ConvertTensorintoRGB(torch::Tensor &tensor_){
    torch::Tensor tensor = tensor_.permute({1, 2, 0}); // {C,H,W} ===> {H,W,C}
    size_t width = tensor.size(1);
    size_t height = tensor.size(0);
    unsigned char *pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgb_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 3 + i * 3 + 0];
            image[j][i].green = pointer[j * width * 3 + i * 3 + 1];
            image[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
        }
    }
    return image;
}
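The index arithmetic in the two conversion functions can be looked at in isolation. The following is a minimal sketch on plain buffers, without LibTorch or png++; the helper names (hwc_offset, chw_offset, interleaved_to_planar) are hypothetical and only illustrate how an interleaved {H,W,C} layout relates to a planar {C,H,W} layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset of channel c of pixel (row j, column i) in an interleaved {H,W,C} buffer.
inline std::size_t hwc_offset(std::size_t j, std::size_t i, std::size_t c,
                              std::size_t width, std::size_t channels){
    return (j * width + i) * channels + c;
}

// Offset of the same element in a planar {C,H,W} buffer.
inline std::size_t chw_offset(std::size_t c, std::size_t j, std::size_t i,
                              std::size_t height, std::size_t width){
    return c * height * width + j * width + i;
}

// Copies an interleaved {H,W,C} buffer into planar {C,H,W} order.
inline std::vector<std::uint8_t> interleaved_to_planar(const std::vector<std::uint8_t> &hwc,
                                                       std::size_t height, std::size_t width,
                                                       std::size_t channels){
    std::vector<std::uint8_t> chw(hwc.size());
    for (std::size_t j = 0; j < height; j++){
        for (std::size_t i = 0; i < width; i++){
            for (std::size_t c = 0; c < channels; c++){
                chw[chw_offset(c, j, i, height, width)] = hwc[hwc_offset(j, i, c, width, channels)];
            }
        }
    }
    return chw;
}
```

For a 1x2 RGB image stored as {1, 2, 3, 4, 5, 6}, the planar copy holds the two red values first, then the two green values, then the two blue values.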
Thank you so much for your response! I am going to test this and get back to you, you really saved my day :)
Based on your code, I wrote a small lib for it, you are fully credited of course: https://github.com/QuantScientist/PngTorch
I understand the above. I'm glad that I could help you out. I wish good luck to you.
Thank you once again,
There is a strange thing happening when saving the pics which I did not notice at the beginning:
And another one:
Breaking my head all day on this, Thanks,
Does that mean that if you read the PNG image above, convert it to a tensor, then convert it back to the PNG image and write it, the image below will be output? That's because the original image is an RGBA image, not an RGB image. This image has one transparent channel in addition to three color component channels. In the case of the example image, the transparent part corresponds to the periphery of the image.
Therefore, if your software is reading this image as an RGB image, it is possible that the transparent channel of the original image has been lost and the transparent part has turned black. I think this can be solved by setting this image to be read as an RGBA channel when using png++.
This problem can be solved by simply adding processing for the alpha channel to the original source code. However, I think RGBA images are only used in limited situations.
#include <iostream>
#include <torch/torch.h>
#include <png++/png.hpp>

torch::Tensor ConvertRGBAintoTensor(png::image<png::rgba_pixel> &image);
png::image<png::rgba_pixel> ConvertTensorintoRGBA(torch::Tensor &tensor_);

int main(void){
    // Input PNG-image
    png::image<png::rgba_pixel> imageI("input.png");
    // Convert png::image into torch::Tensor
    torch::Tensor tensor = ConvertRGBAintoTensor(imageI);
    std::cout << "C:" << tensor.size(0) << " H:" << tensor.size(1) << " W:" << tensor.size(2) << std::endl;
    // Convert torch::Tensor into png::image
    png::image<png::rgba_pixel> imageO = ConvertTensorintoRGBA(tensor);
    // Output PNG-image
    imageO.write("output.png");
    return 0;
}

torch::Tensor ConvertRGBAintoTensor(png::image<png::rgba_pixel> &image){
    size_t width = image.get_width();
    size_t height = image.get_height();
    unsigned char *pointer = new unsigned char[width * height * 4];
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            pointer[j * width * 4 + i * 4 + 0] = image[j][i].red;
            pointer[j * width * 4 + i * 4 + 1] = image[j][i].green;
            pointer[j * width * 4 + i * 4 + 2] = image[j][i].blue;
            pointer[j * width * 4 + i * 4 + 3] = image[j][i].alpha;
        }
    }
    torch::Tensor tensor = torch::from_blob(pointer, {image.get_height(), image.get_width(), 4}, torch::kUInt8).clone(); // copy
    tensor = tensor.permute({2, 0, 1}); // {H,W,C} ===> {C,H,W}
    delete[] pointer;
    return tensor;
}

png::image<png::rgba_pixel> ConvertTensorintoRGBA(torch::Tensor &tensor_){
    torch::Tensor tensor = tensor_.permute({1, 2, 0}); // {C,H,W} ===> {H,W,C}
    size_t width = tensor.size(1);
    size_t height = tensor.size(0);
    unsigned char *pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgba_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 4 + i * 4 + 0];
            image[j][i].green = pointer[j * width * 4 + i * 4 + 1];
            image[j][i].blue = pointer[j * width * 4 + i * 4 + 2];
            image[j][i].alpha = pointer[j * width * 4 + i * 4 + 3];
        }
    }
    return image;
}
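If the model only accepts three channels, a possible alternative to carrying the alpha channel through the tensor is to composite each RGBA pixel over a chosen background color before conversion. This is a different technique from the code above, not something png++ does for you; the struct and function names below (RgbPixel, RgbaPixel, blend_channel, composite_over) are hypothetical, and the sketch assumes straight (non-premultiplied) 8-bit alpha.

```cpp
#include <cstdint>

struct RgbPixel { std::uint8_t red, green, blue; };
struct RgbaPixel { std::uint8_t red, green, blue, alpha; };

// Blends one channel: alpha = 255 keeps the foreground, alpha = 0 keeps the background.
// The "+ 127" rounds to the nearest integer instead of truncating.
inline std::uint8_t blend_channel(std::uint8_t fg, std::uint8_t bg, std::uint8_t alpha){
    return static_cast<std::uint8_t>((fg * alpha + bg * (255 - alpha) + 127) / 255);
}

// Composites a straight-alpha RGBA pixel over an opaque background pixel.
inline RgbPixel composite_over(const RgbaPixel &px, const RgbPixel &background){
    return { blend_channel(px.red, background.red, px.alpha),
             blend_channel(px.green, background.green, px.alpha),
             blend_channel(px.blue, background.blue, px.alpha) };
}
```

With a white background, a fully transparent pixel becomes white instead of black, while fully opaque pixels keep their color.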
Once again, thank you so much for taking a look. I updated the repo here: https://github.com/QuantScientist/PngTorch/blob/master/src/example002.cpp
After loading an image (RGB or RGBA), I run the tensor through a trained neural style PyTorch model (see https://github.com/QuantScientist/PngTorch/tree/master/resources for the *.pt files).
However, when saving the returned tensor back as a PNG, 9 small squares of the image appear (e.g. 3*3) instead of one single image that corresponds to the original image. I thought it had something to do with kUInt8 vs. kFloat32, but this is not the issue.
For instance for this input:
https://github.com/QuantScientist/PngTorch/blob/master/resources/windmill.png
This is the resulting output:
https://github.com/QuantScientist/PngTorch/blob/master/resources/windmill.png-out.png
I am 100% positive that the model itself is working well since I tested it with OpenCV on a video.
Thanks again,
From my old OpenCV conversion method, I saw I did this:
out_tensor = out_tensor.mul(255).clamp(0, 255).to(torch::kU8);
I added this to the PNG conversion method:
png::image<png::rgb_pixel> VisionUtils::torchToPng(torch::Tensor &tensor_){
    torch::Tensor tensor = tensor_.squeeze().detach().cpu().permute({1, 2, 0}); // {C,H,W} ===> {H,W,C}
    tensor = tensor.mul(255).clamp(0, 255).to(torch::kU8);
    size_t width = tensor.size(1);
    size_t height = tensor.size(0);
    auto pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgb_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 3 + i * 3 + 0];
            image[j][i].green = pointer[j * width * 3 + i * 3 + 1];
            image[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
        }
    }
    return image;
}
And the result is better, although the colors are completely wrong and over-saturated. I think it has to do with kU8 vs. kFloat32 and the value range, 0 to 255 versus 0 to 1.
Do you know what range the output of this model can originally take? If the output of the model is restricted to the range 0 to 1 by a sigmoid function, I don't think it's wrong.
Looking at recent image generation models, I often see the output of the model restricted to the range -1 to 1 by a tanh function.
out_tensor = out_tensor.mul(255).clamp(0, 255).to(torch::kU8);
It's completely my guess, but what if you change the above code to:
out_tensor = out_tensor.mul(0.5).add(0.5).mul(255).clamp(0, 255).to(torch::kU8);
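The two scalings above differ only in the output range they assume. A minimal sketch on plain floats, without LibTorch; the function names are made up for illustration, and the final cast is assumed to truncate toward zero:

```cpp
#include <algorithm>
#include <cstdint>

// Assumes v is already in [0, 1] (for example a sigmoid output):
// scale to [0, 255], clamp, then truncate to 8 bits.
inline std::uint8_t from_unit_range(float v){
    float scaled = std::min(std::max(v * 255.0f, 0.0f), 255.0f);
    return static_cast<std::uint8_t>(scaled);
}

// Assumes v is in [-1, 1] (for example a tanh output):
// shift to [0, 1] first, then reuse the mapping above.
inline std::uint8_t from_signed_unit_range(float v){
    return from_unit_range(v * 0.5f + 0.5f);
}
```

If the values are in neither range, for example if they are already on a 0 to 255 scale, both mappings saturate most pixels at 255, which would look over-saturated or nearly white.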
That is possible, this is the result, almost fully white:
This is the original NN:
import torch


class TransformerNet(torch.nn.Module):
    def __init__(self):
        super(TransformerNet, self).__init__()
        # Initial convolution layers
        self.conv1 = ConvLayer(3, 32, kernel_size=9, stride=1)
        self.in1 = torch.nn.InstanceNorm2d(32, affine=True)
        self.conv2 = ConvLayer(32, 64, kernel_size=3, stride=2)
        self.in2 = torch.nn.InstanceNorm2d(64, affine=True)
        self.conv3 = ConvLayer(64, 128, kernel_size=3, stride=2)
        self.in3 = torch.nn.InstanceNorm2d(128, affine=True)
        # Residual layers
        self.res1 = ResidualBlock(128)
        self.res2 = ResidualBlock(128)
        self.res3 = ResidualBlock(128)
        self.res4 = ResidualBlock(128)
        self.res5 = ResidualBlock(128)
        # Upsampling Layers
        self.deconv1 = UpsampleConvLayer(128, 64, kernel_size=3, stride=1, upsample=2)
        self.in4 = torch.nn.InstanceNorm2d(64, affine=True)
        self.deconv2 = UpsampleConvLayer(64, 32, kernel_size=3, stride=1, upsample=2)
        self.in5 = torch.nn.InstanceNorm2d(32, affine=True)
        self.deconv3 = ConvLayer(32, 3, kernel_size=9, stride=1)
        # Non-linearities
        self.relu = torch.nn.ReLU()

    def forward(self, X):
        y = self.relu(self.in1(self.conv1(X)))
        y = self.relu(self.in2(self.conv2(y)))
        y = self.relu(self.in3(self.conv3(y)))
        y = self.res1(y)
        y = self.res2(y)
        y = self.res3(y)
        y = self.res4(y)
        y = self.res5(y)
        y = self.relu(self.in4(self.deconv1(y)))
        y = self.relu(self.in5(self.deconv2(y)))
        y = self.deconv3(y)
        return y


class ConvLayer(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super(ConvLayer, self).__init__()
        reflection_padding = kernel_size // 2
        self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding)
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        out = self.reflection_pad(x)
        out = self.conv2d(out)
        return out


class ResidualBlock(torch.nn.Module):
    """ResidualBlock
    introduced in: https://arxiv.org/abs/1512.03385
    recommended architecture: http://torch.ch/blog/2016/02/04/resnets.html
    """

    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in1 = torch.nn.InstanceNorm2d(channels, affine=True)
        self.conv2 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in2 = torch.nn.InstanceNorm2d(channels, affine=True)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        residual = x
        out = self.relu(self.in1(self.conv1(x)))
        out = self.in2(self.conv2(out))
        out = out + residual
        return out


class UpsampleConvLayer(torch.nn.Module):
    """UpsampleConvLayer
    Upsamples the input and then does a convolution. This method gives better results
    compared to ConvTranspose2d.
    ref: http://distill.pub/2016/deconv-checkerboard/
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride, upsample=None):
        super(UpsampleConvLayer, self).__init__()
        self.upsample = upsample
        reflection_padding = kernel_size // 2
        self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding)
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        x_in = x
        if self.upsample:
            x_in = torch.nn.functional.interpolate(x_in, mode='nearest', scale_factor=self.upsample)
        out = self.reflection_pad(x_in)
        out = self.conv2d(out)
        return out
Which I exported like so:
import torch
import torchvision
import re
import os
import random
from models.neural import *

style_model = TransformerNet()
state_dict = torch.load("models/weights/neural/candy.pth")
for k in list(state_dict.keys()):
    if re.search(r'in\d+\.running_(mean|var)$', k):
        del state_dict[k]
style_model.load_state_dict(state_dict)
style_model.eval()
# model = torchvision.models.resnet50(pretrained=True)
# model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(style_model, example)
traced_script_module.save("style_model_cpp.pt")
And the original results are here: https://github.com/gnsmrky/pytorch-fast-neural-style-for-web
I saw the code. There is no trace that the author is using the tanh or sigmoid functions, so I somehow understand why the sample images have artifacts.
And, when I looked at the "save_image" function in the code below, I found that the output image was clamped without range restriction or normalization. Therefore, I think that it is not necessary to change the scale by multiplying 255. https://github.com/gnsmrky/pytorch-fast-neural-style-for-web/blob/master/neural_style/utils.py
Also, I checked the following code again. https://github.com/QuantScientist/PngTorch/blob/master/src/example002.cpp
Around line 30, I think that casting with "torch::kU8" before clamping the output image may also be a cause of the artifact.
Has this location been changed?
out_tensor = out_tensor.to(torch::kU8).detach().cpu().squeeze(); //Remove batch dim, must convert back to torch::kU8
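Why the order of clamping and casting matters can be sketched on single values. The exact result of casting an out-of-range float to an 8-bit type is not something to rely on; the sketch below models it as modulo-256 wraparound, which is an assumption made for illustration, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Cast without clamping, modeled as wraparound: float -> int (truncation),
// then int -> uint8, which is reduced modulo 256.
inline std::uint8_t cast_then_wrap(float v){
    return static_cast<std::uint8_t>(static_cast<std::int32_t>(v));
}

// Clamp to [0, 255] first, then cast: out-of-range values saturate instead of wrapping.
inline std::uint8_t clamp_then_cast(float v){
    return static_cast<std::uint8_t>(std::min(std::max(v, 0.0f), 255.0f));
}
```

Under this model a slightly too bright value such as 300 wraps to a dark 44 and a slightly negative value such as -20 wraps to a bright 236, which would show up as isolated, strongly colored pixels. Clamping first maps them to 255 and 0.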
Thank you so much for looking again. I also opened a thread here with the full code: https://discuss.pytorch.org/t/help-with-a-tracing-neuralstyle-transfer-model-to-c/97120
Removing the cast as you suggested results in:
I can't say that it has been solved yet, but I got interesting results. I found that the negative-positive converted version of the model's output image is similar to the original result. I hope this result will help with debugging.
Do you have any idea about this result?
original
before
after
Looks good. By "negative-positive converted", do you mean in Photoshop or in C++?
for (size_t j = 0; j < height; j++){
    for (size_t i = 0; i < width; i++){
        image[j][i].red = 255 - pointer[j * width * 3 + i * 3 + 0];
        image[j][i].green = 255 - pointer[j * width * 3 + i * 3 + 1];
        image[j][i].blue = 255 - pointer[j * width * 3 + i * 3 + 2];
    }
}
But if you zoom in, you will see very small highly-saturated pixels.
My source code for negative / positive conversion is exactly the same as yours.
Now the regular save PNG without inference looks like:
This is without the "255 -" I just wrote; it is a result of my previous changes.
This line:
tensor = tensor.mul(255).clamp(0, 255).to(torch::kU8);
Turns the PNG into negative. Just load a png into tensor and save it, no CNN is involved.
But without tensor = tensor.mul(255).clamp(0, 255).to(torch::kU8); I get nine cubes ....
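One possible explanation for the negative image, offered as a guess rather than a confirmed diagnosis: a tensor loaded straight from a PNG is already kUInt8, and if multiplying it by 255 stays in 8-bit arithmetic and wraps modulo 256 (an assumption about how the tensor library handles this case), the result is almost exactly the inverted value. The sketch below reproduces that arithmetic on single bytes; the function names are hypothetical.

```cpp
#include <cstdint>

// v * 255 computed in int, then reduced modulo 256 by the conversion to uint8.
// Since 255 = 256 - 1, this equals (256 - v) mod 256.
inline std::uint8_t mul255_wrapped(std::uint8_t v){
    return static_cast<std::uint8_t>(v * 255);
}

// The usual photographic negative of an 8-bit value.
inline std::uint8_t negative_value(std::uint8_t v){
    return static_cast<std::uint8_t>(255 - v);
}
```

For every non-zero input the two functions differ by exactly one, so the wrapped product is visually indistinguishable from a negative. For a float tensor in the range 0 to 1 the same line is a normal rescaling.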
I have a Windows 10 PC without NVIDIA GPU. Is it possible for me to reproduce the same situation as you?
Of course. In the CMake file https://github.com/QuantScientist/PngTorch/blob/master/cmake_find/download_libtorch.cmake, change these lines (https://github.com/QuantScientist/PngTorch/blob/210ecca72745d816cef5b571d8b95ed96f28ba2c/cmake_find/download_libtorch.cmake#L6):
set(CUDA_V "10.2")
set(LIBTORCH_DEVICE "cu102")
To:
set(CUDA_V "cpu")
set(LIBTORCH_DEVICE "cpu")
It will download libtorch automatically for you.
Whenever you see
torch::Device device(torch::kCUDA);
Change it to:
torch::Device device(torch::kCPU);
Thanks!
I think I got it!
Note the use of torch::kFloat32 here:
out_tensor = out_tensor.to(torch::kFloat32).detach().cpu().squeeze(); // Remove batch dim; keep float32 here, the cast to torch::kU8 happens after clamping
And here:
png::image<png::rgb_pixel> VisionUtils::torchToPng(torch::Tensor &tensor_){
    torch::Tensor tensor = tensor_.squeeze().detach().cpu().permute({1, 2, 0}); // {C,H,W} ===> {H,W,C}
    tensor = tensor.clamp(0, 255);
    tensor = tensor.to(torch::kU8);
    size_t width = tensor.size(1);
    size_t height = tensor.size(0);
    auto pointer = tensor.data_ptr<unsigned char>();
    png::image<png::rgb_pixel> image(width, height);
    for (size_t j = 0; j < height; j++){
        for (size_t i = 0; i < width; i++){
            image[j][i].red = pointer[j * width * 3 + i * 3 + 0];
            image[j][i].green = pointer[j * width * 3 + i * 3 + 1];
            image[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
        }
    }
    return image;
}
That's great to hear!
I have no clue why this works!
I found the perfect solution while working on deep learning programs that use OpenCV and LibTorch on Windows 10. We need to call "contiguous()" just before we convert a "torch::Tensor" into another type. "permute()" only changes how the tensor is indexed and leaves the underlying memory untouched, so "contiguous()" is needed to rearrange the data in memory into the order that the index arithmetic below expects.
tensor = tensor.contiguous();
size_t width = tensor.size(1);
size_t height = tensor.size(0);
// For OpenCV
cv::Mat imageMat = cv::Mat(cv::Size(tensor.size(1), tensor.size(0)), CV_8UC3, tensor.data_ptr<unsigned char>());
// For PNG++
png::image<png::rgb_pixel> imagePNG(width, height);
auto pointer = tensor.data_ptr<unsigned char>();
for (size_t j = 0; j < height; j++){
    for (size_t i = 0; i < width; i++){
        imagePNG[j][i].red = pointer[j * width * 3 + i * 3 + 0];
        imagePNG[j][i].green = pointer[j * width * 3 + i * 3 + 1];
        imagePNG[j][i].blue = pointer[j * width * 3 + i * 3 + 2];
    }
}
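What goes wrong without "contiguous()" can be reproduced on a plain buffer. The sketch below does not use LibTorch; it assumes a model output stored in planar {C,H,W} order with three channels, and the function names (read_interleaved, make_interleaved_copy) are hypothetical. The second function is a stand-in for the copy that permuting and then calling "contiguous()" performs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads channel c of pixel (j, i) using the interleaved {H,W,C} formula from the loops above.
inline std::uint8_t read_interleaved(const std::vector<std::uint8_t> &buf, std::size_t width,
                                     std::size_t j, std::size_t i, std::size_t c){
    return buf[j * width * 3 + i * 3 + c];
}

// Copies a planar {C,H,W} buffer into interleaved {H,W,C} order.
inline std::vector<std::uint8_t> make_interleaved_copy(const std::vector<std::uint8_t> &chw,
                                                       std::size_t height, std::size_t width){
    const std::size_t channels = 3;
    std::vector<std::uint8_t> hwc(chw.size());
    for (std::size_t j = 0; j < height; j++){
        for (std::size_t i = 0; i < width; i++){
            for (std::size_t c = 0; c < channels; c++){
                hwc[(j * width + i) * channels + c] = chw[c * height * width + j * width + i];
            }
        }
    }
    return hwc;
}
```

For a 1x2 image whose planar buffer is {10, 11, 20, 21, 30, 31} (red plane, green plane, blue plane), reading the green value of the first pixel directly from the planar buffer returns 11, a red value of the neighboring pixel. After the copy it returns 20. On a full-size image this kind of mix-up is a plausible source of the 3*3 grid of small images reported earlier.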
Thank you!!
Hello, I saw your post on https://qiita.com/koba-jon/items/2b15865f5b4c0c9fbbf7
I am trying to read and write a tensor to and from a PNG using libpng without using OpenCV. See this: https://github.com/prabhuomkar/pytorch-cpp/issues/72
Do you think it's possible?
Thanks