Closed dq0309 closed 1 year ago
@dq0309 Hi, thanks for your interest in our work! You are correct about the fixed-resolution issue. Our model, like many ViT-based models for computer vision tasks, shares this problem.
For high-resolution images (e.g., 1280x720), I suggest that you first resize the image to 224x224 and use it as the input (which is what the code already does). Then resize the model's output (the ab color values) back to the high resolution (1280x720) and concatenate it with the original grayscale image (also 1280x720 in this example). This way, the details from the grayscale image are preserved and only the color values are interpolated.
We empirically find that the grayscale image contains most of the detail in the image, which allows us to boldly interpolate the color values.
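A minimal sketch of the upsample-and-concatenate step described above. The tensor names and the dummy model output are illustrative assumptions, not taken from infer.py:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: the model consumes a 224x224 input and predicts the
# ab color channels at that resolution; the original grayscale (L) image
# is 1280x720.
H, W = 720, 1280                       # original resolution (height, width)
l_full = torch.rand(1, 1, H, W)        # full-resolution grayscale / L channel
ab_small = torch.rand(1, 2, 224, 224)  # stand-in for the model's ab output

# Upsample the predicted ab channels back to the original resolution.
# Bilinear interpolation is a reasonable choice since color varies smoothly.
ab_full = F.interpolate(ab_small, size=(H, W), mode="bilinear",
                        align_corners=False)

# Concatenate the sharp full-resolution L channel with the interpolated
# ab channels to form a full-resolution Lab image.
lab_full = torch.cat([l_full, ab_full], dim=1)
print(lab_full.shape)  # torch.Size([1, 3, 720, 1280])
```

The Lab result can then be converted to RGB for display with any color-space utility (e.g., skimage.color.lab2rgb), after scaling the channels to the value ranges that utility expects.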
If you guys did that in this code (scaling the output colours back to the original image size), that would be really neat; it's not easy to just "concatenate".
@2blackbar All it takes is changing lines 171-173 in infer.py, which already use the torch.cat() function. Any PRs are welcome.
Thanks for your work! In the training, validation, and inference stages, it seems the images are first resized to (224, 224) and then the PSNR is calculated. I want to perform colorization on multi-resolution images, but the resize operation may degrade the PSNR measured at the original resolution, which is larger than (224, 224). Could you suggest how to modify your code to address this problem? Thank you!
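Following the suggestion earlier in the thread, one option is to upsample the prediction to the original resolution and compute the PSNR there, instead of comparing resized images. A sketch under the assumption that PSNR is measured on the ab channels with values normalized to [0, 1] (the repository's actual metric code may differ):

```python
import torch
import torch.nn.functional as F

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio in dB between two images in [0, max_val].
    mse = torch.mean((pred - target) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

# Hypothetical tensors: ground-truth ab at the original resolution and the
# model's 224x224 ab prediction.
H, W = 720, 1280
ab_gt = torch.rand(1, 2, H, W)
ab_pred_small = torch.rand(1, 2, 224, 224)

# Evaluate at the original resolution: upsample the prediction first, then
# compare against the full-resolution ground truth rather than a resized one.
ab_pred_full = F.interpolate(ab_pred_small, size=(H, W), mode="bilinear",
                             align_corners=False)
score = psnr(ab_pred_full, ab_gt)
```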