jantic / DeOldify

A Deep Learning based project for colorizing and restoring old images (and video!)

Greatly increase max resolution output by taking advantage of this chrominance optimization #17

Closed · jantic closed this issue 5 years ago

jantic commented 5 years ago

Source: MayeulC on Hacker News, in this thread:

https://news.ycombinator.com/item?id=18363870#18369410

"Now, there seems to be a distinct loss of details in the restored images. The network being resolution-limited, is the black-and-white image displayed at full resolution besides the restored one?

What I would like to see is the output of the network to be treated as chrominance only.

Take the YUV transform of both the input and output images, scale back the UV matrix of the restored one to match the input, and replace the original channels. I'd be really curious to look at the output (and would do it myself if I was not on a smartphone)!"
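Concretely, the suggestion could look something like the following minimal OpenCV/NumPy sketch (the function name and exact color conversion are illustrative assumptions, not DeOldify's actual post-processing code):

```python
import cv2
import numpy as np

def transfer_chroma(original_rgb: np.ndarray, restored_rgb: np.ndarray) -> np.ndarray:
    """Keep the full-resolution luma (Y) from the original image and take only
    the chroma (U, V) from the network's lower-resolution restored output."""
    h, w = original_rgb.shape[:2]
    # Upscale the restored output back to the original resolution first.
    restored_up = cv2.resize(restored_rgb, (w, h), interpolation=cv2.INTER_CUBIC)

    orig_yuv = cv2.cvtColor(original_rgb, cv2.COLOR_RGB2YUV)
    restored_yuv = cv2.cvtColor(restored_up, cv2.COLOR_RGB2YUV)

    # Y from the original, U and V from the restored image.
    merged = orig_yuv.copy()
    merged[..., 1:] = restored_yuv[..., 1:]
    return cv2.cvtColor(merged, cv2.COLOR_YUV2RGB)
```

Because the eye is far less sensitive to chroma resolution than to luma resolution (the same fact that chroma subsampling in JPEG and video codecs exploits), the upscaled U/V channels cost little perceptually, while the source's luma detail is preserved at full resolution.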

jantic commented 5 years ago

And... this is done! So happy about this one.

MayeulC commented 5 years ago

Hey, I just read your answer on HN. I'm glad to have been helpful, and that you were able to take advantage of this. Also, thank you for making this an issue, it makes further discussion easier.

For future reference, here is some material that was part of the original comment thread:

I had a look at https://github.com/jantic/DeOldify/commit/dabb3a00edb7300a0f71cf97df6c2a9bda184799 but couldn't determine whether you reduced the dimensionality of the input/output data. Unfortunately, I am not familiar enough with the code to tell, or to contribute in a meaningful way.

I touched on this idea here and there, but the basic idea is that you should be able to reduce the size of the input data considerably (by a factor of 3) by feeding your network the luma channel instead of RGB, and have it output only the chroma channels. You can probably leave most hyperparameters untouched, although you might be able to reduce the network's size further (I am by no means an authority on this, so take it with a grain of salt). This could provide quite sizeable performance improvements, mostly for training but also at runtime.
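As a toy PyTorch sketch of what that interface change would mean at the network boundaries (the layer sizes here are invented for illustration): the model would take a single luma plane in and emit two chroma planes out.

```python
import torch
import torch.nn as nn

class LumaToChroma(nn.Module):
    """Toy model boundary: 1-channel luma (Y) in, 2-channel chroma (U, V) out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1),   # 1 input channel instead of 3
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_chroma = nn.Conv2d(64, 2, kernel_size=1)  # 2 output channels instead of 3

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (N, 1, H, W) luma plane -> (N, 2, H, W) predicted U/V.
        return self.to_chroma(self.encoder(y))
```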

jantic commented 5 years ago

@MayeulC So yes, I did consider reducing the dimensionality of the input like you suggested here. But here's the thing: as far as I can tell, it wouldn't actually make a huge impact on model size or efficiency. The reason: the grayscale input currently comes in as 3 channels in the input layer, but it is immediately expanded into much higher-dimensional data as the model processes it: 3 to 64 to 256 to 512, etc. At that point I'd expect the model to be effectively consolidating the redundancies across channels (when I reduce those dimensions I get reduced performance). So as soon as the channels are processed, whether there were 3 of them or 1 quickly becomes almost irrelevant: the relevant information, redundant or not, has already been extracted.

In other words, I'd expect reducing the input channels to make almost no difference here. To complicate matters, I'm also using a pretrained network that already expects 3 channels. That pretrained network (ResNet34) has hard-earned weights (coefficients) that took a lot of time to train on somebody else's machine. I think those are worth keeping as well; there isn't going to be a 1-channel pretrained version of it.
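For illustration, here is a minimal sketch of the workaround being described: replicating the single grayscale plane into 3 channels so the pretrained ResNet34 weights can be reused unchanged (plain torchvision here for brevity; this is not DeOldify's actual pipeline code):

```python
import torch
from torchvision.models import resnet34

# The pretrained backbone expects 3 input channels, so the grayscale
# image is simply replicated across the channel dimension.
backbone = resnet34(pretrained=True)

gray = torch.rand(1, 1, 224, 224)      # (N, 1, H, W) grayscale batch
rgb_like = gray.repeat(1, 3, 1, 1)     # copy the luma plane into 3 channels
out = backbone(rgb_like)               # (1, 1000) ImageNet logits; a colorizer
                                       # would tap intermediate features instead
```

The parameter-count argument also checks out: in ResNet34 only the 7×7 stem convolution touches the input channels (64·3·7·7 = 9,408 weights, versus 64·1·7·7 = 3,136 for a hypothetical 1-channel variant), a rounding error next to the network's roughly 21 million parameters.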

I might be wrong somehow here. Please correct me if that's the case!

MayeulC commented 5 years ago

All right, your explanation makes sense. I was expecting the dimensionality gains to propagate down the layers, but it is true that they would only provide tangible gains in the first (and last) layers. And the fact that you are using a pretrained network also makes sense!

I am afraid I can't provide much more useful input for your project. I wish you the best of luck with it, and I look forward to its next iterations!

jantic commented 5 years ago

@MayeulC Dude, you made such a huge impact on this project already! That was the single most impactful improvement I've been able to make on the rendering. So thank you thank you thank you.