luanfujun / deep-painterly-harmonization

Code and data for paper "Deep Painterly Harmonization": https://arxiv.org/abs/1804.03189

Is it possible to modify this model to achieve real-time results? #21

Open CodeW1zard opened 6 years ago

CodeW1zard commented 6 years ago

Marvellous work! The results are really like magic.

However, generating a single output with this model takes several minutes. I think the reason is that the model is based on online (per-image) optimization. Meanwhile, I have found some faster neural style transfer models that are based on offline optimization, i.e. a feedforward network trained in advance.

I am wondering whether there is a way to modify this model to work that way. Maybe there are no painterly harmonization projects like that yet, but I would appreciate some advice.

Thanks!

luanfujun commented 6 years ago

Thanks for your interest! Style transfer was originally based on an iterative, per-image optimization framework that progressively minimizes loss functions to generate the stylized output. Recent work on fast style transfer instead uses a single feedforward pass to achieve real-time performance.
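For intuition, here is a minimal PyTorch sketch of that per-image optimization loop; `content_loss` and `style_loss` are placeholders for VGG-feature losses, not functions from this repository:

```python
import torch

# Minimal per-image optimization sketch (the "slow" style transfer setting):
# the output image itself is the optimization variable, and a loss is
# minimized iteratively for every new input image.
def optimize_image(content_img, style_img, content_loss, style_loss,
                   n_iters=300, style_weight=1e2):
    output = content_img.clone().requires_grad_(True)
    optimizer = torch.optim.LBFGS([output])

    for _ in range(n_iters):
        def closure():
            optimizer.zero_grad()
            loss = (content_loss(output, content_img)
                    + style_weight * style_loss(output, style_img))
            loss.backward()
            return loss
        optimizer.step(closure)

    return output.detach()
```

A fast feedforward method replaces this whole loop with one forward pass through a network trained ahead of time, which is where the speed difference comes from.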

During this project we also experimented with fast style transfer networks, but the style was either a bit too smooth or the texture was mismatched (since there is no PatchMatch step in their feedforward pipeline). It is of course a good future research direction to combine fast semantic neural patch correspondence search with fast stylization networks to reach real-time speed.

Cheers, Fujun

eridgd commented 6 years ago

Hi @luanfujun, great work and thanks for publishing the code.

I think one example of combining fast patch matching and feedforward style networks is https://github.com/Yijunmaverick/UniversalStyleTransfer, where they use an approach from https://github.com/rtqichen/style-swap that formulates the patch matching as:

1. Loading the style patches as filters and 2D-convolving them with the content feature patches to produce cross-correlation scores,
2. taking a channel-wise argmax to determine the closest style patch at each spatial position, and
3. using a transposed convolution to reconstruct the content feature from the selected style patches. A separate transposed convolution counts the number of overlapping patches so the result can be averaged.
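If it helps, here is a rough PyTorch sketch of that conv-based style swap (not the UniversalStyleTransfer or style-swap code itself; the patch size, stride, and function name are just illustrative):

```python
import torch
import torch.nn.functional as F

def style_swap(content_feat, style_feat, patch_size=3, stride=1):
    """Sketch of conv-based style swap on VGG features of shape (1, C, H, W)."""
    # 1) Extract style patches and reshape them into conv filters: (n_patches, C, k, k)
    patches = style_feat.unfold(2, patch_size, stride).unfold(3, patch_size, stride)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(
        -1, style_feat.size(1), patch_size, patch_size)

    # Normalize each patch so the convolution scores behave like cross-correlations
    norm = patches.flatten(1).norm(dim=1).clamp_min(1e-8).view(-1, 1, 1, 1)
    patches_norm = patches / norm

    # 2) Convolve content features with the normalized style patches -> similarity scores
    scores = F.conv2d(content_feat, patches_norm)           # (1, n_patches, H', W')

    # 3) Channel-wise argmax -> one-hot selection of the closest style patch per position
    one_hot = torch.zeros_like(scores)
    one_hot.scatter_(1, scores.argmax(dim=1, keepdim=True), 1.0)

    # 4) Transposed conv with the un-normalized patches reconstructs the feature;
    #    a second transposed conv counts overlapping patches so we can average.
    recon = F.conv_transpose2d(one_hot, patches)
    overlap = F.conv_transpose2d(one_hot, torch.ones_like(patches))
    return recon / overlap.clamp_min(1e-8)
```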

The Universal Style Transfer paper doesn't mention this, but their code includes an option to mix the feature-statistics matching of the Whiten-Color Transform (WCT) with style swap at the conv5_1 layer of VGG19 before decoding. I replicated this style-swap option in my own TF implementation of WCT: https://github.com/eridgd/WCT-TF/blob/38fedaa49e1da6885896748c827d227887045017/ops.py#L220-L278
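For reference, a minimal PyTorch sketch of that whiten-color transform on flattened VGG features (the function name, blend weight `alpha`, and epsilon are my own illustrative choices, not values from the WCT-TF code):

```python
import torch

def wct(content_feat, style_feat, alpha=0.6, eps=1e-5):
    """content_feat: (C, Hc*Wc), style_feat: (C, Hs*Ws) flattened VGG features."""
    # Whiten: remove the content features' own covariance structure
    c_mean = content_feat.mean(dim=1, keepdim=True)
    c = content_feat - c_mean
    c_cov = c @ c.t() / (c.size(1) - 1) + eps * torch.eye(c.size(0))
    c_vals, c_vecs = torch.linalg.eigh(c_cov)
    whitened = c_vecs @ torch.diag(c_vals.clamp_min(eps).rsqrt()) @ c_vecs.t() @ c

    # Color: impose the style features' covariance and mean
    s_mean = style_feat.mean(dim=1, keepdim=True)
    s = style_feat - s_mean
    s_cov = s @ s.t() / (s.size(1) - 1) + eps * torch.eye(s.size(0))
    s_vals, s_vecs = torch.linalg.eigh(s_cov)
    colored = (s_vecs @ torch.diag(s_vals.clamp_min(eps).sqrt()) @ s_vecs.t()
               @ whitened + s_mean)

    # Blend with the original content features before decoding
    return alpha * colored + (1 - alpha) * content_feat
```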

I'm fairly new to these patch-based methods and was wondering: how does this conv-based 'style swap' compare to other patch approaches? Are there any related approaches I should be aware of?

Thanks!

CodeW1zard commented 6 years ago

@luanfujun Thanks for your reply!