jantic / DeOldify

A Deep Learning based project for colorizing and restoring old images (and video!)
MIT License

Worked! Observations from a newbie. (Win 10 install) #55

Closed: stevemurch closed this issue 5 years ago

stevemurch commented 5 years ago

Such an interesting, outstanding project. Kudos. I also appreciate the obvious care you and the community are putting into documenting it, which is no easy task.

Two main comments:

First, for Windows users trying it out locally, I noticed an out-of-memory error on the "Color Visualization" notebook, where memory doesn't seem to be automatically released. I was able to resolve it with explicit memory cleanup before each visualization. Please see this thread for a workaround: https://github.com/jantic/DeOldify/issues/49
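For readers who don't want to follow the link, a minimal sketch of what "explicit memory cleanup" before each visualization might look like. It assumes a PyTorch backend (DeOldify is built on fastai/PyTorch) and is not necessarily the exact workaround posted in #49; the helper name `free_memory` is hypothetical.

```python
import gc


def free_memory() -> None:
    """Hypothetical helper: release memory between visualizations.

    Runs Python's garbage collector, then (if PyTorch and a CUDA GPU are
    available) asks PyTorch to return cached, unused GPU memory.
    """
    gc.collect()  # free unreachable Python objects (e.g. old image tensors)
    try:
        import torch
    except ImportError:
        return  # no PyTorch installed; nothing more to release
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached blocks held by PyTorch's allocator


# Usage (assumption): call it right before each visualization cell, e.g.
# free_memory()
# vis.plot_transformed_image(...)
```

Note that `torch.cuda.empty_cache()` only releases memory that is no longer referenced, so any large objects from the previous run (for example, result images held in notebook variables) need to be deleted first for this to help.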

Second -- a couple general observations:

1) This is amazing work. Really fun to see photos come to life.

2) In my test trials, medium head shots (i.e., waist up) seem to do much better overall than, say, full shots set on a larger landscape. And my own anecdotal tests are right in line with your observation about a blue clothing bias -- it seems to want to bias toward blue for many articles of clothing.

This got me wondering: given the relatively higher accuracy of medium head shots (if my anecdotal observation actually holds), might one optimization for the generator during training be to "heavily weight the flesh tone of a generated medium shot"? That is: run face detection first, take the largest face in the picture, crop around it to build a "medium shot", then weight that region heavily. I don't know at all if or how this would map to your existing code; I just thought I'd throw it out there in case it sparks any ideas.
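To make the suggestion concrete, here is a rough, standalone sketch of the geometry and weighting it describes. This is not something DeOldify does. Assumptions: face boxes come from some external face detector (not shown); the crop factors (`width_factor`, `height_factor`) and region weights (`inside`, `outside`) are hypothetical values; and a per-pixel weighted L1 loss is used purely as a stand-in for whatever loss the generator actually trains with.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def largest_face(faces: List[Box]) -> Box:
    """Return the detected face box with the largest area."""
    if not faces:
        raise ValueError("no faces detected")
    return max(faces, key=lambda b: b[2] * b[3])


def medium_shot_crop(face: Box, img_w: int, img_h: int,
                     width_factor: float = 3.0,
                     height_factor: float = 4.0) -> Box:
    """Expand a face box into a rough waist-up crop, clamped to the image.

    The factors are hypothetical guesses at how a face scales to a medium shot.
    """
    x, y, w, h = face
    cx = x + w / 2
    left = max(0, int(round(cx - w * width_factor / 2)))
    top = max(0, int(round(y - 0.5 * h)))  # leave some headroom above the face
    right = min(img_w, int(round(cx + w * width_factor / 2)))
    bottom = min(img_h, int(round(top + h * height_factor)))
    return (left, top, right - left, bottom - top)


def weight_map(img_w: int, img_h: int, crop: Box,
               inside: float = 5.0, outside: float = 1.0) -> List[List[float]]:
    """Per-pixel weights: heavier inside the medium-shot crop."""
    x, y, w, h = crop
    return [[inside if (x <= c < x + w and y <= r < y + h) else outside
             for c in range(img_w)]
            for r in range(img_h)]


def weighted_l1(pred: List[List[float]], target: List[List[float]],
                weights: List[List[float]]) -> float:
    """Weighted mean absolute error (stand-in loss, not DeOldify's)."""
    total = 0.0
    norm = 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            total += w * abs(p - t)
            norm += w
    return total / norm


# Usage: two detected faces in a 100x100 image.
face = largest_face([(10, 10, 5, 5), (40, 20, 10, 10)])
crop = medium_shot_crop(face, 100, 100)
print(face, crop)  # (40, 20, 10, 10) (30, 15, 30, 40)
```

The maintainer's reply below explains why a region-specific tweak like this is hard to fit into a model that works on whole images, so treat this only as an illustration of the idea.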

jantic commented 5 years ago

Great input. As for your suggestions: I do think there's potential for improvement in using the model more cleverly. Unfortunately (yet fortunately!), the model learns all this stuff for me and is pretty much a black box, as opposed to being a deterministic set of lines of code. A few things follow from that:

  1. Improving these things has to come from generalized training/model tweaks.
  2. The model deals with whole images. Tweaks involving segmentation would be pretty much impossible, because I don't have access to the fine details that would ensure the resulting image doesn't look segmented (and I can't say "if face, do this; else, do that").

That being said, I think what you're talking about here could be addressed by better training and better models. I'm already seeing improvements in these areas with a better model.

stevemurch commented 5 years ago

Agree. I'm going to close this "issue" since it's really not a known work item.