Don't use jpeg, use png to avoid lossy...

scruffynerf commented 1 year ago

In detextify/inpainter.py#L129, the temporary file is a JPEG; it should be a PNG to avoid lossy conversion, especially at 512x512.

I was passing in 512x512 PNGs and wondering why I got worse results back.
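A minimal sketch of why the temp-file format matters. It assumes Pillow is installed and that the temporary file is written with `Image.save`, with the format inferred from the file extension. How inpainter.py#L129 actually writes the file hasn't been checked here. `roundtrip` and `make_test_tile` are hypothetical helpers:

```python
import os
import tempfile

from PIL import Image  # assumes Pillow is installed


def roundtrip(image: Image.Image, suffix: str) -> Image.Image:
    """Save `image` to a temp file with the given suffix and load it back.

    Pillow picks the encoder from the extension: ".png" is lossless,
    ".jpg" goes through lossy JPEG compression.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        image.save(path)
        with Image.open(path) as reloaded:
            return reloaded.convert("RGB").copy()
    finally:
        os.remove(path)


def make_test_tile(size: int = 512) -> Image.Image:
    """Build a deterministic, detailed RGB tile (made-up test data)."""
    tile = Image.new("RGB", (size, size))
    tile.putdata([
        ((x * 7) % 256, (y * 13) % 256, (x * y) % 256)
        for y in range(size) for x in range(size)
    ])
    return tile


if __name__ == "__main__":
    tile = make_test_tile()
    print("png identical:", roundtrip(tile, ".png").tobytes() == tile.tobytes())  # True
    print("jpg identical:", roundtrip(tile, ".jpg").tobytes() == tile.tobytes())  # False
```

With a PNG temp file, the pixels that reach the inpainter are exactly the ones passed in. A JPEG temp file already degrades the tile before any in-painting happens.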

scruffynerf commented 1 year ago

Oh, it might depend on the method... I realize I was looking at the Replicate code, not the local code, but the same principle applies. I'm also playing with adding steps: 50 might not be enough to get the image back to its original quality.

scruffynerf commented 1 year ago

https://github.com/huggingface/diffusers/issues/1368 says the strength is too high. Changing it to 0.3 works well. Adding num_inference_steps=100 (or whatever) isn't working, though, and I'm unsure why. If I change the default in the diffusers Python module, that does work (so it's using the default but not taking an argument it should).
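This is a hypothetical sketch of a wrapper that avoids the "uses the default, ignores the argument" symptom. Options given at construction time, or per call, are explicitly forwarded to the pipeline instead of being dropped. `ForwardingInpainter` and `fake_pipe` are made-up names. `fake_pipe` stands in for a real diffusers pipeline so the example runs without a model, and its defaults are placeholders:

```python
from typing import Any, Callable, Dict


class ForwardingInpainter:
    """Hypothetical wrapper that forwards sampling options to a pipeline.

    If a wrapper calls the pipeline without passing these keyword arguments,
    the pipeline silently falls back to its own defaults.
    """

    def __init__(self, pipe: Callable[..., Any], **pipe_kwargs: Any) -> None:
        self.pipe = pipe
        self.pipe_kwargs = pipe_kwargs

    def inpaint(self, prompt: str, image: Any, mask_image: Any, **overrides: Any) -> Any:
        # Per-call overrides take precedence over construction-time options.
        kwargs: Dict[str, Any] = {**self.pipe_kwargs, **overrides}
        return self.pipe(prompt=prompt, image=image, mask_image=mask_image, **kwargs)


def fake_pipe(prompt, image, mask_image, num_inference_steps=50, strength=1.0, **_):
    """Stand-in pipeline: just reports the options it actually received."""
    return {"prompt": prompt, "steps": num_inference_steps, "strength": strength}


if __name__ == "__main__":
    inpainter = ForwardingInpainter(fake_pipe, num_inference_steps=100, strength=0.3)
    print(inpainter.inpaint("blank background", None, None))
    # {'prompt': 'blank background', 'steps': 100, 'strength': 0.3}
```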

100 steps is better and 200 is better still, but of course it's 2x or 4x slower. The result is closer to the original image, though (so when the text box overlaps half a head, it attempts to put the head back, etc.).
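One possible reason the step count matters more once strength is lowered: in recent diffusers releases, the img2img and inpainting pipelines only run the last `strength` fraction of the noise schedule, i.e. roughly `int(num_inference_steps * strength)` denoising steps. The following is a small sketch of that arithmetic. The function name is hypothetical, so check your diffusers version's `get_timesteps` before relying on it:

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps actually run for a given strength.

    Follows the img2img-style convention in recent diffusers releases, where
    the schedule is truncated to its last `strength` fraction.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


for steps in (50, 100, 200):
    print(steps, "->", effective_denoising_steps(steps, 0.3))
# 50 -> 15
# 100 -> 30
# 200 -> 60
```

Under that assumption, strength 0.3 with the default 50 steps only denoises 15 times. That may be why raising the step count still helps visibly, at a cost proportional to the steps actually run.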

I'm also trying "empty flat background, solid color, no text, blank" as a prompt, since I found the model was having lots of "oh, let me get creative here..." moments.

iuliaturc commented 1 year ago

@scruffynerf Thanks a lot for looking into this!

Indeed, the conversion to .jpeg happens only in ReplicateSDInpainter, so it doesn't explain the discrepancy.
Regarding strength / number of inference steps: I'm not convinced this would fix it either. Edges seem just as visible after 50 steps as after 300 (though the in-painted patches themselves look crisper, of course).

This fix makes the edges less jarring, though, by in-painting only the text boxes rather than the entire tile.
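The following is a minimal sketch of the "text boxes only" idea. It assumes boxes arrive as `(x, y, width, height)` in tile coordinates, and that the inpainting pipeline repaints white (255) mask pixels and keeps black (0) ones, which is the diffusers convention. The box format and `text_box_mask` are hypothetical, not detextify's actual code:

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); hypothetical format


def text_box_mask(tile_w: int, tile_h: int, boxes: List[Box]) -> List[List[int]]:
    """Return a tile-sized mask: 255 inside text boxes (repaint), 0 elsewhere (keep)."""
    mask = [[0] * tile_w for _ in range(tile_h)]
    for x, y, w, h in boxes:
        # Clip each box to the tile so boxes straddling a tile edge still work.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, tile_w), min(y + h, tile_h)
        for row in range(y0, y1):
            for col in range(x0, x1):
                mask[row][col] = 255
    return mask


if __name__ == "__main__":
    for row in text_box_mask(8, 4, [(1, 1, 3, 2)]):
        print("".join("#" if v else "." for v in row))
    # ........
    # .###....
    # .###....
    # ........
```

Unlike a mask covering the whole tile, everything outside the boxes keeps its original pixels.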

iuliaturc / detextify #19