uvipen / ASCII-generator

ASCII generator (image to text, image to image, video to video)
MIT License

Incorrect aspect ratio in output of `img2img.py` (and similar scripts: `img2img_color.py`, `video2video.py` and `video2video_color.py`) #21

Open quark67 opened 7 months ago

quark67 commented 7 months ago

From the demo image (dimension: 976×538, aspect ratio = 976/538 = 1.81):

python3 img2img.py --num_cols 100 --language general --mode complex --background white --output data/output.png gives:

output

which is a picture of dimension 1200×515 (aspect ratio = 1200/515 = 2.33, which is very different from 1.81).

The reason is this line of code: cell_height = scale * cell_width (in line 36 of img2img.py).

The factor cell_height / cell_width needs to be the same as the factor char_height / char_width, so the previous code becomes:

cell_height = (char_height / char_width) * cell_width.
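To make the reasoning concrete, here is a minimal sketch of the grid arithmetic, not the project's actual code. It assumes that each row of drawn text occupies char_height pixels and each column char_width pixels in the final image. The glyph size of 12×19 and the old `scale` value of 2 are hypothetical values chosen for illustration; they are not measured from the real font or read from `utils.py`.

```python
def grid_and_aspect(width, height, num_cols, cell_ratio, char_w, char_h):
    """Return (num_rows, output aspect ratio) for a given cell height/width ratio."""
    cell_w = width / num_cols            # input pixels covered by one character, horizontally
    cell_h = cell_ratio * cell_w         # input pixels covered by one character, vertically
    num_rows = int(height / cell_h)
    out_w = char_w * num_cols            # assumed output width in pixels
    out_h = char_h * num_rows            # assumed output height in pixels
    return num_rows, out_w / out_h


WIDTH, HEIGHT, NUM_COLS = 976, 538, 100  # demo image and --num_cols from the issue
CHAR_W, CHAR_H = 12, 19                  # hypothetical glyph size, for illustration only
OLD_SCALE = 2                            # assumed value of `scale` in the old code

old_rows, old_aspect = grid_and_aspect(WIDTH, HEIGHT, NUM_COLS, OLD_SCALE, CHAR_W, CHAR_H)
new_rows, new_aspect = grid_and_aspect(
    WIDTH, HEIGHT, NUM_COLS, CHAR_H / CHAR_W, CHAR_W, CHAR_H
)

print(f"input aspect:         {WIDTH / HEIGHT:.2f}")
print(f"fixed scale:          {old_rows} rows, aspect {old_aspect:.2f}")
print(f"char_height/char_width: {new_rows} rows, aspect {new_aspect:.2f}")
```

Under these assumptions, a cell ratio that differs from the glyph ratio produces too few (or too many) rows, and the output is stretched accordingly; matching the two ratios keeps the output close to the input aspect ratio, up to rounding of the row count.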

Moreover, the line char_width, char_height = font.getsize(sample_character) generates a warning: DeprecationWarning: getsize is deprecated and will be removed in Pillow 10 (2023-07-01). Use getbbox or getlength instead.

So in utils.py, all lines similar to char_width, char_height = font.getsize("◊") (with various values for ◊) need to be replaced by:

char_bbox = font.getbbox("◊")
char_width = char_bbox[2] - char_bbox[0]
char_height = char_bbox[3]

(Caution: char_bbox[1] is deliberately not subtracted in the code above. Strangely, "bottom" on its own really gives the height. See https://github.com/python-pillow/Pillow/issues/7802.)
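Since the same three lines would be repeated at every call site, one option is to wrap them in a small helper. The sketch below is only a suggestion: the name `char_size` is hypothetical and does not exist in the repository. It follows the recipe above exactly, including using "bottom" alone as the height.

```python
def char_size(font, character):
    """Return (width, height) of `character`, following the getbbox recipe above.

    `font` is expected to be a Pillow font object exposing getbbox(), which
    returns a (left, top, right, bottom) tuple.
    """
    left, _top, right, bottom = font.getbbox(character)
    # Width is right - left; height is taken as `bottom` alone (top is not subtracted).
    return right - left, bottom


# Possible usage with Pillow (requires Pillow to be installed):
#   from PIL import ImageFont
#   font = ImageFont.load_default()
#   char_width, char_height = char_size(font, "A")
```

Each font.getsize(...) call could then become a single char_size(font, ...) call, which keeps the workaround in one place if the height convention ever needs revisiting.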

This correction must also be made in line 44 of img2img.py: replace char_width, char_height = font.getsize(sample_character) with

char_bbox = font.getbbox(sample_character)
char_width = char_bbox[2] - char_bbox[0]
char_height = char_bbox[3]

So, after reordering some of the calculations (because computing cell_height requires the value of char_height / char_width to be known first), I suggest this correction in the code of img2img.py (extract of the code for the main function):

def main(opt):
    if opt.background == "white":
        bg_code = 255
    else:
        bg_code = 0
    char_list, font, sample_character, scale = get_data(opt.language, opt.mode)
    num_chars = len(char_list)
    num_cols = opt.num_cols
    image = cv2.imread(opt.input)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    height, width = image.shape
    char_bbox = font.getbbox(sample_character)
    char_width = char_bbox[2] - char_bbox[0]
    char_height = char_bbox[3]
    cell_width = width / opt.num_cols
    #cell_height = scale * cell_width
    cell_height = (char_height/char_width) * cell_width
    num_rows = int(height / cell_height)
    if num_cols > width or num_rows > height:
        print("Too many columns or rows. Use default setting")
        cell_width = 6
        #cell_height = 12
        cell_height = (char_height/char_width) * cell_width
        num_cols = int(width / cell_width)
        num_rows = int(height / cell_height)
    #char_width, char_height = font.getsize(sample_character)
    out_width = char_width * num_cols
    out_height = scale * char_height * num_rows
    out_image = Image.new("L", (out_width, out_height), bg_code)
    draw = ImageDraw.Draw(out_image)

For comparison, the old code for the same portion was:

def main(opt):
    if opt.background == "white":
        bg_code = 255
    else:
        bg_code = 0
    char_list, font, sample_character, scale = get_data(opt.language, opt.mode)
    num_chars = len(char_list)
    num_cols = opt.num_cols
    image = cv2.imread(opt.input)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    height, width = image.shape
    cell_width = width / opt.num_cols
    cell_height = scale * cell_width
    num_rows = int(height / cell_height)
    if num_cols > width or num_rows > height:
        print("Too many columns or rows. Use default setting")
        cell_width = 6
        cell_height = 12
        num_cols = int(width / cell_width)
        num_rows = int(height / cell_height)
    char_width, char_height = font.getsize(sample_character)
    out_width = char_width * num_cols
    out_height = scale * char_height * num_rows
    out_image = Image.new("L", (out_width, out_height), bg_code)
    draw = ImageDraw.Draw(out_image)

So, with the modified code, python3 img2img.py --num_cols 100 --language general --mode complex --background white --output data/NewOutput.png gives:

NewOutput

The dimensions of this corrected image are 1200×648, and its aspect ratio is 1200/648 = 1.85, which is close to the 1.81 aspect ratio of the original image.

To make the difference more visible, I scaled the output image so that its width is the same as that of the input image, and displayed it in image editing software with a transparent background over the input image:

Before the correction:

image

After the correction:

image
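The overlay comparison can also be expressed in numbers. As a rough sketch, using only the image sizes reported above (the function name `height_at_width` is made up for this example), this computes how tall each output is once scaled to the input width of 976 pixels:

```python
def height_at_width(out_w, out_h, target_w):
    """Height of an out_w x out_h image after uniform scaling to width target_w."""
    return out_h * target_w / out_w


INPUT_W, INPUT_H = 976, 538                      # demo image size from the issue

before = height_at_width(1200, 515, INPUT_W)     # output size before the correction
after = height_at_width(1200, 648, INPUT_W)      # output size after the correction

print(f"before: {before:.1f} px tall, input is {INPUT_H} px")
print(f"after:  {after:.1f} px tall, input is {INPUT_H} px")
```

The uncorrected output falls well short of the input height when overlaid, while the corrected output comes within a few percent of it; the remaining gap is consistent with the row count being rounded down.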