Unable to upscale images with Unicode characters anywhere in the file name

RunDevelopment commented 2 years ago

Cupscale cannot upscale any image with non-ASCII characters in its name. I first found this bug with a file containing "ｂ" (U+FF42), but it happens with all non-ASCII characters (e.g. the German umlauts ä, ö, ü).

Judging from the log, the Python part can't handle non-ASCII characters. A fix could be as simple as making a temporary copy of the input image under a name that contains only ASCII characters.
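A minimal sketch of that temporary-copy idea, assuming the copy goes into a work directory whose own path is already ASCII (the `CupscaleData` folder in the log is). `make_ascii_copy` is a hypothetical helper, not part of Cupscale; the caller would keep the original path so the upscaled result can be renamed back afterwards.

```python
import os
import shutil
import tempfile


def make_ascii_copy(src_path, work_dir):
    """Copy src_path into work_dir under a random ASCII-only file name.

    Hypothetical helper, not Cupscale code. Only the file name is
    sanitized; work_dir itself must already be an ASCII path.
    Returns the path of the copy.
    """
    ext = os.path.splitext(src_path)[1]
    if not ext.isascii():
        ext = ""  # drop an extension that is itself non-ASCII
    # mkstemp reserves a unique name; in CPython its random part uses [a-z0-9_]
    fd, dst_path = tempfile.mkstemp(prefix="img_", suffix=ext, dir=work_dir)
    os.close(fd)  # only the reserved name is needed, not the open handle
    shutil.copyfile(src_path, dst_path)
    return dst_path
```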

Log:

[MainUI] Dropped 1 file(s), first = C:\Users\micha\Desktop\m30_00_ｂorder_01_n.png
[ImgProc] Converting C:\Users\micha\Desktop\m30_00_ｂorder_01_n.png to PngRaw, DelSrc: False, Fill: True, Ext: UseNew
[ImgProc] Color depth of m30_00_ｂorder_01_n.png is 24.
[ImgProc] Written image to C:\DS3TexUp\Cupscale 1.39.0f1\CupscaleData\loaded-img\temp.png
[ImgProc] Preprocessing C:\DS3TexUp\Cupscale 1.39.0f1\CupscaleData\img-in\m30_00_ｂorder_01_n.png - Fill Alpha: True
[ImgProc] Color depth of m30_00_ｂorder_01_n.png is 24.
[CMD] /C cd /D "C:\DS3TexUp\Cupscale 1.39.0f1\CupscaleData\bin\esrgan-pytorch" & python upscale.py --input "C:\DS3TexUp\Cupscale 1.39.0f1\CupscaleData\img-in" --output "C:\DS3TexUp\Cupscale 1.39.0f1\CupscaleData\img-out"   --device_id 0 --fp16  --alpha_mode 0   "C:\DS3TexUp\Cupscale 1.39.0f1\CupscaleData\models\ESRGAN\4x-UltraSharp.pth"
[Python] Model: 4x-UltraSharp
[Python] Traceback (most recent call last):
[Python] Upscaling...
[Python]   File "upscale.py", line 340, in <module>
[Python]     print(idx, base)
[Python]   File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
[Python]     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
[Python] UnicodeEncodeError: 'charmap' codec can't encode character '\uff42' in position 7: character maps to <undefined>
Error: Can't find upscaled output image! This probably means the AI implementation failed to run correctly.

Index was outside the array bounds.

Stack trace:
   at Cupscale.Main.Upscale.GetOutputImg()
FilenamePostprocess: Moving  => -4x-UltraSharp
Error during FilenamePostprocess(): 

The file name cannot be NULL.
Parameter name: sourceFileName

Stack trace:
   at System.IO.File.InternalMove(String sourceFileName, String destFileName, Boolean checkHost)
   at Cupscale.Main.Upscale.FilenamePostprocess(String file)
Overwrite is off - keeping suffix.
[IOUtils] Copying directory "C:\DS3TexUp\Cupscale 1.39.0f1\CupscaleData\img-out" to "C:\Users\micha\Desktop" (Move: False - RemoveFromName: )

(Parts of the log were in German, so I translated it.)
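The traceback itself can be reproduced without Cupscale. `print(idx, base)` fails inside `cp1252.py`, which suggests Python's stdout is being encoded with the Windows ANSI code page; that is what Python on Windows normally does when stdout is redirected to a pipe instead of a console, as presumably happens when Cupscale captures the output. A minimal sketch, assuming cp1252 is that code page, simulates such a stream with `io.TextIOWrapper`:

```python
import io


def encode_error_for(text, encoding="cp1252"):
    """Print text to a stream using `encoding`; return the error message or None.

    The stream stands in for a redirected stdout on a Windows system whose
    ANSI code page is cp1252 (an assumption based on the traceback).
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return str(exc)
    return None
```

With the file name from the log, `encode_error_for("m30_00_ｂorder_01_n")` returns the same message as the last line of the traceback, including `in position 7`.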

Phenrei commented 2 years ago

After an annoyingly long dig, I found a workaround you can hack into upscale.py, based on https://github.com/azzhu/opencv-utf-8/blob/master/cv2_ext/__init__.py

cv2 doesn't handle UTF-8 (non-ASCII) paths, and, annoyingly, that "patch" requires passing in raw strings, which is a pain when the path comes from a predefined variable. Luckily, its core change to imread is easy to duplicate:

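    import cv2
    import numpy as np
    def imread(p):
        # Read the raw bytes via NumPy (which accepts Unicode paths), then decode in memory
        data = np.fromfile(p, dtype=np.uint8)
        return cv2.imdecode(data, cv2.IMREAD_UNCHANGED)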

So replace img = cv2.imread(rawpath, cv2.IMREAD_UNCHANGED) on line 346 with

    imageData = np.fromfile(path, dtype=np.uint8)
    img = cv2.imdecode(imageData, cv2.IMREAD_UNCHANGED)

and problem solved.

However, after a little more testing, this only seems to work when the CMD window is enabled for debugging. With debugging off, the script stops immediately, and without the debug output I can't tell why.

Phenrei commented 2 years ago

Got it working without (as much) hackery. There are a couple of places where Unicode blows up the parse block. Besides the cv2 fix, you have to call .encode("utf-8") on both base (used on line 340) and tmpPath (line 395); otherwise, with the debug console off, the script blows up before it gets that far, without printing anything.
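Why `.encode("utf-8")` helps: `print()` on a `bytes` object writes its `repr`, which is pure ASCII, so a cp1252 stdout never sees U+FF42. The trade-off is escaped bytes in the log. An alternative that keeps readable names is switching the stream to UTF-8 with `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7+ (or setting `PYTHONIOENCODING=utf-8` for the Python process); whether Cupscale then displays the names correctly depends on how it decodes the captured output, which is an assumption here. A sketch of both, using a simulated cp1252 stream rather than the real `sys.stdout`:

```python
import io

base = "m30_00_ｂorder_01_n"


def cp1252_stream():
    """Text stream that encodes like a redirected stdout on a cp1252 system."""
    return io.TextIOWrapper(io.BytesIO(), encoding="cp1252")


# Workaround from the thread: print bytes instead of str.
# print() writes repr(b"..."), which is pure ASCII, so cp1252 never fails.
out = cp1252_stream()
print(0, base.encode("utf-8"), file=out)
out.flush()
bytes_line = out.buffer.getvalue().decode("cp1252").rstrip()

# Alternative (Python 3.7+): keep printing str but switch the stream to UTF-8.
# In upscale.py this would presumably be sys.stdout.reconfigure(encoding="utf-8").
out = cp1252_stream()
out.reconfigure(encoding="utf-8")
print(0, base, file=out)
out.flush()
utf8_line = out.buffer.getvalue().decode("utf-8").rstrip()
```

`bytes_line` ends up as `0 b'm30_00_\xef\xbd\x82order_01_n'`, while `utf8_line` keeps the readable name.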

I also imported unicodedata and changed my cv2 parse block to the following. My feeling is that loading the file into memory and then asking cv2 to decode it is slower, so this version only does that when there's a "bad" character in the filename.

    if unicodedata.normalize('NFKD', path).encode('ascii', 'ignore') == path:
        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    else:
        imageData = np.fromfile(path, dtype=np.uint8)
        img = cv2.imdecode(imageData, cv2.IMREAD_UNCHANGED)

Phenrei commented 2 years ago

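After checking, because things still felt like they were taking forever, it turns out I have to decode the result of that encode again for the comparison to ever be equal (encode() returns bytes, which never compare equal to a str). So it should be something like this (the prints are optional, just to check which branch runs):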

    if unicodedata.normalize('NFKD', path).encode('ascii', 'ignore').decode() == path:
        print("----- using imread -----")
        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    else:
        print("----- using np fromfile -----")
        imageData = np.fromfile(path, dtype=np.uint8)
        img = cv2.imdecode(imageData, cv2.IMREAD_UNCHANGED)
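A side note on that check: NFKD leaves ASCII characters unchanged, and the `'ignore'` round trip can only produce ASCII, so the fixed comparison is true exactly when the path is pure ASCII. On Python 3.7+ (the log shows 3.8), `path.isascii()` should therefore give the same answer without normalizing. A small sketch comparing the two; the function names are hypothetical:

```python
import unicodedata


def is_ascii_thread(path):
    """The fixed check from the thread: True if path is pure ASCII."""
    stripped = unicodedata.normalize("NFKD", path).encode("ascii", "ignore").decode()
    return stripped == path


def is_ascii_simple(path):
    """Equivalent check via str.isascii() (Python 3.7+)."""
    return path.isascii()


# The first version compared bytes with str, which is always False in
# Python 3, so every file took the slower np.fromfile branch.
first_version = unicodedata.normalize("NFKD", "plain.png").encode("ascii", "ignore") == "plain.png"

for name in ["plain.png", "m30_00_ｂorder_01_n.png", "Bild_ä.png"]:
    print(name, is_ascii_thread(name), is_ascii_simple(name))
```

This prints `True True` for `plain.png` and `False False` for the other two names.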

n00mkrad/cupscale, issue #102