i30817 / libretrofuzz

Fuzzy Retroarch thumbnail downloader
MIT License

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 #5

Closed Feathered-Serpent closed 11 months ago

Feathered-Serpent commented 11 months ago

Hey there,

I found this little program and wanted to give it a try. However, after some testing, it doesn't seem to be able to handle UTF-16 characters in a playlist. After trying it a few times, I finally found a message that helped me find the problem:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 12606344: character maps to <undefined>
Exception ignored in: <module 'threading' from 'C:\\Users\\Mini\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\threading.py'>
Traceback (most recent call last):

The file is named "Chojn╟ⁿw '94 Compo Music Pack #1 (1994)(Altus).d64". I don't know whether the symbols are an error or not. Maybe the program should catch the UnicodeDecodeError and skip that one file instead of stopping.

After I renamed the files (just replacing ⁿ with _; there were a few dozen), I ran the program again, only to find another filename with a character it doesn't like. This time the filename was "PETSCÏÏhead (2020-01-13)(Seven).prg". I renamed the Ï to I, and after that it seemingly started downloading thumbnails.

Windows 11 Pro; Python 3.12.0 64bit.

i30817 commented 11 months ago

Thanks for letting me know. I don't have a computer at the moment (I'm on an Android tablet), so a fix will take a while, possibly a long while since I'm broke. You can work around the problem by renaming the affected files and rescanning the playlist, if it was made with the manual scanner. The rescan is needed because changing the filename disconnects the playlist entry from the actual file, so RetroArch won't be able to find and start the game. Alternatively, you can change only the playlist game label, but that breaks again whenever you rescan.
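Since a rename silently breaks the link between a playlist entry and its file, it can help to list the entries whose path no longer exists. The following is a minimal sketch, not part of libretrofuzz: the function name missing_entries is hypothetical, and it assumes an uncompressed JSON .lpl playlist whose "items" carry "path" and "label" keys.

```python
import json
from pathlib import Path


def missing_entries(playlist_file: Path) -> list[tuple[str, str]]:
    """Return (label, path) for every playlist item whose path does not exist.

    Hypothetical helper. Assumes an uncompressed JSON .lpl playlist with an
    "items" list whose entries have "path" and "label" keys.
    """
    data = json.loads(playlist_file.read_text(encoding="utf-8"))
    missing = []
    for item in data.get("items", []):
        path = item.get("path", "")
        # A renamed file leaves the old path behind in the playlist.
        if not Path(path).exists():
            missing.append((item.get("label", ""), path))
    return missing
```

For example, missing_entries(Path(r"C:\Program Files\RetroArch\playlists\Commodore - 64.lpl")) should list the renamed games. Compressed playlists, or paths that point inside archives, would need extra handling that this sketch leaves out.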

It should be unusual: most game .dat files use UTF-8, I think, and so do most game files. Since libretro doesn't have thumbnails for demos, you probably won't lose anything for those two. By default the program downloads the PNG thumbnails at https://thumbnails.libretro.com, although you can change the server in the options, as long as it uses the same subdirectory name structure and PNG thumbnails.

I actually find it strange that libretro even puts UTF-16 in playlists; it's probably done so they can reference a Windows file with a UTF-16 filename. They should probably only allow it in the path, not the label (replacing characters that have no equivalent with the replacement character, �), since the label will most likely display wrong for them too. The program only uses the label to find the PNGs, because that is more flexible: some ways of building playlists, with the manual scanner or other utilities, don't take the names from the filename so they can use better names, for instance from MAME .dat files.
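For reference, a rough sketch of turning a label into a thumbnail URL. It assumes the server layout <server>/<system>/Named_Boxarts/<name>.png and the libretro-thumbnails naming rule that replaces the characters &*/:<>?\| and the backtick with _ in filenames. The function name thumbnail_url is hypothetical; as its name suggests, libretrofuzz fuzzy-matches labels against the names available on the server rather than building URLs this directly.

```python
from urllib.parse import quote

# Characters that the libretro thumbnail naming convention replaces with "_"
# (assumption based on the libretro-thumbnails rules; verify before relying on it).
FORBIDDEN = set('&*/:`<>?\\|')


def thumbnail_url(system: str, label: str, kind: str = "Named_Boxarts",
                  server: str = "https://thumbnails.libretro.com") -> str:
    """Build a thumbnail URL from a playlist label (hypothetical helper).

    kind is the thumbnail subdirectory, e.g. "Named_Boxarts", "Named_Snaps"
    or "Named_Titles".
    """
    filename = "".join("_" if ch in FORBIDDEN else ch for ch in label) + ".png"
    # Percent-encode each path segment on its own (spaces, non-ASCII, "#", ...).
    segments = [quote(part, safe="") for part in (system, kind, filename)]
    return "/".join([server, *segments])


print(thumbnail_url("Sharp - X1", "Foo: Bar?"))
# https://thumbnails.libretro.com/Sharp%20-%20X1/Named_Boxarts/Foo_%20Bar_.png
```

Because only the label feeds the filename, a label containing a replacement character produces a URL that the server is unlikely to have.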

i30817 commented 11 months ago

Note to self, this is where the playlist entries are read: https://github.com/i30817/libretrofuzz/blob/ee62ebc23b6eacf21b60864f87afec4039e3d7c9/libretrofuzz/__main__.py#L514

Feathered-Serpent commented 11 months ago

I did what you said before opening the bug :) For things like this it's better to keep playlists uncompressed, since you can then edit an entry without rescanning. So the playlist entries and the filenames have already been changed.

i30817 commented 11 months ago

I can't run the program, so I need some help to see whether this fixes it or makes it worse.

First update the program with pip, like the readme says.

Then restore the bugged items in the playlists and check whether it can start processing that playlist with the --no-meta option. If it starts, it read the whole playlist and the fix works.

As for the fix, I'm not sure it's going to be good.

Fact 1. Filenames on Linux are (by convention) UTF-8.

Fact 2. Filenames on Windows are UTF-16 (or one of its variants).

Fact 3. JSON is supposed to require UTF-8 for both encoders and decoders.

I don't know what RetroArch does here. The sane thing would be to convert filenames (and labels, if the label comes from the filename, as it sometimes can) from the native encoding to UTF-8. However, we already saw bytes that couldn't be represented. But previously I didn't specify the encoding, so it may very well be that the file was being decoded with the locale-dependent default (some ISO-something charset) instead.

The INSANE thing would be treating the filename as an opaque sequence of bytes, passed directly to whatever APIs open the files. Why is that insane? JSON is a human-readable and human-writable format that people like to edit, and it requires UTF-8. RetroArch itself also joins path strings if portable playlists are enabled, and those can even be platform-independent.
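To make the opaque-bytes problem concrete: Python itself carries undecodable filename bytes through as lone surrogates (the surrogateescape error handler, which os.fsdecode uses on POSIX). That round-trips losslessly, but the result cannot be encoded as strict UTF-8. A minimal sketch, with the hypothetical helper to_utf8_json and an assumed Latin-1 byte in the name:

```python
import json


def to_utf8_json(path_bytes: bytes) -> bytes:
    """Store a raw filename in a UTF-8 JSON document (hypothetical helper)."""
    # surrogateescape keeps undecodable bytes as lone surrogates (U+DC80..U+DCFF),
    # so the original bytes could be recovered, but they are not valid Unicode.
    text = path_bytes.decode("utf-8", errors="surrogateescape")
    # Strict UTF-8 encoding refuses lone surrogates and raises UnicodeEncodeError.
    return json.dumps({"path": text}, ensure_ascii=False).encode("utf-8")


print(to_utf8_json("Chojnów.d64".encode("utf-8")))  # valid UTF-8 name: fine
try:
    to_utf8_json(b"Chojn\xfcw.d64")  # 0xFC is "ü" in Latin-1, invalid in UTF-8
except UnicodeEncodeError as exc:
    print("cannot be stored as UTF-8 JSON:", exc.reason)  # surrogates not allowed
```

With the default ensure_ascii=True, json.dumps instead writes the escape \udcfc, which is valid JSON syntax but not valid Unicode text, so other decoders may reject or mangle it.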

So I used UTF-8, in the hope that Windows filenames are converted to UTF-8 and that the error happened because the file was being decoded with the default Windows charset (some ISO-something) when opened, in spite of the message not naming a specific codec.
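A minimal sketch of why the explicit encoding matters. Python reports 'charmap' as the codec name for single-byte code pages such as cp1252, which open() typically falls back to on Western European Windows when no encoding is given. The bytes 0x81 and 0x8F are among those cp1252 leaves undefined, and 0x8F is the second UTF-8 byte of Ï. The helper read_labels is hypothetical, and the sketch assumes the .lpl file is UTF-8 on disk:

```python
import json


def read_labels(raw: bytes, encoding: str) -> list[str]:
    """Decode playlist bytes with the given encoding and return the item labels."""
    return [item["label"] for item in json.loads(raw.decode(encoding))["items"]]


# Assumed: the .lpl on disk is UTF-8, as RFC 8259 expects of JSON.
raw = json.dumps({"items": [{"label": "PETSCÏÏhead (2020-01-13)(Seven)"}]},
                 ensure_ascii=False).encode("utf-8")

try:
    read_labels(raw, "cp1252")  # a typical Windows default when encoding= is omitted
except UnicodeDecodeError as exc:
    print(exc)  # 'charmap' codec can't decode byte 0x8f ...

print(read_labels(raw, "utf-8"))  # ['PETSCÏÏhead (2020-01-13)(Seven)']
```

Passing encoding="utf-8" to open() when reading the playlist has the same effect as the explicit decode here.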

The errors='ignore' argument is a last-ditch effort for the case where there actually is a crazy situation in which strings of mixed encoding leak into the text. I used ignore to omit the weird characters, but I'm not sure whether I should have used replace instead (either � or □ depending on the type of error; it's not very clear).
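The difference between the two error handlers, sketched on a hypothetical name where a single Latin-1 byte (0xFC, "ü") leaked into otherwise UTF-8 text:

```python
# Bytes as they might appear if a Latin-1 "ü" (0xFC) leaked into UTF-8 text.
mixed = b"Chojn\xfcw '94 Compo Music Pack #1"

ignored = mixed.decode("utf-8", errors="ignore")    # drops the invalid byte
replaced = mixed.decode("utf-8", errors="replace")  # substitutes U+FFFD

print(ignored)   # Chojnw '94 Compo Music Pack #1
print(replaced)  # Chojn�w '94 Compo Music Pack #1
```

Python's decoders always substitute U+FFFD (�) under errors='replace'; the '?' substitute only appears when encoding. The length is preserved here because the leaked character was a single byte; other malformed sequences can produce a different number of U+FFFD characters.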

i30817 commented 11 months ago

I think I'll make a new release with replace instead, since it keeps the names (roughly) the same size and might give better results in similarity checks.

Edit: done

Edit2: although I'm 99% sure that if this happens to a label string, even if it matches and downloads a server thumbnail, the thumbnail won't appear in RA because RA won't understand the replacement character. It just prevents an exception and is a clue about the fucked label entry if the user checks the thumbs.

Feathered-Serpent commented 11 months ago

I just put the names back into the playlist, without the files existing. Since fuzz doesn't check whether the files in the playlist actually exist, that should be enough. Running the test against the latest version now.

Feathered-Serpent commented 11 months ago

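Now I get another error (the OSError text in the traceback is German for "The filename, directory name, or volume label syntax is incorrect."):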

Commodore - 64.lpl -> Commodore - 64
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\site-packages\libretrofuzz\__main__.py │
│ :971 in mainfuzzall                                                                              │
│                                                                                                  │
│    968 │   │   │   error("Cancelled by user, exiting")                                           │
│    969 │   │   │   raise Exit()                                                                  │
│    970 │                                                                                         │
│ ❱  971 │   asyncio.run(runit(), debug=False)                                                     │
│    972                                                                                           │
│    973                                                                                           │
│    974 async def downloadgamenames(client, system, nub_verbose):                                 │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │              _ = WindowsPath('C:/Program Files/RetroArch/playlists')                         │ │
│ │        address = 'http://localhost:8000'                                                     │ │
│ │         before = None                                                                        │ │
│ │            cfg = WindowsPath('C:/Program Files/RetroArch/retroarch.cfg')                     │ │
│ │         dryrun = False                                                                       │ │
│ │        filters = []                                                                          │ │
│ │           hack = False                                                                       │ │
│ │      inSystems = [                                                                           │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Amstrad -         │ │
│ │                  CPC.lpl'),                                                                  │ │
│ │                  │   │   'Amstrad - CPC'                                                     │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Amstrad -         │ │
│ │                  GX4000.lpl'),                                                               │ │
│ │                  │   │   'Amstrad - GX4000'                                                  │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  2600.lpl'),                                                                 │ │
│ │                  │   │   'Atari - 2600'                                                      │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  5200.lpl'),                                                                 │ │
│ │                  │   │   'Atari - 5200'                                                      │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  7800.lpl'),                                                                 │ │
│ │                  │   │   'Atari - 7800'                                                      │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  8-bit.lpl'),                                                                │ │
│ │                  │   │   'Atari - 8-bit'                                                     │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  Jaguar.lpl'),                                                               │ │
│ │                  │   │   'Atari - Jaguar'                                                    │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  Lynx.lpl'),                                                                 │ │
│ │                  │   │   'Atari - Lynx'                                                      │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - ST.lpl'), │ │
│ │                  │   │   'Atari - ST'                                                        │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Bandai -          │ │
│ │                  WonderSwan Color.lpl'),                                                     │ │
│ │                  │   │   'Bandai - WonderSwan Color'                                         │ │
│ │                  │   ),                                                                      │ │
│ │                  │   ... +59                                                                 │ │
│ │                  ]                                                                           │ │
│ │          limit = 1                                                                           │ │
│ │         nofail = False                                                                       │ │
│ │        noimage = False                                                                       │ │
│ │        nomerge = False                                                                       │ │
│ │         nometa = False                                                                       │ │
│ │   notInSystems = [                                                                           │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Commodore -       │ │
│ │                  CDTV.lpl'),                                                                 │ │
│ │                  │   │   'Commodore - CDTV'                                                  │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Commodore -       │ │
│ │                  PET.lpl'),                                                                  │ │
│ │                  │   │   'Commodore - PET'                                                   │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Infocom -         │ │
│ │                  Z-Machine.lpl'),                                                            │ │
│ │                  │   │   'Infocom - Z-Machine'                                               │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Nintendo -        │ │
│ │                  e-Reader.lpl'),                                                             │ │
│ │                  │   │   'Nintendo - e-Reader'                                               │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Sharp - X1.lpl'), │ │
│ │                  │   │   'Sharp - X1'                                                        │ │
│ │                  │   )                                                                       │ │
│ │                  ]                                                                           │ │
│ │    nub_verbose = False                                                                       │ │
│ │       playlist = WindowsPath('C:/Program Files/RetroArch/playlists/Sharp - X1.lpl')          │ │
│ │      playlists = [                                                                           │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Amstrad - CPC.lpl'),  │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Amstrad -             │ │
│ │                  GX4000.lpl'),                                                               │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - 2600.lpl'),   │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - 5200.lpl'),   │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - 7800.lpl'),   │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - 8-bit.lpl'),  │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - Jaguar.lpl'), │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - Lynx.lpl'),   │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - ST.lpl'),     │ │
│ │                  │   WindowsPath('C:/Program Files/RetroArch/playlists/Bandai - WonderSwan   │ │
│ │                  Color.lpl'),                                                                │ │
│ │                  │   ... +59                                                                 │ │
│ │                  ]                                                                           │ │
│ │          runit = <function mainfuzzall.<locals>.runit at 0x0000024239BCD260>                 │ │
│ │          score = 100                                                                         │ │
│ │         system = 'Sharp - X1'                                                                │ │
│ │        systems = [                                                                           │ │
│ │                  │   '.git',                                                                 │ │
│ │                  │   'Amstrad - CPC',                                                        │ │
│ │                  │   'Amstrad - GX4000',                                                     │ │
│ │                  │   'Atari - 2600',                                                         │ │
│ │                  │   'Atari - 5200',                                                         │ │
│ │                  │   'Atari - 7800',                                                         │ │
│ │                  │   'Atari - 8-bit',                                                        │ │
│ │                  │   'Atari - Jaguar',                                                       │ │
│ │                  │   'Atari - Lynx',                                                         │ │
│ │                  │   'Atari - ST',                                                           │ │
│ │                  │   ... +107                                                                │ │
│ │                  ]                                                                           │ │
│ │ thumbnails_dir = WindowsPath('D:/thumbnails')                                                │ │
│ │        verbose = False                                                                       │ │
│ │     wait_after = None                                                                        │ │
│ │    wait_before = None                                                                        │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py:194 in run          │
│                                                                                                  │
│   191 │   │   │   "asyncio.run() cannot be called from a running event loop")                    │
│   192 │                                                                                          │
│   193 │   with Runner(debug=debug, loop_factory=loop_factory) as runner:                         │
│ ❱ 194 │   │   return runner.run(main)                                                            │
│   195                                                                                            │
│   196                                                                                            │
│   197 def _cancel_all_tasks(loop):                                                               │
│                                                                                                  │
│ ╭────────────────────────────────────── locals ──────────────────────────────────────╮           │
│ │        debug = False                                                               │           │
│ │ loop_factory = None                                                                │           │
│ │         main = <coroutine object mainfuzzall.<locals>.runit at 0x0000024238DCD5A0> │           │
│ │       runner = <asyncio.runners.Runner object at 0x0000024239647830>               │           │
│ ╰────────────────────────────────────────────────────────────────────────────────────╯           │
│                                                                                                  │
│ C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py:118 in run          │
│                                                                                                  │
│   115 │   │                                                                                      │
│   116 │   │   self._interrupt_count = 0                                                          │
│   117 │   │   try:                                                                               │
│ ❱ 118 │   │   │   return self._loop.run_until_complete(task)                                     │
│   119 │   │   except exceptions.CancelledError:                                                  │
│   120 │   │   │   if self._interrupt_count > 0:                                                  │
│   121 │   │   │   │   uncancel = getattr(task, "uncancel", None)                                 │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │        context = <_contextvars.Context object at 0x0000024239BF4B80>                         │ │
│ │           coro = <coroutine object mainfuzzall.<locals>.runit at 0x0000024238DCD5A0>         │ │
│ │           self = <asyncio.runners.Runner object at 0x0000024239647830>                       │ │
│ │ sigint_handler = functools.partial(<bound method Runner._on_sigint of                        │ │
│ │                  <asyncio.runners.Runner object at 0x0000024239647830>>, main_task=<Task     │ │
│ │                  finished name='Task-1' coro=<mainfuzzall.<locals>.runit() done, defined at  │ │
│ │                  C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\site-packages\li… │ │
│ │                  exception=OSError(22, 'Die Syntax für den Dateinamen, Verzeichnisnamen oder │ │
│ │                  die Datenträgerbezeichnung ist falsch')>)                                   │ │
│ │           task = <Task finished name='Task-1' coro=<mainfuzzall.<locals>.runit() done,       │ │
│ │                  defined at                                                                  │ │
│ │                  C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\site-packages\li… │ │
│ │                  exception=OSError(22, 'Die Syntax für den Dateinamen, Verzeichnisnamen oder │ │
│ │                  die Datenträgerbezeichnung ist falsch')>                                    │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py:664 in          │
│ run_until_complete                                                                               │
│                                                                                                  │
│    661 │   │   if not future.done():                                                             │
│    662 │   │   │   raise RuntimeError('Event loop stopped before Future completed.')             │
│    663 │   │                                                                                     │
│ ❱  664 │   │   return future.result()                                                            │
│    665 │                                                                                         │
│    666 │   def stop(self):                                                                       │
│    667 │   │   """Stop running the event loop.                                                   │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │   future = <Task finished name='Task-1' coro=<mainfuzzall.<locals>.runit() done, defined at  │ │
│ │            C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\site-packages\libretro… │ │
│ │            exception=OSError(22, 'Die Syntax für den Dateinamen, Verzeichnisnamen oder die   │ │
│ │            Datenträgerbezeichnung ist falsch')>                                              │ │
│ │ new_task = False                                                                             │ │
│ │     self = <ProactorEventLoop running=False closed=True debug=False>                         │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\site-packages\libretrofuzz\__main__.py │
│ :941 in runit                                                                                    │
│                                                                                                  │
│    938 │   │   │   │   │   │   )                                                                 │
│    939 │   │   │   │   │   for playlist, system in inSystems:                                    │
│    940 │   │   │   │   │   │   echo(style(f"{system}.lpl -> {system}", bold=True))               │
│ ❱  941 │   │   │   │   │   │   names, dbs = readPlaylistAndPrepareDirectories(playlist, tmpdir,  │
│    942 │   │   │   │   │   │   try:                                                              │
│    943 │   │   │   │   │   │   │   await downloader(                                             │
│    944 │   │   │   │   │   │   │   │   names,                                                    │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │         before = None                                                                        │ │
│ │         client = <httpx.AsyncClient object at 0x0000024239D303E0>                            │ │
│ │            dbs = [                                                                           │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   'Coleco - ColecoVision',                                                │ │
│ │                  │   ... +381                                                                │ │
│ │                  ]                                                                           │ │
│ │         dryrun = False                                                                       │ │
│ │        filters = []                                                                          │ │
│ │           hack = False                                                                       │ │
│ │      inSystems = [                                                                           │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Amstrad -         │ │
│ │                  CPC.lpl'),                                                                  │ │
│ │                  │   │   'Amstrad - CPC'                                                     │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Amstrad -         │ │
│ │                  GX4000.lpl'),                                                               │ │
│ │                  │   │   'Amstrad - GX4000'                                                  │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  2600.lpl'),                                                                 │ │
│ │                  │   │   'Atari - 2600'                                                      │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  5200.lpl'),                                                                 │ │
│ │                  │   │   'Atari - 5200'                                                      │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  7800.lpl'),                                                                 │ │
│ │                  │   │   'Atari - 7800'                                                      │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  8-bit.lpl'),                                                                │ │
│ │                  │   │   'Atari - 8-bit'                                                     │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  Jaguar.lpl'),                                                               │ │
│ │                  │   │   'Atari - Jaguar'                                                    │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari -           │ │
│ │                  Lynx.lpl'),                                                                 │ │
│ │                  │   │   'Atari - Lynx'                                                      │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Atari - ST.lpl'), │ │
│ │                  │   │   'Atari - ST'                                                        │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Bandai -          │ │
│ │                  WonderSwan Color.lpl'),                                                     │ │
│ │                  │   │   'Bandai - WonderSwan Color'                                         │ │
│ │                  │   ),                                                                      │ │
│ │                  │   ... +59                                                                 │ │
│ │                  ]                                                                           │ │
│ │          limit = 1                                                                           │ │
│ │          names = [                                                                           │ │
│ │                  │   '2010 - The Graphic Action Game (USA)',                                 │ │
│ │                  │   '2010 - The Graphic Action Game (USA) (Beta)',                          │ │
│ │                  │   '6 Noises Demo (2006)(-)(PD)',                                          │ │
│ │                  │   'A.E. (USA) (Proto)',                                                   │ │
│ │                  │   'Activision Decathlon, The (USA)',                                      │ │
│ │                  │   "Adam's Musicbox Demo (USA) (Demo)",                                    │ │
│ │                  │   'AdamCon 17 Demo (2005)(Bienvenu, Daniel)(PD)',                         │ │
│ │                  │   'Adventurium 3 Demo (2000)(-)(PD)',                                     │ │
│ │                  │   'Air Battle v0.4 (2000)(Bienvenu, Daniel)(PD)',                         │ │
│ │                  │   'Alcazar - The Forgotten Fortress (USA)',                               │ │
│ │                  │   ... +381                                                                │ │
│ │                  ]                                                                           │ │
│ │         nofail = False                                                                       │ │
│ │        noimage = False                                                                       │ │
│ │        nomerge = False                                                                       │ │
│ │         nometa = False                                                                       │ │
│ │   notInSystems = [                                                                           │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Commodore -       │ │
│ │                  CDTV.lpl'),                                                                 │ │
│ │                  │   │   'Commodore - CDTV'                                                  │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Commodore -       │ │
│ │                  PET.lpl'),                                                                  │ │
│ │                  │   │   'Commodore - PET'                                                   │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Infocom -         │ │
│ │                  Z-Machine.lpl'),                                                            │ │
│ │                  │   │   'Infocom - Z-Machine'                                               │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Nintendo -        │ │
│ │                  e-Reader.lpl'),                                                             │ │
│ │                  │   │   'Nintendo - e-Reader'                                               │ │
│ │                  │   ),                                                                      │ │
│ │                  │   (                                                                       │ │
│ │                  │   │   WindowsPath('C:/Program Files/RetroArch/playlists/Sharp - X1.lpl'), │ │
│ │                  │   │   'Sharp - X1'                                                        │ │
│ │                  │   )                                                                       │ │
│ │                  ]                                                                           │ │
│ │    nub_verbose = False                                                                       │ │
│ │       playlist = WindowsPath('C:/Program Files/RetroArch/playlists/Commodore - 64.lpl')      │ │
│ │          score = 100                                                                         │ │
│ │         system = 'Commodore - 64'                                                            │ │
│ │ thumbnails_dir = WindowsPath('D:/thumbnails')                                                │ │
│ │         tmpdir = 'D:\\thumbnails\\libretrofuzzjreytl5l'                                      │ │
│ │        verbose = False                                                                       │ │
│ │     wait_after = None                                                                        │ │
│ │    wait_before = None                                                                        │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\Users\Mini\AppData\Local\Programs\Python\Python312\Lib\site-packages\libretrofuzz\__main__.py │
│ :548 in readPlaylistAndPrepareDirectories                                                        │
│                                                                                                  │
│    545 │   for parent in [temp_dir, thumbnails_dir]:                                             │
│    546 │   │   for db in dbs_set:                                                                │
│    547 │   │   │   for dirname in THUMB_LDIRS:                                                   │
│ ❱  548 │   │   │   │   os.makedirs(Path(parent, db, dirname), exist_ok=True)                     │
│    549 │   return names, dbs                                                                     │
│    550                                                                                           │
│    551                                                                                           │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │           data = [                                                                           │ │
│ │                  │   '{',                                                                    │ │
│ │                  │   '"version": "1.5",',                                                    │ │
│ │                  │   '"default_core_path": "",',                                             │ │
│ │                  │   '"default_core_name": "",',                                             │ │
│ │                  │   '"label_display_mode": 0,',                                             │ │
│ │                  │   '"right_thumbnail_mode": 0,',                                           │ │
│ │                  │   '"left_thumbnail_mode": 0,',                                            │ │
│ │                  │   '"sort_mode": 0,',                                                      │ │
│ │                  │   '"scan_content_dir": "D:\\\\games\\\\Commodore\\\\C64",',               │ │
│ │                  │   '"scan_file_exts": "",',                                                │ │
│ │                  │   ... +1680840                                                            │ │
│ │                  ]                                                                           │ │
│ │             db = '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Games\\\\Formula One           │ │
│ │                  (1984)(Argus Specialist P'+14                                               │ │
│ │            dbs = [                                                                           │ │
│ │                  │   '"right_thumbnail_mode"',                                               │ │
│ │                  │   '"scan_search_recursively": t',                                         │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Demos\\\\Chojn╟ⁿw \'94     │ │
│ │                  Compo Music Pack #1 (199'+12,                                               │ │
│ │                  │   '',                                                                     │ │
│ │                  │   '"crc32": "00000000|c',                                                 │ │
│ │                  │   '"core_path": "DETE',                                                   │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Demos\\\\!++ (199x)(20th   │ │
│ │                  Century Composers - S'+7,                                                   │ │
│ │                  │   '',                                                                     │ │
│ │                  │   '"crc32": "00000000|c',                                                 │ │
│ │                  │   '"core_path": "DETE',                                                   │ │
│ │                  │   ... +280131                                                             │ │
│ │                  ]                                                                           │ │
│ │        dbs_set = {                                                                           │ │
│ │                  │   '',                                                                     │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Games\\\\Formula One       │ │
│ │                  (1984)(Argus Specialist P'+14,                                              │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Demos\\\\Homage            │ │
│ │                  (2003-08-09)(Wrath Designs).p',                                             │ │
│ │                  │   '"path":                                                                │ │
│ │                  "D:\\\\games\\\\Commodore\\\\C64\\\\Collections\\\\Tadpole\\\\Tadpole #0459 │ │
│ │                  (19xx)(T'+9,                                                                │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Collections\\\\Derbyshire  │ │
│ │                  Ram\\\\Derbyshire Ram '+53,                                                 │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Collections\\\\Derbyshire  │ │
│ │                  Ram\\\\Derbyshire Ram '+53,                                                 │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Cracktros\\\\Paramount #29 │ │
│ │                  (19xx)(Paramount).p',                                                       │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Games\\\\Computer Craps    │ │
│ │                  (19xx)(Davis, G.W.).t',                                                     │ │
│ │                  │   '"path":                                                                │ │
│ │                  "D:\\\\games\\\\Commodore\\\\C64\\\\Collections\\\\Einstein\\\\Einstein     │ │
│ │                  #1292 (19xx)'+12,                                                           │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Games\\\\Revenge is Sweet  │ │
│ │                  (198x)(-).t',                                                               │ │
│ │                  │   ... +70031                                                              │ │
│ │                  }                                                                           │ │
│ │        dirname = 'Named_Boxarts'                                                             │ │
│ │              f = <_io.TextIOWrapper name='C:\\Program Files\\RetroArch\\playlists\\Commodore │ │
│ │                  - 64.lpl' encoding='utf-8'>                                                 │ │
│ │   gamelineslen = 1680846                                                                     │ │
│ │              i = 1680840                                                                     │ │
│ │           name = '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Games\\\\Über die Autostraße   │ │
│ │                  (19xx)(BB Softwar'+12                                                       │ │
│ │          names = [                                                                           │ │
│ │                  │   '"version": "1.5",',                                                    │ │
│ │                  │   '"sort_mode": 0,',                                                      │ │
│ │                  │   '"scan_filter_dat_content": false,',                                    │ │
│ │                  │   '"core_path": "DETECT",',                                               │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Demos\\\\PETSCÏÏhead       │ │
│ │                  (2020-01-13)(Seven).prg.p'+4,                                               │ │
│ │                  │   '},',                                                                   │ │
│ │                  │   '"crc32": "00000000|crc",',                                             │ │
│ │                  │   '"core_path": "DETECT",',                                               │ │
│ │                  │   '"path": "D:\\\\games\\\\Commodore\\\\C64\\\\Graphics\\\\!dead hires    │ │
│ │                  charset (2020-11-28)(P'+11,                                                 │ │
│ │                  │   '},',                                                                   │ │
│ │                  │   ... +280131                                                             │ │
│ │                  ]                                                                           │ │
│ │         parent = 'D:\\thumbnails\\libretrofuzzjreytl5l'                                      │ │
│ │       playlist = WindowsPath('C:/Program Files/RetroArch/playlists/Commodore - 64.lpl')      │ │
│ │       temp_dir = 'D:\\thumbnails\\libretrofuzzjreytl5l'                                      │ │
│ │ thumbnails_dir = WindowsPath('D:/thumbnails')                                                │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ in makedirs:215                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ exist_ok = True                                                                              │ │
│ │     head = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path":                                    │ │
│ │            "D:\\games\\Commodore\\C64\\Games\\Formula'+44                                    │ │
│ │     mode = 511                                                                               │ │
│ │     name = WindowsPath('D:/thumbnails/libretrofuzzjreytl5l/"path":                           │ │
│ │            "D:/games/Commodore/C64/Games/Formula One (1984)(Argus Specialist                 │ │
│ │            Publications).d/Named_Boxarts')                                                   │ │
│ │     tail = 'Named_Boxarts'                                                                   │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ in makedirs:215                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ exist_ok = True                                                                              │ │
│ │     head = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:\\games\\Commodore\\C64\\Games' │ │
│ │     mode = 511                                                                               │ │
│ │     name = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path":                                    │ │
│ │            "D:\\games\\Commodore\\C64\\Games\\Formula'+44                                    │ │
│ │     tail = 'Formula One (1984)(Argus Specialist Publications).d'                             │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ in makedirs:215                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ exist_ok = True                                                                              │ │
│ │     head = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:\\games\\Commodore\\C64'        │ │
│ │     mode = 511                                                                               │ │
│ │     name = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:\\games\\Commodore\\C64\\Games' │ │
│ │     tail = 'Games'                                                                           │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ in makedirs:215                                                                                  │
│ ╭─────────────────────────────────────── locals ────────────────────────────────────────╮        │
│ │ exist_ok = True                                                                       │        │
│ │     head = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:\\games\\Commodore'      │        │
│ │     mode = 511                                                                        │        │
│ │     name = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:\\games\\Commodore\\C64' │        │
│ │     tail = 'C64'                                                                      │        │
│ ╰───────────────────────────────────────────────────────────────────────────────────────╯        │
│ in makedirs:215                                                                                  │
│ ╭───────────────────────────────────── locals ─────────────────────────────────────╮             │
│ │ exist_ok = True                                                                  │             │
│ │     head = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:\\games'            │             │
│ │     mode = 511                                                                   │             │
│ │     name = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:\\games\\Commodore' │             │
│ │     tail = 'Commodore'                                                           │             │
│ ╰──────────────────────────────────────────────────────────────────────────────────╯             │
│ in makedirs:215                                                                                  │
│ ╭─────────────────────────────── locals ────────────────────────────────╮                        │
│ │ exist_ok = True                                                       │                        │
│ │     head = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:'        │                        │
│ │     mode = 511                                                        │                        │
│ │     name = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:\\games' │                        │
│ │     tail = 'games'                                                    │                        │
│ ╰───────────────────────────────────────────────────────────────────────╯                        │
│ in makedirs:225                                                                                  │
│ ╭──────────────────────────── locals ────────────────────────────╮                               │
│ │ exist_ok = True                                                │                               │
│ │     head = 'D:\\thumbnails\\libretrofuzzjreytl5l'              │                               │
│ │     mode = 511                                                 │                               │
│ │     name = 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:' │                               │
│ │     tail = '"path": "D:'                                       │                               │
│ ╰────────────────────────────────────────────────────────────────╯                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: [WinError 123] Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch: 'D:\\thumbnails\\libretrofuzzjreytl5l\\"path": "D:'

The last error basically says: the filename, directory name, or volume label syntax is incorrect.

Feathered-Serpent commented 11 months ago

But it seems to be only that one file:

Commodore - 64.lpl -> Commodore - 64
Failure: ! (2002)(Booze Design - Oneway): 43.6 Gremlins 2 - The New Batch (USA, Europe)
Failure: ! (2002)(Booze Design - Oneway): 43.6 Gremlins 2 - The New Batch (USA, Europe)
Failure: !++ (199x)(20th Century Composers - Scoop): 49.8 Scary Monsters (Europe)

The !++ file doesn't cause any errors anymore; it just doesn't find a thumbnail.

Interestingly, I put the other filename on top of the list:

    {
      "path": "D:\\games\\Commodore\\C64\\Demos\\PETSCÏÏhead (2020-01-13)(Seven).prg",
      "label": "! (2002)(Booze Design - Oneway)",
      "core_path": "DETECT",
      "core_name": "DETECT",
      "crc32": "00000000|crc",
      "db_name": "Commodore - 64.lpl"
    },

but it isn't shown in the --verbose output at all. Does fuzz sort the list by label first? I'm waiting for P to arrive to see if that's the case.

Edit 2: I shortened the playlist to the + filenames and the PET filename. The PET name is simply ignored and never listed.

i30817 commented 11 months ago

The JSON file appears corrupted.

Did you edit it manually? I think when you saved it in your Windows editor it replaced '\n' with '\r\n', and that's fucking with the JSON Python library used. (This doesn't matter, because Python converts all kinds of line endings; it was a JSON decoding exception that triggered the fallback. If you can, before deleting it, post the playlist that gives the exception here.)

Try deleting that Commodore playlist and regenerating it from RetroArch.

I'm still not sure whether your playlist generation method takes the name from the filename, but to be safe you can restore the original filenames.

You don't need to run fuzzall just to check this bug; running libretro-fuzz --no-meta on the Commodore 64 playlist alone should be enough, and faster.

As for the speed, it probably makes sense that the running time grows with the size of the playlist. There are probably optimizations that could be done, but I have no clue how to find them without a real computer and a profiler.

i30817 commented 11 months ago

> but it isn't shown in the --verbose output at all. Does fuzz sort the list somehow after labels first? Waiting for P to arrive to see, if that's the case.

This program ONLY operates on labels. Yes, it could break if any string in the JSON was not valid Unicode, since it broke as soon as the JSON parser found that malformed Unicode. But it only compares the labels to the remote thumbnail names.

This is because the labels are what RA uses as thumbnail names (which lets RA display names that differ from the filenames or that contain characters illegal in filenames).
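To make the label-to-thumbnail mapping concrete, here is a minimal sketch. It assumes the libretro-thumbnails naming convention (the characters &*/:`<>?\| in a label become `_` in the filename) and the `<server>/<system>/<Named_Boxarts|Named_Snaps|Named_Titles>/<label>.png` layout; `thumbnail_url` is a hypothetical helper, not part of libretrofuzz.

```python
from urllib.parse import quote

# Characters that the libretro-thumbnails convention replaces with "_" in filenames.
FORBIDDEN = '&*/:`<>?\\|'


def thumbnail_url(label, system, kind="Named_Boxarts",
                  server="https://thumbnails.libretro.com"):
    """Hypothetical helper: build the remote thumbnail URL for a playlist label."""
    filename = "".join("_" if c in FORBIDDEN else c for c in label) + ".png"
    # Percent-encode spaces, parentheses, etc. so the URL is valid.
    return f"{server}/{quote(system)}/{kind}/{quote(filename)}"


print(thumbnail_url("Gremlins 2 - The New Batch (USA, Europe)", "Commodore - 64"))
```

Note that only the label goes into the URL; the content path from the playlist entry is never used.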

In your example the PETSCÏÏhead name is in the path, which is what I was calling the filename here. The actual label on the entry is completely different, "! (2002)(Booze Design - Oneway)", so PETSCÏÏhead will never appear in the output.
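A small sketch of that distinction, using the entry posted above wrapped in the `items` array of a JSON playlist (the `labels_and_filenames` helper is hypothetical): the label and the filename derived from `path` are independent strings.

```python
import json
from pathlib import PureWindowsPath

# The entry from the shortened playlist above, inside a minimal JSON playlist.
PLAYLIST = r'''{
  "version": "1.5",
  "items": [
    {
      "path": "D:\\games\\Commodore\\C64\\Demos\\PETSCÏÏhead (2020-01-13)(Seven).prg",
      "label": "! (2002)(Booze Design - Oneway)",
      "core_path": "DETECT",
      "core_name": "DETECT",
      "crc32": "00000000|crc",
      "db_name": "Commodore - 64.lpl"
    }
  ]
}'''


def labels_and_filenames(playlist_text):
    """Hypothetical helper: pair each entry's label with the filename stem of its path."""
    return [(item["label"], PureWindowsPath(item["path"]).stem)
            for item in json.loads(playlist_text)["items"]]


for label, filename in labels_and_filenames(PLAYLIST):
    print(f"label={label!r}  filename={filename!r}")
```

Only the first element of each pair is what gets matched against the thumbnail server.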

That said, someone added an option last month for RA to use the filename instead of the label as the mapping to the thumbnails. It's a terrible, shitty idea, especially since following through would require adding duplicate thumbnails, so just ignore it.

i30817 commented 11 months ago

Please tell me if you get an error with a newly created, non-manually-edited playlist. I could probably forbid \r somehow after extracting the JSON, but that would make the program slightly slower and even more memory hungry.

As far as I can tell, JSON is supposed to be pure UTF-8, and the scanners in RetroArch create files with plain newline (not carriage return) line endings.
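For what it's worth, JSON (RFC 8259) treats `\r` as insignificant whitespace, so CRLF line endings alone should not break parsing; the encoding is the more likely culprit. The sketch below (with a hypothetical `decode_report` helper) shows both points. The "ⁿ" in the reported filename encodes in UTF-8 as E2 81 BF, and decoding those bytes with the Windows cp1252 code page fails on 0x81 with the same `'charmap'` error as in the original report; that this is how the original error arose is an assumption.

```python
import json

# 1) Line endings: "\r" is legal whitespace between JSON tokens,
#    so a playlist saved with CRLF endings still parses.
CRLF_PLAYLIST = '{\r\n  "version": "1.5",\r\n  "items": []\r\n}\r\n'
parsed = json.loads(CRLF_PLAYLIST)


# 2) Encoding: the same UTF-8 bytes can fail under a different codec.
def decode_report(data, encoding):
    """Decode data, or describe the first byte the codec rejects."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as e:
        return f"'{e.encoding}' codec can't decode byte {data[e.start]:#x}"


RAW = "ⁿ".encode("utf-8")  # the three bytes E2 81 BF
print(parsed["items"])
print(decode_report(RAW, "utf-8"))
print(decode_report(RAW, "cp1252"))
```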

You can also attach the failing file here so I can check it out.

i30817 commented 11 months ago

OK, I'm a fool. Reading your posted stack trace more carefully reveals that, after a JSON decoding error, the code fell through to the exception branch and is attempting to parse the much, much older RetroArch playlist format, which wasn't JSON but a horrible text thing.

The tell is that gamelineslen appears among the function's local variables, since it is only initialized and used in the exception branch.
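The locals in the traceback are consistent with a six-lines-per-entry reader (path, label, core path, core name, crc32, db name), where labels are every sixth line starting at line 1 and db names every sixth line starting at line 5 with the 4-character `.lpl` suffix chopped off. The sketch below is a hypothetical reconstruction inferred from those locals, not the actual libretrofuzz code; fed JSON lines, it produces exactly the misaligned `names` and `dbs` values shown above.

```python
def parse_legacy_playlist(lines):
    """Hypothetical legacy reader: six consecutive lines per playlist entry."""
    names = lines[1::6]                        # label line of each entry
    dbs = [line[:-4] for line in lines[5::6]]  # db name minus the ".lpl" suffix
    return names, dbs


# A real legacy entry parses as intended...
legacy = ["D:\\games\\a.d64", "Some Game", "DETECT", "DETECT",
          "00000000|crc", "Commodore - 64.lpl"]
print(parse_legacy_playlist(legacy))

# ...but the first lines of a JSON playlist are sliced into nonsense.
json_lines = ['{', '"version": "1.5",', '"default_core_path": "",',
              '"default_core_name": "",', '"label_display_mode": 0,',
              '"right_thumbnail_mode": 0,', '"left_thumbnail_mode": 0,',
              '"sort_mode": 0,']
print(parse_legacy_playlist(json_lines))
```

Those bogus "db names" (for example a fragment like `"path": "D:` containing `"` and `:`) then end up as directory names, which Windows rejects with WinError 123.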

https://github.com/i30817/libretrofuzz/blob/afcf69833b8eb17db8f34c0012e1f5095139d363/libretrofuzz/__main__.py#L534

So I still want the original playlist, because it will (probably) reveal an error in my handling of JSON; the weird directories it's attempting to create are a side effect of mistaking the format after an unexpected error.

It's also still possible your edit of the file broke something in the JSON though.

It would maybe be interesting to use some heuristics to tell whether a file was supposed to be JSON before trying to handle the older format, which some people have requested before; or maybe I'll just remove support for it again. People have to update their playlists sooner or later, right? An exception skipping the playlist and saying "corrupt JSON file or unsupported older libretro playlist format" is more than enough for them to get the hint, if it's not something I messed up.
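One possible heuristic, sketched under the assumption that JSON playlists begin with `{` (after an optional BOM and whitespace) while legacy playlists begin with a content path. `PlaylistFormatError` and `load_json_playlist` are hypothetical names, not the check libretrofuzz actually adopted. Instead of falling back to another parser, each case raises a distinct, readable error.

```python
import json


class PlaylistFormatError(Exception):
    """Hypothetical error used to skip a playlist with a readable message."""


def load_json_playlist(text, name="playlist"):
    # Assumed heuristic: JSON playlists start with "{", legacy ones with a path.
    body = text.lstrip("\ufeff \t\r\n")
    if not body.startswith("{"):
        raise PlaylistFormatError(f"{name}: unsupported older libretro playlist format")
    try:
        return json.loads(body)
    except json.JSONDecodeError as e:
        # It looks like JSON but doesn't parse: report that, don't guess another format.
        raise PlaylistFormatError(
            f"{name}: corrupt JSON file ({e.msg}, line {e.lineno} column {e.colno})"
        ) from e
```

The design choice is to fail loudly per playlist rather than guess, so a hand-edited file that broke the JSON is reported as such instead of producing misleading directory errors.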

Feathered-Serpent commented 11 months ago

I first ran RetroArch's automatic scan against the Commodore 64 directory (and, well, all the others). The thing is, RetroArch doesn't really know lots of C64 stuff, so I ran the manual scan afterwards to add everything else. Saved as an uncompressed playlist, the file is around 60MB; rzip-compressed it was around 4MB. And if I remove anything from the JSON, I take care to keep all the indentation for the rest of the entries.

I could send you the playlist file, though the two files from the start of this issue have already been renamed. I could back up my current C64 playlist, rename the files back, run an automatic and a manual scan (which takes a while... even after cleaning up, the C64 directory has 210,000 files) and send you the playlist file RetroArch generates then.

i30817 commented 11 months ago

Sure. Please send a zipped plain JSON playlist, not RA-compressed, so I can analyse it more easily.

But you'll probably want to use RetroArch to make a new one, to see whether the problem still exists and doesn't come from your edits.

You can attach files to issues with drag and drop or clicking a link in the message box.

Feathered-Serpent commented 11 months ago

You'll have to wait a bit... I started the scan 7 hours ago, and RetroArch says it's only at 11%.

i30817 commented 11 months ago

Wow. If I were you I'd always use the manual scanner for those. Holy shit.

Feathered-Serpent commented 11 months ago

Heh, but then I won't get any "correct" labels, as RetroArch checks the CRC checksum against its database. But somehow it seems to have been only that collection directory that took so long.

After auto scanning, I added manual scanning, then let RetroArch clean the playlist once again. So here's the zip file with the initial bad filenames/labels: Commodore - 64.zip

i30817 commented 11 months ago

Well, I just invoked the function standalone (still on the tablet lol), after butchering the unneeded imports to make the code run, and there's no decode error with the file you gave.

And by run I mean I'm calling the function directly from main, giving it the 3 path arguments it needs, then printing the result. With a grep to limit the output, among other things, it finishes printing without a decode error:

$ python3 __main__.py readPlaylistAndPrepareDirectories down/Commodore.lpl a b | grep -o -P '.{0,10}PETSC.{0,10}'
...
reedh)', 'PETSCÏÏhead (20
...

So I don't THINK this error exists anymore?

I think that when you edited the file manually, you might have corrupted the file encoding or the JSON structure, which is why it 'fell back' to the parsing method for the older playlist format. Rescanning then fixed it.
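A small illustration of why the original traceback complained about byte `0x81` in the `'charmap'` codec: the UTF-8 encoding of `ⁿ` (U+207F) is `E2 81 BF`, and UTF-8 `Ï` is `C3 8F`. On Windows, Python's `open()` typically defaults to the ANSI code page (usually `cp1252`) when no `encoding=` is given, and `cp1252` leaves `0x81` and `0x8F` undefined, so perfectly valid UTF-8 fails to decode. This sketches the failure mode only; it doesn't claim which code path libretrofuzz took in the first report. The helper name `first_bad_byte` is hypothetical.

```python
def first_bad_byte(name: str, codec: str = "cp1252") -> str:
    """Encode `name` as UTF-8, then report where decoding it as `codec` fails."""
    raw = name.encode("utf-8")
    try:
        raw.decode(codec)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the codec could not map.
        return f"{codec} fails on byte 0x{raw[exc.start]:02X} at offset {exc.start}"
    return f"{codec} decodes {raw.hex(' ')} without error"


print(first_bad_byte("ⁿ"))            # cp1252 fails on byte 0x81 at offset 1
print(first_bad_byte("PETSCÏÏhead"))  # cp1252 fails on byte 0x8F at offset 6
print(first_bad_byte("ⁿ", "utf-8"))   # utf-8 decodes e2 81 bf without error
```

The reading side of the fix is to pass `encoding="utf-8"` to `open()` explicitly instead of relying on the platform default.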

What I think I should do is retire support for that older format, since it makes user errors hard to diagnose: there is no way to recognize it other than by a JSON parse error.

i30817 commented 11 months ago

I think I found a reasonably reliable way to detect the older playlist format, so other people who break the JSON shouldn't get that misleading error anymore (they'll just get an error saying the JSON is broken).
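The thread doesn't say how the detection works. One plausible heuristic, offered here purely as an assumption: a JSON playlist's first non-whitespace character is `{`, while the older line-based format starts with a content path. Deciding the format up front lets a real JSON syntax error surface as such instead of triggering a fallback. Both function names are hypothetical.

```python
import json
from typing import Any

_LEADING = "\ufeff \t\r\n"  # a UTF-8 BOM plus ordinary whitespace


def detect_playlist_format(text: str) -> str:
    """Heuristic sketch: 'json' if the text opens with '{', else 'legacy'."""
    return "json" if text.lstrip(_LEADING).startswith("{") else "legacy"


def parse_json_playlist(text: str) -> Any:
    """Parse a JSON playlist strictly; never fall back on a syntax error."""
    if detect_playlist_format(text) != "json":
        raise ValueError("not a JSON playlist (older line-based format?)")
    # json.JSONDecodeError propagates, so a broken file is reported as broken JSON.
    return json.loads(text.lstrip(_LEADING))
```

The trade-off is that a JSON file damaged at its very first character would be misreported as the legacy format; anything past that point gets an accurate JSON error.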

Edit: and now, if it detects a charset error, it asks the user to re-save the playlist as UTF-8 after their manual edits, instead of replacing the bogus characters with a placeholder (JSON is supposed to always be Unicode, and UTF-8 is the saner default).
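A minimal sketch of that behaviour, assuming a standalone loader (the names `load_playlist` and `PlaylistEncodingError` are hypothetical, not libretrofuzz's). It decodes strictly and stops with an actionable message rather than using `errors="replace"`, which would silently turn labels into strings that match nothing. Using `utf-8-sig` is a design choice here: it still rejects invalid bytes but tolerates the BOM some Windows editors prepend.

```python
import json
from pathlib import Path


class PlaylistEncodingError(Exception):
    """Raised when a playlist is not valid UTF-8 (hypothetical error type)."""


def load_playlist(path: Path) -> dict:
    """Load a JSON playlist, refusing anything that is not valid UTF-8."""
    try:
        # Strict decoding (the default); utf-8-sig also strips a leading BOM.
        text = path.read_text(encoding="utf-8-sig")
    except UnicodeDecodeError as exc:
        raise PlaylistEncodingError(
            f"{path} is not valid UTF-8; please re-save it as UTF-8 after editing"
        ) from exc
    return json.loads(text)
```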

Closed for real now; if there are any complaints, please post.

Feathered-Serpent commented 11 months ago

Looking at the releases, you were really busy the last days :D Thanks for your work.

i30817 commented 11 months ago

Not so much busy as really careless. I did two small things (installing and running on Android/Termux, and these two bug fixes), but for each one I went 'oh right... this is incomplete or wrong' 3-5 times. Should be fine now.

i30817 commented 11 months ago

@Feathered-Serpent since you complained about speed, 3.4.9 implements an optimization: before fuzzy matching each label (game), the program checks whether all the thumbnails for that game already exist, at the cost of 3 file-existence checks, which should be much faster.

Basically, it should make repeated runs faster when all the thumbnails were already downloaded.

Only 'all thumbnails downloaded', not 'all thumbnails that have a match in the remote names downloaded', because the whole point is to avoid the match when it's unneeded, and that is the only condition under which it can be avoided without calculating it. Well, it could also be avoided with 'at least one', but sometimes missing thumbnails get added to the server. Maybe with no merge, hmmmm.

Still, it's a common case and should make repeated runs faster.
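A sketch of that short-circuit, assuming the usual libretro-thumbnails layout (`<root>/<playlist name>/Named_Boxarts|Named_Snaps|Named_Titles/<label>.png`) and its rule of replacing `&*/:`<>?\|"` in labels with `_`. Both the layout and the exact character set are assumptions to check against a real install, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed libretro-thumbnails layout and filename rule; verify against your setup.
THUMBNAIL_KINDS = ("Named_Boxarts", "Named_Snaps", "Named_Titles")
FORBIDDEN = '&*/:`<>?\\|"'


def thumbnail_name(label: str) -> str:
    """Map a playlist label to its assumed thumbnail filename."""
    return "".join("_" if ch in FORBIDDEN else ch for ch in label) + ".png"


def all_thumbnails_present(thumb_root: Path, playlist: str, label: str) -> bool:
    """True only if every thumbnail kind already exists for this label.

    Three cheap existence checks; when True, the expensive fuzzy match
    against the remote names can be skipped for this game.
    """
    name = thumbnail_name(label)
    return all((thumb_root / playlist / kind / name).exists() for kind in THUMBNAIL_KINDS)
```

Requiring all three (rather than at least one) matches the reasoning above: with only one present, a later server update might still supply the others, so the match can't safely be skipped.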

i30817 commented 11 months ago

About the performance: I just noticed you didn't mean the time between downloads but the time it takes to get to the first download.

I missed it because when I was checking this bug I only tested the function that loads the JSON, and that was quick. I'm profiling now (still on an Android tablet 😂), but I kind of expect it's the preparation step where I normalize the strings ahead of time for better matches. Doing it ahead of time is better than doing it in the loop, but it does appear to scale terribly anyway...
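To make "normalize ahead of time" concrete, here is a hypothetical normalizer (not libretrofuzz's actual rules) that strips bracketed tags, accents, case and punctuation, plus a `prepare` step that runs it once per remote name before the matching loop, so the loop only ever compares precomputed strings.

```python
import re
import unicodedata

_TAGS = re.compile(r"\([^)]*\)|\[[^\]]*\]")  # "(1994)", "(Altus)", "[!]" style tags
_NON_WORD = re.compile(r"[\W_]+")


def normalize(name: str) -> str:
    """Hypothetical normalization: drop tags, accents, case and punctuation."""
    name = _TAGS.sub(" ", name)
    # NFKD splits "ó" into "o" + a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return _NON_WORD.sub(" ", no_marks.casefold()).strip()


def prepare(names: list[str]) -> list[tuple[str, str]]:
    """Normalize every name once, keeping the original for the final result."""
    return [(normalize(n), n) for n in names]
```

The work is linear in the number of names, so with around 200,000 entries the per-name cost of the Unicode and regex passes is what dominates startup.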

I suspect all I'll be able to do to improve the situation is add a progress bar and allow interrupts. It's annoying not being able to cancel.

Feathered-Serpent commented 11 months ago

I guess as long as people don't have playlists with more than 50,000 or even 100,000 entries, they won't notice this much. Smaller playlists do seem to start nearly instantly.

i30817 commented 11 months ago

I managed to fix the waiting time for name normalization with some multiprocessing. But I'm unsatisfied with the speed of the fuzzy matching (when it prints something) on large playlists. The scorer, WRatio (from rapidfuzz, not my own), really does take 91% of the time, and it spends 1-3 seconds on each failing match. It makes some sense that larger playlists indicate a larger remote list of names to match, but this correlation seems a bit sus.
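The thread doesn't show how the normalization was parallelized. One common standard-library approach, sketched here as an assumption, is `concurrent.futures.ProcessPoolExecutor.map` with a `chunksize`, so each worker receives batches of names instead of one at a time. `fold` is a simplified stand-in normalizer and `min_parallel` a hypothetical threshold below which process startup would cost more than it saves.

```python
import unicodedata
from concurrent.futures import ProcessPoolExecutor


def fold(name: str) -> str:
    """Simplified stand-in normalizer: strip accents and case."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def normalize_all(names: list[str], min_parallel: int = 10_000) -> list[str]:
    """Normalize many names, using worker processes only for large inputs."""
    if len(names) < min_parallel:
        # Small lists: worker startup would outweigh any gain.
        return [fold(n) for n in names]
    with ProcessPoolExecutor() as pool:
        # map() preserves input order; chunksize batches items per task.
        return list(pool.map(fold, names, chunksize=1_000))
```

On Windows, worker processes are spawned and re-import the main module, so the call that takes the parallel path should sit under an `if __name__ == "__main__":` guard. This only helps the preparation step; the per-match cost of the scorer is a separate problem.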

I'll do a new release soon, and at least you won't have to wait 2-5 minutes for the Commodore 64 playlist to be processed 😂 (just 50 to 1.3 minutes on this 2018 third-tier tablet).