ppy / osu-stable-issues

Report critical osu-stable issues here

Beatmaps containing files with Unicode characters incorrectly exported #1147

Open aryoadhi opened 10 months ago

aryoadhi commented 10 months ago

These maps either can't be imported at all, or are imported with missing files. This can happen to virtually any file inside the beatmap.

One example is this map, though I believe there are more: https://osu.ppy.sh/beatmapsets/1980743#taiko/4113017

image

The file name on the left, û│æÞ15_20230418161243.jpg, is the incorrectly encoded one I got after downloading and importing the map. The file name on the right, 無題15_20230418161243.jpg, is the actual file name that the .osu file refers to.
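The garbled name is consistent with one specific code-page mismatch, sketched below in Python as an illustration only. If the bytes of 無題 are written in Shift-JIS (Windows code page 932) and later decoded as code page 850, the result is exactly the four characters seen on the left. Whether these two code pages are actually the ones involved in osu!'s export path is an assumption; the snippet only shows that this pair reproduces the observed name. The `garble` helper is hypothetical.

```python
# Hypothetical reproduction: the code-page pair (cp932 -> cp850) is an
# assumption, chosen because it reproduces the garbled name from the report.
original = "無題15_20230418161243.jpg"


def garble(name: str, written_as: str = "cp932", read_as: str = "cp850") -> str:
    """Encode a name with one code page, then decode the bytes with another."""
    return name.encode(written_as).decode(read_as)


print(garble(original))
```

Because both code pages map these bytes without loss, the damage is reversible in this particular case: encoding the garbled name as cp850 and decoding it as cp932 gives back the original.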

I'm not sure why osu!stable supports Unicode characters in file names but can't export them properly. The ZIP format does support Unicode encoding for file names. According to some reports I've seen in this thread ( https://osu.ppy.sh/community/forums/posts/9181024 ), it seems that osu! tries to encode these file names using the Windows-1252 code page. However, I tried it with some characters from that code page and they were still exported incorrectly.
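The claim that ZIP supports Unicode file names can be checked with Python's standard `zipfile` module, used here purely as an illustration (it says nothing about the ZIP library osu! uses). The ZIP format reserves general-purpose flag bit 11 to mark an entry's name as UTF-8, and `zipfile` sets it when writing a non-ASCII name. The constant name `UTF8_NAME_FLAG` is made up for this sketch.

```python
import io
import zipfile

name = "無題15_20230418161243.jpg"

# Write one entry with a non-ASCII name into an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(name, b"placeholder image bytes")

# Read the archive back and inspect how the name was stored.
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: file name is UTF-8
stored_as_utf8 = bool(info.flag_bits & UTF8_NAME_FLAG)
print(info.filename, stored_as_utf8)
```

When an archive is written without this flag, readers fall back to a legacy code page for the name, which is where mismatches like the one in this report can arise.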

The bigger issue is rankability: the Ranking Criteria currently have no rule regarding Unicode file names. There is a guideline, but it doesn't stop anyone from ranking these problematic maps.

Avoid non-alphanumeric unicode characters in a difficulty's name. These can cause errors with the beatmap submission system and problems for certain users when appearing in chat.

An easy solution is to simply prevent beatmap uploads when file names contain characters that the ZIP processor doesn't support. Sticking to ASCII is the best option, since all keyboards can type the printable ASCII characters. Another option would be to encode these Unicode file names as Base64url, since its characters are safe to use in file names. It could also strip the Unicode characters out of the file names, but that might introduce issues when multiple files with Unicode names share the same extension.
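The first two options can be sketched in Python as follows. This is a minimal sketch under assumptions, not a description of how the submission system works: all three function names are hypothetical, and keeping the extension unencoded is a choice made here so that the file type stays recognisable.

```python
import base64
import os


def is_printable_ascii(name: str) -> bool:
    """True if every character is printable ASCII (space through ~)."""
    return all(" " <= ch <= "~" for ch in name)


def to_base64url_name(name: str) -> str:
    """Encode a non-ASCII file name as unpadded Base64url, keeping the extension."""
    if is_printable_ascii(name):
        return name  # nothing to do for plain ASCII names
    stem, suffix = os.path.splitext(name)
    if not is_printable_ascii(suffix):
        # The extension itself is non-ASCII, so encode the whole name.
        stem, suffix = name, ""
    encoded = base64.urlsafe_b64encode(stem.encode("utf-8")).rstrip(b"=")
    return encoded.decode("ascii") + suffix


def from_base64url_stem(stem: str) -> str:
    """Recover the original text from an unpadded Base64url stem."""
    padded = stem + "=" * (-len(stem) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


example = "無題15_20230418161243.jpg"
print(is_printable_ascii(example))  # upload check: would reject this name
print(to_base64url_name(example))   # alternative: ASCII-only replacement name
```

Unlike stripping characters, the encoding is reversible, so two different Unicode names with the same extension cannot collapse into one name. One caveat is that Base64 is case-sensitive, so on a case-insensitive file system two distinct encoded names could in principle still collide.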