hrydgard / ppsspp

A PSP emulator for Android, Windows, Mac and Linux, written in C++. Want to contribute? Join us on Discord at https://discord.gg/5NJB6dD or just send pull requests / issues. For discussion use the forums at forums.ppsspp.org.
https://www.ppsspp.org

Secret Agent Clank: Garbage characters #14297

Closed Anuskuss closed 3 years ago

Anuskuss commented 3 years ago

What happens?

In the save dialog, the game uses 53 65 63 72 65 74 20 41 67 65 6E 74 20 43 6C 61 6E 6B E2 84 A2 C0 80 45 4E 54 52 as the title, which should be rendered as Secret Agent Clank™ but it shows Secret Agent Clank™��ENTR instead.

What should happen?

PPSSPP should treat C0 80 as NUL and terminate the string.
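As a rough illustration of the report (a sketch in Python, not PPSSPP code), the title bytes quoted above can be fed to a strict UTF-8 decoder. The helper names `strict_error_offset` and `truncate_at_overlong_nul` are made up for this example. A strict decoder rejects `C0 80`, while cutting the string at that pair gives the expected title.

```python
# Title bytes exactly as quoted in the report.
TITLE_HEX = (
    "53 65 63 72 65 74 20 41 67 65 6E 74 20 43 6C 61 6E 6B "
    "E2 84 A2 C0 80 45 4E 54 52"
)
RAW_TITLE = bytes.fromhex(TITLE_HEX)


def strict_error_offset(data: bytes):
    """Return the offset of the first byte a strict UTF-8 decoder rejects, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def truncate_at_overlong_nul(data: bytes) -> str:
    """Treat the pair C0 80 as a terminator (the behavior requested in the report)."""
    return data.split(b"\xC0\x80", 1)[0].decode("utf-8")


if __name__ == "__main__":
    print(strict_error_offset(RAW_TITLE))            # offset of the C0 byte
    print(RAW_TITLE.decode("utf-8", errors="replace"))  # garbage characters before ENTR
    print(truncate_at_overlong_nul(RAW_TITLE))
```

For what it is worth, `C0 80` is also the way Java's "modified UTF-8" writes NUL, so the byte pair is not unique to this game.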

What are you using?

What PPSSPP version (standalone/official), and did it work before?

1.11.3

Which game or games?

Secret Agent Clank (all variants)

anr2me commented 3 years ago

Hmm.. maybe the title buffer is completely full, so there is no room left for a null to terminate the string?

unknownbrackets commented 3 years ago

C0 80 is definitely a questionable encoding of NUL, we should probably make a test to be sure. In the worst case, this could actually be a bug in sceCcc and I'd rather fix it there if so.

Do saves created on PSPs include this in the SFO (i.e. from gamefaqs)? Do PPSSPP created saves show as expected on PSPs?

I could see the utf-8 decoder in the PSP firmware treating this as a NUL, but still storing it in the SFO. If yes, we should have the same behavior so we show PSP-created saves correctly.

The savedata buffer for this is 128 bytes iirc, so probably not truncated.

-[Unknown]

Anuskuss commented 3 years ago

@unknownbrackets Grabbing a save from GameFAQs includes the same string so it's definitely not a PPSSPP bug. Creating a new game on my PSP displays it correctly (cut-off). Also even though C0 80 is illegal (because it's longer than necessary), it should be interpreted as NUL.

Panderner commented 3 years ago

Buzz!: Master Quiz is also affected by this issue. (screenshot: IMG_20210316_120048)

Anuskuss commented 3 years ago

Not the same issue, but interesting nevertheless:

42 75 7A 7A 21 F0 82 84 A2 3A 20 4D 61 73 74 65 72 20 51 75 69 7A
B  u  z  z  !  � � � �  :     M  a  s  t  e  r     Q  u  i  z

™ in UTF-8 is E2 84 A2 but it's (F0) 82 84 A2 here. Maybe they mistyped it or it's some other encoding. What does it look like on a PSP?

anr2me commented 3 years ago

Probably just the same symbol encoded in 4 bytes instead of 3, just like the Euro sign example here: https://en.wikipedia.org/wiki/UTF-8#Examples

The three bytes 11100010 10000010 10101100 can be more concisely written in hexadecimal, as E2 82 AC.

Overlong encodings

In principle, it would be possible to inflate the number of bytes in an encoding by padding the code point with leading 0s. To encode the Euro sign € from the above example in four bytes instead of three, it could be padded with leading 0s until it was 21 bits long – 000 000010 000010 101100, and encoded as 11110000 10000010 10000010 10101100 (or F0 82 82 AC in hexadecimal). This is called an overlong encoding.

The standard specifies that the correct encoding of a code point uses only the minimum number of bytes required to hold the significant bits of the code point. Longer encodings are called overlong and are not valid UTF-8 representations of the code point. This rule maintains a one-to-one correspondence between code points and their valid encodings, so that there is a unique valid encoding for each code point. This ensures that string comparisons and searches are well-defined.
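The padding described in the quote can be reproduced with a few lines of arithmetic. This is a minimal sketch, and `encode_padded` is a hypothetical helper written for this example (it is not part of any library). It shows that the Buzz! bytes `F0 82 84 A2` are the four-byte padded form of ™ (U+2122), and that `C0 80` is the two-byte padded form of NUL.

```python
def encode_padded(code_point: int, length: int) -> bytes:
    """Encode code_point in exactly `length` UTF-8-style bytes (2 to 4),
    padding with leading zero bits. Lengths above the minimum are overlong
    and therefore not valid UTF-8."""
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}
    payload_bits = {2: 11, 3: 16, 4: 21}
    if length not in lead_marker:
        raise ValueError("length must be 2, 3 or 4")
    if not 0 <= code_point < (1 << payload_bits[length]):
        raise ValueError("code point does not fit in that many bytes")
    out = []
    for _ in range(length - 1):
        # Each continuation byte carries the next 6 low bits.
        out.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # Whatever is left goes into the lead byte.
    out.append(lead_marker[length] | code_point)
    return bytes(reversed(out))


if __name__ == "__main__":
    print(encode_padded(0x20AC, 3).hex(" "))  # Euro sign, shortest form
    print(encode_padded(0x20AC, 4).hex(" "))  # Euro sign, overlong
    print(encode_padded(0x2122, 4).hex(" "))  # trademark sign, overlong
    print(encode_padded(0x0000, 2).hex(" "))  # NUL, overlong
```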

hrydgard commented 3 years ago

Right, so illegal encodings according to the standard, but as we are an emulator and games are using them and the real PSP handles them, we should support them too indeed... Fun.

hrydgard commented 3 years ago

The ENTR thing is very curious. In the bottom left corner where it shows save data size after saving, it draws this string:

image

image

If I just re-encode the overlong encodings to short ones, 0xC0 0x80 (192 128) becomes a null, and we lose the lines about the date and size.

Maybe 0xC0 0x80 ENTR has a special meaning? Just removing those six characters fixes it, but it's kinda ...weird.

anr2me commented 3 years ago

Hmm.. maybe ENTR is translated to ASCII code 13 (CR) and combined with the following ASCII code 10 (LF), just like the newline (\r\n) in files with Windows line endings? And maybe 0xC0 0x80 is what triggers the ENTR translation?

Unix: Unix systems consider '\n' as a line terminator. Unix considers \r as going back to the start of the same line.

Mac (up to 9): Older Mac OSs consider '\r' as a newline terminator but newer OS versions have been made to be more compliant with Unix systems to use '\n' as the newline.

Windows: Windows has a different style of newline, Windows supports the combination of both CR and LF as the newline character - '\r\n'.

If so, there might be other codes that need to be translated too (this would need to be tested on a PSP), maybe something similar to Autodesk MotionBuilder:

KEY_TO_ID = { "NONE" : -1,
"ESC" : 0x1b, "TAB" : 0x09, "CAPS" : 0x14, "BKSP" : 0x08, "LBR" : 0xdb, "RBR" : 0xdd, "SEMI" : 0xba, "ENTR" : 0x0d,
"SPC" : 0x20, "PRNT" : 0x2c, "SCRL" : 0x91, "PAUS" : 0x13, "INS" : 0x2d, "HOME" : 0x24, "PGUP" : 0x21, "DEL" : 0x2e, 
"END" : 0x1b, "PGDN" : 0x1b, "UP" : 0x1b, "LEFT" : 0x1b, "DOWN" : 0x1b, "RGHT" : 0x1b, 
"F1" : 0x70,"F2" : 0x71 ,"F3" : 0x72, "F4" : 0x73, "F5" : 0x74, "F6" : 0x75, "F7" : 0x76, "F8" : 0x77, "F9" : 0x78,"F10" : 0x79 ,"F11" : 0x7a, "F12" : 0x7b, 
"NUML" : 0x90, "NMUL" : 0x6a, "NADD" : 0x6b, "NDIV" : 0x6f, "NSUB" : 0x6d,"NDEC" : 0x6e ,"N0" : 0x60, "N1" : 0x61, "N2" : 0x62, "N3" : 0x63, "N4" : 0x64, "N5" : 0x65, "N6" : 0x66,"N7" : 0x67 ,"N8" : 0x68, "N9" : 0x69,
...
}

hrydgard commented 3 years ago

Possible. But extremely odd! I've never seen string-encoded characters like that.

Anyway, gonna send a PR with what I've got.

anr2me commented 3 years ago

Btw, is the Date and Size information supposed to be printed together with the "Save" text, or separately (i.e. using a smaller font size)?

If it was supposed to be printed separately, it would make sense to just stop at 0xC0 0x80 (NUL) and ignore the rest of it (which may be uninitialized/garbage data), wouldn't it?

hrydgard commented 3 years ago

It's printed together with the same font on the real PSP.

unknownbrackets commented 3 years ago

I'll try to do some testing later to see what it does with different encodings and if it consistently ignores the next 4 bytes after a NUL. I could imagine it being a lot of different things.

It does seem likely based on the recent examples that it's an escape code to the PPGe PSP dialog drawing system...

-[Unknown]

hrydgard commented 3 years ago

Well, it's not called PPGe on the actual PSP, that's just how I named our corresponding thing :) but yeah.

unknownbrackets commented 3 years ago

Sorry, that's what I meant, the PSP's version of PPGe, whatever it is (but it may not even have a unified way...)

Some test results with MsgDialog specifically (not savedata):

Based on the above, MsgDialog seems to validate the first byte, and then simply mask 0x3F of all following bytes. If the first byte is invalid, it terminates the string (or is ignored, if control code.) Overlong encodings are treated naively and no escape sequences or special handling appears present.

It's likely sceCcc simply works this same way.
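A minimal sketch of the behavior described in this comment, under stated assumptions: the function name `decode_masking` is made up, the set of "valid" first bytes is guessed from standard UTF-8 lead-byte ranges, a decoded NUL ends the string, and the control-code case is not modelled. It is not the firmware's algorithm, just one way to read the test results.

```python
def decode_masking(data: bytes) -> str:
    """Decode by trusting the first byte for the sequence length and masking
    every following byte with 0x3F, without rejecting overlong forms."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif 0xC0 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, cp = 4, lead & 0x07
        else:
            break  # invalid first byte: terminate the string
        if i + length > len(data):
            break  # sequence runs past the end of the buffer
        for following in data[i + 1:i + length]:
            # Bits outside 0x3F are ignored, even if this is not a
            # well-formed continuation byte.
            cp = (cp << 6) | (following & 0x3F)
        if cp == 0:
            break  # NUL, including the overlong C0 80, ends the string
        out.append(chr(cp) if cp <= 0x10FFFF else "\ufffd")
        i += length
    return "".join(out)


if __name__ == "__main__":
    clank = bytes.fromhex(
        "53 65 63 72 65 74 20 41 67 65 6E 74 20 43 6C 61 6E 6B "
        "E2 84 A2 C0 80 45 4E 54 52"
    )
    buzz = bytes.fromhex(
        "42 75 7A 7A 21 F0 82 84 A2 3A 20 4D 61 73 74 65 72 20 51 75 69 7A"
    )
    print(decode_masking(clank))
    print(decode_masking(buzz))
```

Under these assumptions both titles from this thread come out with a plain ™, and the Secret Agent Clank title stops before ENTR.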

Will do some testing of PARAM.SFO string display next in the savedata section.

-[Unknown]

unknownbrackets commented 3 years ago

TIL pressing X on savedata starts the game that made the savedata. Not sure how I never knew this before.

SAVEDATA using Mana Khemia, just as an example, seems to follow the same rules as above. The "\xC0\x80ENTR" sequence resulted in truncation. The other bytes were treated as above, i.e. the bits of continuation bytes outside 0x3F were ignored.

Do we know for sure that the data after ENTR (albeit useful) actually shows up on a PSP?

Thinking we just specifically nuke "\xC0\x80ENTR" (which we already do, to be clear) and then normalize overlong encodings. I suspect this is just a game bug that accidentally resulted in truncation on real PSPs and no one noticed.
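The two steps proposed here (drop the six-byte marker, then normalize overlong encodings) could look roughly like the sketch below. It is an illustration only, not the PPSSPP implementation: `sanitize_title`, `_shortest_form` and the regular expression are hypothetical names chosen for this example, and only well-formed lead/continuation patterns are rewritten.

```python
import re

# The six bytes discussed in this thread.
MARKER = b"\xC0\x80ENTR"

# A lead byte followed by the number of continuation bytes it announces.
_MULTIBYTE = re.compile(
    rb"[\xC0-\xDF][\x80-\xBF]"
    rb"|[\xE0-\xEF][\x80-\xBF]{2}"
    rb"|[\xF0-\xF7][\x80-\xBF]{3}"
)


def _shortest_form(match) -> bytes:
    """Re-encode one multi-byte sequence using the minimum number of bytes."""
    seq = match.group(0)
    lead_mask = {2: 0x1F, 3: 0x0F, 4: 0x07}[len(seq)]
    cp = seq[0] & lead_mask
    for following in seq[1:]:
        cp = (cp << 6) | (following & 0x3F)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return seq  # not encodable as valid UTF-8, leave untouched
    return chr(cp).encode("utf-8")


def sanitize_title(raw: bytes) -> bytes:
    """Remove the marker first, then shorten any overlong sequences."""
    return _MULTIBYTE.sub(_shortest_form, raw.replace(MARKER, b""))


if __name__ == "__main__":
    clank = b"Secret Agent Clank\xE2\x84\xA2\xC0\x80ENTR"
    buzz = b"Buzz!\xF0\x82\x84\xA2: Master Quiz"
    print(sanitize_title(clank).decode("utf-8"))
    print(sanitize_title(buzz).decode("utf-8"))
```

Removing the marker before normalizing matters for the reason given earlier in the thread: if `C0 80` were normalized first it would become a real NUL, and any text concatenated after it would be lost.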

-[Unknown]

hrydgard commented 3 years ago

@unknownbrackets Yes, the data does show up on the real PSP. Although potentially it might come from other fields in the structs that we miss?

hrydgard commented 3 years ago

image

(hm, think I need to dust it...)

unknownbrackets commented 3 years ago

Ah, I missed that it was just the date. That's a different field indeed, we just concatenated.

To be fair, I didn't try different firmware versions. I also know that firmware does have some workarounds for game bugs in a few specific cases. I think Xele02 posted a list of game IDs found in firmware once years ago... but it wasn't a long list iirc.

-[Unknown]

Panderner commented 3 years ago

Hmm, the save file issue for Secret Agent Clank is fixed, but Buzz!: Master Quiz is still not fixed.