Closed Anuskuss closed 3 years ago
Hmm.. maybe the title buffer is completely full and there isn't enough space left for a null to terminate the string?
C0 80 is definitely a questionable encoding of NUL, we should probably make a test to be sure. In the worst case, this could actually be a bug in sceCcc and I'd rather fix it there if so.
Do saves created on PSPs include this in the SFO (i.e. from gamefaqs)? Do PPSSPP created saves show as expected on PSPs?
I could see the utf-8 decoder in the PSP firmware treating this as a NUL, but still storing it in the SFO. If yes, we should have the same behavior so we show PSP-created saves correctly.
The savedata buffer for this is 128 bytes iirc, so probably not truncated.
-[Unknown]
@unknownbrackets Grabbing a save from GameFAQs includes the same string so it's definitely not a PPSSPP bug. Creating a new game on my PSP displays it correctly (cut-off). Also even though C0 80 is illegal (because it's longer than necessary), it should be interpreted as NUL.
Buzz!: Master Quiz is also affected by this issue
Not the same issue, but interesting nevertheless:
42 75 7A 7A 21 F0 82 84 A2 3A 20 4D 61 73 74 65 72 20 51 75 69 7A
B u z z ! � � � � : M a s t e r Q u i z
™ in UTF-8 is E2 84 A2 but it's (F0) 82 84 A2 here. Maybe they mistyped it or it's some other encoding. What does it look like on a PSP?
Probably just a different representation of the same symbol in 3 bytes and 4 bytes, just like the Euro sign here https://en.wikipedia.org/wiki/UTF-8#Examples
The three bytes 11100010 10000010 10101100 can be more concisely written in hexadecimal, as E2 82 AC.
Overlong encodings
In principle, it would be possible to inflate the number of bytes in an encoding by padding the code point with leading 0s. To encode the Euro sign € from the above example in four bytes instead of three, it could be padded with leading 0s until it was 21 bits long – 000 000010 000010 101100, and encoded as 11110000 10000010 10000010 10101100 (or F0 82 82 AC in hexadecimal). This is called an overlong encoding.
The standard specifies that the correct encoding of a code point uses only the minimum number of bytes required to hold the significant bits of the code point. Longer encodings are called overlong and are not valid UTF-8 representations of the code point. This rule maintains a one-to-one correspondence between code points and their valid encodings, so that there is a unique valid encoding for each code point. This ensures that string comparisons and searches are well-defined.
Right, so illegal encodings according to the standard, but as we are an emulator and games are using them and the real PSP handles them, we should support them too indeed... Fun.
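To make the overlong case from Buzz!: Master Quiz concrete, here is a minimal Python sketch of the bit arithmetic. The helper name decode_4byte is made up for illustration and is not anything from PPSSPP or the PSP firmware. It just combines the payload bits of a 4-byte sequence without any validity checks, and then shows that a strict decoder (Python's built-in one) refuses the same bytes.

```python
def decode_4byte(b0, b1, b2, b3):
    """Combine the payload bits of a 4-byte UTF-8 sequence, with no validity checks.

    Hypothetical helper for illustration only.
    """
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


# The bytes found in the Buzz!: Master Quiz title.
overlong_tm = bytes([0xF0, 0x82, 0x84, 0xA2])
code_point = decode_4byte(*overlong_tm)
print(hex(code_point), chr(code_point))

# The shortest (valid) encoding of the same code point is 3 bytes long.
shortest = chr(code_point).encode("utf-8")
print(shortest.hex(" "))

# A strict decoder rejects the overlong form instead of mapping it to the symbol.
try:
    overlong_tm.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False
print("strict decoder accepts overlong form:", strict_accepts)
```

So the 4-byte form carries the same code point as E2 84 A2, only padded with leading zero bits.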
The ENTR thing is very curious. In the bottom left corner where it shows save data size after saving, it draws this string:
If I just re-encode the overlong encodings to short ones, 0xC0 0x80 (192 128) becomes a null, and we lose the lines about the date and size.
Maybe 0xC0 0x80 ENTR has a special meaning? Just removing those six characters fixes it, but it's kinda ...weird.
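The difference between the two approaches above can be sketched on a made-up buffer. Everything here is an assumption for illustration: the sample bytes, the date and size text, and the helper names are hypothetical, not the real string the game passes or actual PPSSPP code.

```python
MARKER = b"\xC0\x80ENTR"  # the six bytes discussed above


def as_c_string(raw: bytes) -> bytes:
    """Cut the buffer at the first NUL, like a C string would be."""
    return raw.split(b"\x00", 1)[0]


def normalize_only(raw: bytes) -> bytes:
    """Re-encode the overlong NUL (C0 80) as a real NUL, then treat as a C string."""
    return as_c_string(raw.replace(b"\xC0\x80", b"\x00"))


def strip_marker(raw: bytes) -> bytes:
    """Drop the whole six-byte sequence before treating the buffer as a C string."""
    return as_c_string(raw.replace(MARKER, b""))


# Hypothetical buffer: text, the odd sequence, then concatenated date and size lines.
sample = b"Save\xC0\x80ENTR\n2021/01/01 12:00\n128 KB"

print(normalize_only(sample))  # everything after the overlong NUL is lost
print(strip_marker(sample))    # the date and size lines survive
```

With plain re-encoding the string ends at the new NUL, while removing the six bytes keeps the lines that follow.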
Hmm.. maybe ENTR is translated to ASCII code 13 (CR) and combined with the next ASCII code 10 (LF)? Just like the newline (\r\n) in a file with Windows line endings. And maybe 0xC0 0x80 triggers the ENTR translation?
Unix: Unix systems consider '\n' as a line terminator. Unix considers \r as going back to the start of the same line.
Mac (up to 9): Older Mac OSs consider '\r' as a newline terminator but newer OS versions have been made to be more compliant with Unix systems to use '\n' as the newline.
Windows: Windows has a different style of newline, Windows supports the combination of both CR and LF as the newline character - '\r\n'.
If it is, there might be some other codes that need to be translated too (this will need to be tested on a PSP), maybe something similar to Autodesk MotionBuilder:
KEY_TO_ID = { "NONE" : -1,
"ESC" : 0x1b, "TAB" : 0x09, "CAPS" : 0x14, "BKSP" : 0x08, "LBR" : 0xdb, "RBR" : 0xdd, "SEMI" : 0xba, "ENTR" : 0x0d,
"SPC" : 0x20, "PRNT" : 0x2c, "SCRL" : 0x91, "PAUS" : 0x13, "INS" : 0x2d, "HOME" : 0x24, "PGUP" : 0x21, "DEL" : 0x2e,
"END" : 0x1b, "PGDN" : 0x1b, "UP" : 0x1b, "LEFT" : 0x1b, "DOWN" : 0x1b, "RGHT" : 0x1b,
"F1" : 0x70,"F2" : 0x71 ,"F3" : 0x72, "F4" : 0x73, "F5" : 0x74, "F6" : 0x75, "F7" : 0x76, "F8" : 0x77, "F9" : 0x78,"F10" : 0x79 ,"F11" : 0x7a, "F12" : 0x7b,
"NUML" : 0x90, "NMUL" : 0x6a, "NADD" : 0x6b, "NDIV" : 0x6f, "NSUB" : 0x6d,"NDEC" : 0x6e ,"N0" : 0x60, "N1" : 0x61, "N2" : 0x62, "N3" : 0x63, "N4" : 0x64, "N5" : 0x65, "N6" : 0x66,"N7" : 0x67 ,"N8" : 0x68, "N9" : 0x69,
...
}
Possible. But extremely odd! I've never seen string-encoded characters like that.
Anyway, gonna send a PR with what I've got.
Btw, is the Date and Size information supposed to be printed together with the "Save" text, or was it supposed to be printed separately (i.e. using a smaller font size)?
If it was supposed to be printed separately, it would make sense to just stop at 0xC0 0x80 (NULL), wouldn't it? And ignore the rest of it (which may be uninitialized/garbage data).
It's printed together with the same font on the real PSP.
I'll try to do some testing later to see what it does with different encodings and if it consistently ignores the next 4 bytes after a NUL. I could imagine it being a lot of different things.
It does seem likely based on the recent examples that it's an escape code to the PPGe PSP dialog drawing system...
-[Unknown]
Well, it's not called PPGe on the actual PSP, that's just how I named our corresponding thing :) but yeah.
Sorry, that's what I meant, the PSP's version of PPGe, whatever it is (but it may not even have a unified way...)
Some test results with MsgDialog specifically (not savedata):
Based on the above, MsgDialog seems to validate the first byte, and then simply mask all following bytes with 0x3F. If the first byte is invalid, it terminates the string (or the byte is ignored, if it is a control code). Overlong encodings are treated naively, and no escape sequences or special handling appear to be present.
It's like sceCcc simply works this same way.
Will do some testing of PARAM.SFO string display next in the savedata section.
-[Unknown]
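The MsgDialog behavior described above (check the lead byte, then mask every following byte with 0x3F) could look roughly like this in Python. This is a sketch under assumptions, not the firmware's or PPSSPP's actual decoder. The function name is made up. Which lead bytes count as invalid (stray 0x80..0xBF and anything from 0xF8 up) is a guess, and so is stopping on values above U+10FFFF. The "ignored if control code" case is not modeled.

```python
def naive_utf8_decode(data: bytes) -> str:
    """Lenient decode: validate only the lead byte, mask continuation bytes with 0x3F.

    Hypothetical sketch; stops at NUL (including overlong NUL) or an invalid lead byte.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif 0xC0 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, cp = 4, lead & 0x07
        else:
            break  # assumed invalid lead byte: terminate the string
        if i + length > len(data):
            break  # sequence runs past the end of the buffer
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)  # high bits of continuation bytes are ignored
        if cp == 0 or cp > 0x10FFFF:
            break  # NUL (even when overlong, e.g. C0 80) ends the string
        out.append(chr(cp))
        i += length
    return "".join(out)


clank = bytes.fromhex(
    "53 65 63 72 65 74 20 41 67 65 6E 74 20 43 6C 61 6E 6B E2 84 A2 C0 80 45 4E 54 52"
)
buzz = bytes.fromhex("42 75 7A 7A 21 F0 82 84 A2 3A 20 4D 61 73 74 65 72 20 51 75 69 7A")
print(naive_utf8_decode(clank))
print(naive_utf8_decode(buzz))
```

Run on the two titles from this thread, the Clank title is cut at C0 80 and the overlong F0 82 84 A2 in the Buzz! title comes out as the same symbol as E2 84 A2.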
TIL pressing X on savedata starts the game that made the savedata. Not sure how I never knew this before.
SAVEDATA using Mana Khemia, just as an example, seems to follow the same rules as above. The "\xC0\x80ENTR" sequence resulted in truncation. The other bytes were treated as above, i.e. the bits of continuation bytes outside 0x3F were ignored.
Do we know for sure that the data after ENTR (albeit useful) actually shows up on a PSP?
Thinking we just specifically nuke "\xC0\x80ENTR" (which we already do, to be clear) and then normalize overlong encodings. I suspect this is just a game bug that accidentally resulted in truncation on real PSPs and no one noticed.
-[Unknown]
@unknownbrackets Yes, the data does show up on the real PSP. Although potentially it might come from other fields in the structs that we miss?
(hm, think I need to dust it...)
Ah, I missed that it was just the date. That's a different field indeed, we just concatenated.
To be fair, I didn't try different firmware versions. I also know that firmware does have some workarounds for game bugs in a few specific cases. I think Xele02 posted a list of game IDs found in firmware once years ago... but it wasn't a long list iirc.
-[Unknown]
Hmm, the save file issue for Secret Agent Clank is fixed, but Buzz!: Master Quiz is still not fixed.
What happens?
In the save dialog, the game uses
53 65 63 72 65 74 20 41 67 65 6E 74 20 43 6C 61 6E 6B E2 84 A2 C0 80 45 4E 54 52
as the title, which should be rendered as Secret Agent Clank™ but it shows Secret Agent Clank™��ENTR instead.
What should happen?
PPSSPP should treat C0 80 as NUL and terminate the string.
What are you using?
What PPSSPP version (standalone/official), and did it work before?
1.11.3
Which game or games?
Secret Agent Clank (all variants)