Open ketsuban opened 3 years ago
This is a very good point and a constant debate whenever disassemblies are put together. The easiest way I can explain it is: our disassemblies are by and large meant to be bit-perfect, and achieving that while splitting files often means the files are split in a really dumb way, because random routines will be inside random objects. This disassembly handles this fairly poorly, though it could also be far worse. A single file may be split in two because a common library routine is in the middle. We would either have to figure out a workaround for this that would leave the disassembly bit-perfect, or ditch bit-perfectness and possibly introduce more bugs into the games. Furthermore, this would also lead to people complaining that it's hard to find anything (as people often do with this disassembly), and no amount of structuring disassemblies well will make people fully happy.
The real answer is: nobody can agree exactly on what a disassembly should be like, and we're still debating things instead of trying to make better ones, whether alone or in smaller groups that agree. We would need people who are interested and mostly agree on what exactly to do. This has been proposed several times by many community figures, but so far things have fallen through.
The S3K disassembly definitely needs more splitting though
90% of the game code is in the main file making locating specific things a huge pain
It's worth noting that the Sonic 2 disassembly could be a good reference for seeing how the source code was actually split up. Going by that debug mode code leak, other things found inside the Nick Arcade proto, and what's known about REV02, a lot of those JmpTos were generated by the assembler. They were appended at the end of a file's code, and as such, those JmpTos can be used to identify where an original source file ends. With that, educated guesses can be applied to both Sonic 1 and 3. I know that this was discussed in s2disasm, but it's worth mentioning here.
The Sonic object, for instance, can be assumed to be 1 file, running from the top of the object code to where the next object's code starts. The collision functions were actually in a separate file holding general "floor collision" routines (I think it was called FCOL.ASM or something like that). In my opinion, some of the splitting choices in the current version are a bit ridiculous. I don't think single functions need their own file, nor do I think the Sonic object needs like 10.
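To make the JmpTo idea concrete, here is a rough Python sketch of how one might scan a disassembly listing for the end of each JmpTo block and treat the next label as a candidate start of a new original source file. The "JmpTo" label prefix is an assumption based on how s2disasm names these stubs, and the sample labels and instructions are made up for illustration, not copied from any real disassembly.

```python
import re

# Assumption: assembler-generated stubs carry labels starting with "JmpTo",
# as they do in s2disasm. Adjust the pattern for other naming schemes.
JMPTO_LABEL = re.compile(r"^JmpTo\w*:")
ANY_LABEL = re.compile(r"^([A-Za-z_]\w*):")

def guess_file_boundaries(lines):
    """Return (last_jmpto_label, next_label) pairs.

    Each pair marks a spot where a block of JmpTo stubs ends and the next
    labelled code begins, i.e. a candidate original-file boundary.
    """
    boundaries = []
    last_jmpto = None
    for line in lines:
        m = ANY_LABEL.match(line)
        if not m:
            continue  # indented instruction, comment, or blank line
        label = m.group(1)
        if JMPTO_LABEL.match(line):
            last_jmpto = label
        elif last_jmpto is not None:
            boundaries.append((last_jmpto, label))
            last_jmpto = None
    return boundaries

# Hypothetical listing, for illustration only.
sample = """\
Obj01_Main:
    bsr.w JmpTo_DisplaySprite
    rts
JmpTo_DisplaySprite:
    jmp (DisplaySprite).l
JmpTo_DeleteObject:
    jmp (DeleteObject).l
Obj02_Main:
    rts
""".splitlines()

print(guess_file_boundaries(sample))
```

The results are only guesses: a label that follows a JmpTo block is a hint that a new file started there, nothing more.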
and the ELF files for the gems collection version of SCD
at least for the main engine itself
Could actual code be assigned to whatever filenames were left in those ELF files? Otherwise, no, not really, you just get the symbol data, and I'm not quite sure if that's really within the scope for the disassembly (besides historical value).
I once decompiled a Linux game that had leftover debug data in its ELF file which did assign symbols to filenames. Unfortunately, I don't recall which 'objdump' command I used to extract it, and I don't know if SCD's ELFs contain that data as well.
I'm fairly sure that if paths are present, debug data is present too
I know that, I mean that that debug data might not contain symbol-path associations. The ELFs of Sonic CD and the ELF of the game I decompiled were made almost ten years apart.
ah
"-l" displays filenames and line numbers. I ran it on R11A.ELF and it indeed has the information.
For example:
813080c0 <action>:
action():
813080c0: 94 21 ff f0 stwu r1,-16(r1)
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:40
813080c4: 7c 08 02 a6 mflr r0
813080c8: 90 01 00 14 stw r0,20(r1)
813080cc: 93 e1 00 0c stw r31,12(r1)
813080d0: 93 c1 00 08 stw r30,8(r1)
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:44
813080d4: 3c 60 81 36 lis r3,-32458
813080d8: 3b e3 c2 c4 addi r31,r3,-15676
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:45
813080dc: 3b c0 00 00 li r30,0
813080e0: 48 00 00 40 b 81308120 <action+0x60>
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:46
813080e4: 88 1f 00 00 lbz r0,0(r31)
813080e8: 28 00 00 00 cmplwi r0,0
813080ec: 41 82 00 2c beq 81308118 <action+0x58>
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:53
813080f0: 7f e3 fb 78 mr r3,r31
813080f4: 88 9f 00 00 lbz r4,0(r31)
813080f8: 38 04 ff ff addi r0,r4,-1
813080fc: 54 05 10 3a rlwinm r5,r0,2,0,29
81308100: 3c 80 81 35 lis r4,-32459
81308104: 38 04 42 70 addi r0,r4,17008
81308108: 7c 80 2a 14 add r4,r0,r5
8130810c: 81 84 00 00 lwz r12,0(r4)
81308110: 7d 89 03 a6 mtctr r12
81308114: 4e 80 04 21 bctrl
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:56
81308118: 3b de 00 01 addi r30,r30,1
8130811c: 3b ff 00 44 addi r31,r31,68
81308120: 2c 1e 00 80 cmpwi r30,128
81308124: 41 80 ff c0 blt 813080e4 <action+0x24>
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:57
81308128: 83 e1 00 0c lwz r31,12(r1)
8130812c: 83 c1 00 08 lwz r30,8(r1)
81308130: 80 01 00 14 lwz r0,20(r1)
81308134: 7c 08 03 a6 mtlr r0
81308138: 38 21 00 10 addi r1,r1,16
8130813c: 4e 80 00 20 blr
Heck yeah!
epic
at this point we can just completely decompile the GEMS version of Sonic CD down to the line number level
Not exactly to the line number, but it gives a generally good guide to how the code was set up. A Sonic CD C decomp would be interesting, but that's for a different place. Regardless, it can help out with figuring out how the original source files from Sonic CD, and to an extent, Sonic 1, were set up.
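As a sketch of how that guide could be extracted in bulk, the following Python groups function names by the source file that the "-l" annotations attribute them to. The regular expressions are inferred from the R11A.ELF listing above and are an assumption; other objdump versions or targets may format things differently.

```python
import re
from collections import defaultdict

# Patterns inferred from the listing above (assumption, not a spec).
FUNC_HEADER = re.compile(r"^[0-9a-f]+ <([^>+]+)>:$")   # 813080c0 <action>:
INSTRUCTION = re.compile(r"^[0-9a-f]+:\s")             # 813080c0: 94 21 ...
SOURCE_LINE = re.compile(r"^(.+):(\d+)$")              # C:\...\ACTION.C:40

def map_functions_to_files(listing):
    """Return {source_path: set of function names} from objdump -l style text."""
    files = defaultdict(set)
    current_func = None
    for raw in listing.splitlines():
        line = raw.strip()
        m = FUNC_HEADER.match(line)
        if m:
            current_func = m.group(1)
            continue
        if INSTRUCTION.match(line):
            continue  # disassembled instruction, not an annotation
        m = SOURCE_LINE.match(line)
        if m and current_func:
            files[m.group(1)].add(current_func)
    return dict(files)

# Short excerpt of the listing above.
sample = r"""813080c0 <action>:
action():
813080c0: 94 21 ff f0 stwu r1,-16(r1)
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:40
813080c4: 7c 08 02 a6 mflr r0
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:44
813080d4: 3c 60 81 36 lis r3,-32458
"""

for path, funcs in map_functions_to_files(sample).items():
    print(path, sorted(funcs))
```

Run over a whole ELF's listing, this would give a per-file list of functions, which is about as close to the original file layout as the debug data allows.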
I would like to add something regarding ROM sections.
I've been taking a look at Sonic Jam, and I realized that each game has been split up into different files. In Sonic 1's case, there's "AC.SN1", "ACTTBL.SN1" (also "ACTTBL_E.SN1" and "ACTTBL_N.SN1", because Jam has different difficulty settings), "DATA.SN1", and "TBL.SN1". "AC" is the game code. "DATA" holds compressed graphics, tilemaps, and stage blocks/chunks. "TBL" holds collision data, uncompressed graphics, and stage layouts (including special stages), and "ACTTBL" holds object layouts.
I then took a look at the disassembly, and I noticed that the padding between sections corresponds to how the game was split up in Jam.
"DATA" starts off with the Sega Logo graphics, and in the original ROM, you can see the padding placed before said graphics.
rept $300
dc.b $FF
endm
Nem_SegaLogo: binclude "artnem/Sega Logo (JP1).bin" ; large Sega logo
even
The last piece of data in the "DATA" file is the graphics for the logo in the ending, and look at that, another piece of padding right after it in the original ROM:
Nem_EndStH: binclude "artnem/Ending - StH Logo.bin"
even
if Revision=0
rept $104
dc.b $FF ; why?
endm
else
rept $40
dc.b $FF
endm
endif
After this bit of padding is the stage collision data, which happens to be the start of the "TBL" section:
if Revision=0
rept $104
dc.b $FF ; why?
endm
else
rept $40
dc.b $FF
endm
endif
; ---------------------------------------------------------------------------
; Collision data
; ---------------------------------------------------------------------------
AngleMap: binclude "collide/Angle Map.bin"
even
CollArray1: binclude "collide/Collision Array (Normal).bin"
even
...
The last bit of data in the "TBL" file is the graphics for the special stage ring, and in the original ROM, you can see another bit of padding placed after it:
Art_BigRing: binclude "artunc/Giant Ring.bin"
even
align $100
And after that are the stage object layouts, aka "ACTTBL":
align $100
; ---------------------------------------------------------------------------
; Sprite locations index
; ---------------------------------------------------------------------------
ObjPos_Index:
; GHZ
dc.w ObjPos_GHZ1-ObjPos_Index, ObjPos_Null-ObjPos_Index
dc.w ObjPos_GHZ2-ObjPos_Index, ObjPos_Null-ObjPos_Index
dc.w ObjPos_GHZ3-ObjPos_Index, ObjPos_Null-ObjPos_Index
dc.w ObjPos_GHZ1-ObjPos_Index, ObjPos_Null-ObjPos_Index
The "ACTTBL" file ends with the last object layout, and, of course, in the original ROM, after that is some more padding:
ObjPos_Null: dc.b $FF, $FF, 0, 0, 0, 0
if Revision=0
rept $62A
dc.b $FF
endm
else
rept $63C
dc.b $FF
endm
endif
This is then followed by the sound driver.
So, based on this info, you can see that the original game had multiple ROM sections: the code ("AC"); compressed graphics, tilemaps, and stage blocks and chunks ("DATA"); stage collision, layouts (both regular and special stages), and uncompressed graphics ("TBL"); stage object layouts ("ACTTBL"); and finally the sound driver and data.
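If anyone wants to check this against a ROM directly, a minimal Python sketch for locating long runs of $FF filler could look like the following. The function name and the $40 minimum run length (taken from the smallest "rept" block quoted above) are my own choices, and the test image is synthetic rather than a real ROM.

```python
def find_padding_runs(rom, fill=0xFF, min_len=0x40):
    """Return (offset, length) for each run of `fill` bytes at least `min_len` long."""
    runs = []
    start = None
    for offset, byte in enumerate(rom):
        if byte == fill:
            if start is None:
                start = offset          # a new run begins here
        elif start is not None:
            if offset - start >= min_len:
                runs.append((start, offset - start))
            start = None
    # Handle a run that reaches the end of the image.
    if start is not None and len(rom) - start >= min_len:
        runs.append((start, len(rom) - start))
    return runs

# Synthetic stand-in for a ROM image: code, $300 bytes of padding, data,
# a short $FF run that should be ignored, then a $40 run at the end.
rom = (bytes([0x4E, 0x75]) * 8 + b"\xFF" * 0x300 + b"\x12\x34"
       + b"\xFF" * 0x10 + b"\x00" + b"\xFF" * 0x40)

for offset, length in find_padding_runs(rom):
    print(f"${offset:X}: ${length:X} bytes of padding")
```

Runs found this way are only candidates for section boundaries, since uncompressed data can legitimately contain stretches of $FF.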
I'll grant that when Nemesis introduced the concept of the split disassembly it was a big step forward, but between seeing things like the work pret has done and knowing from the Nick Arcade prototype that Sonic Team's code made liberal use of cross-references and symbol exports, I'm forced to wonder why Sonic ROM hacking persists in working with a single giant text file. Look at pokecrystal, for example: the main file, main.asm, is only 20.3 kilobytes and rarely needs to be touched because all the code is separated out into other files for ease of reference and cross-reference. By contrast, sonic.asm is 237 kilobytes, and it gets worse the more featureful the games get: s2.asm is 2.54 megabytes, and sonic3k.asm is 4.25 megabytes.