sonicretro / s1disasm

Sonic 1 Disassembly

Consider splitting the disassembly into multiple files in a broader and more systematic way #53

Open ketsuban opened 3 years ago

ketsuban commented 3 years ago

I'll grant that when Nemesis introduced the concept of the split disassembly it was a big step forward, but between seeing things like the work pret has done and knowing from the Nick Arcade prototype that Sonic Team's code made liberal use of cross-references and symbol exports, I'm forced to wonder why Sonic ROM hacking persists in working with a single giant text file. Look at pokecrystal, for example - the main file main.asm is only 20.3 kilobytes and rarely needs to be touched because all the code is separated out into other files for ease of reference and cross-reference. By contrast, sonic.asm is 237 kilobytes, and it gets worse the more featureful the games get - s2.asm is 2.54 megabytes, sonic3k.asm is 4.25 megabytes.

Awuwunya commented 3 years ago

This is a very good point and a constant debate whenever disassemblies are put together. The easiest way I can explain it is: our disassemblies are by and large meant to be bit-perfect, and achieving that while splitting files often means the files are split in a really dumb way, because random routines will be inside random objects. This disassembly handles this fairly poorly, though it could also be far worse. A single file may be split into 2 because a common library routine is in the middle. We would either have to figure out a workaround for this that would leave the disassembly bit-perfect, or ditch bit-perfectness and possibly introduce more bugs into the games. Furthermore, this would also lead to people complaining that it's hard to find anything (as people often do with this disassembly), and no amount of structuring disassemblies well will make people fully happy.

The real answer is: nobody can agree exactly on what a disassembly should be like, and we're still debating things instead of trying to make better ones, whether alone or in smaller groups that agree. We would need people who are interested and who mostly agree on what exactly to do. This has been proposed several times by many community figures, but so far things have fallen through.

TorutheRedFox commented 2 years ago

The S3K disassembly definitely needs more splitting though

90% of the game code is in the main file making locating specific things a huge pain

DevsArchive commented 2 years ago

It's worth noting that the Sonic 2 disassembly could be a good reference for seeing how the source code was actually split up. Through that debug mode code leak, other things found inside the Nick Arcade proto, and what's known about REV02, we know that a lot of those JmpTos were generated by the assembler. They were appended at the end of each file's code, and as such, those JmpTos can be used to identify where an original source file ends. With that, educated guesses can be applied to both Sonic 1 and 3. I know that this was discussed in s2disasm, but it's worth mentioning here.
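
As a rough illustration of that idea, here is a minimal Python sketch that scans disassembly text for runs of consecutive JmpTo stubs and reports where each run sits, on the theory that each run marks the tail of one original source file. It assumes the stubs are labels that start with "JmpTo", sit in column 0, and end with a colon; the label names in the usage example are purely illustrative, and find_jmpto_runs is a hypothetical helper, not part of any disassembly.

```python
import re

# Assumption: labels start in column 0 and end with a colon.
LABEL_RE = re.compile(r"^([A-Za-z_.@][\w.@]*):")


def find_jmpto_runs(lines):
    """Return runs of consecutive labels whose names start with "JmpTo".

    Each run is a dict with the 1-based line number of its first and last
    JmpTo label plus the label names. Lines that carry no label (the jmp
    instructions themselves, comments, blanks) do not break a run; any
    other label does.
    """
    runs = []
    current = None
    for number, line in enumerate(lines, start=1):
        match = LABEL_RE.match(line)
        if not match:
            continue
        label = match.group(1)
        if label.startswith("JmpTo"):
            if current is None:
                current = {"first_line": number, "last_line": number, "labels": []}
            current["last_line"] = number
            current["labels"].append(label)
        elif current is not None:
            # A non-JmpTo label ends the run: a candidate file boundary.
            runs.append(current)
            current = None
    if current is not None:
        runs.append(current)
    return runs


if __name__ == "__main__":
    example = [
        "Obj01:",
        "    rts",
        "JmpTo_DisplaySprite:",
        "    jmp (DisplaySprite).l",
        "JmpTo2_DeleteObject:",
        "    jmp (DeleteObject).l",
        "Obj02:",
        "    rts",
    ]
    for run in find_jmpto_runs(example):
        print(run["first_line"], run["last_line"], run["labels"])
```

Each reported run is only a candidate boundary; it still has to be checked by hand against the surrounding code.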

The Sonic object, for instance, can be assumed to be 1 file, running from the top of the object code to where the next object's code starts. The collision functions were actually a separate file for holding general "floor collision" routines (I think it was called FCOL.ASM or something like that). In my opinion, some of the splitting choices in the current version are a bit ridiculous. I don't think single functions need their own file, nor do I think the Sonic object needs like 10.

TorutheRedFox commented 2 years ago

and the ELF files for the gems collection version of SCD

at least for the main engine itself

DevsArchive commented 2 years ago

Could actual code be assigned to whatever filenames were left in those ELF files? Otherwise, no, not really, you just get the symbol data, and I'm not quite sure if that's really within the scope for the disassembly (besides historical value).

Clownacy commented 2 years ago

I once decompiled a Linux game that had leftover debug data in its ELF file which did assign symbols to filenames. Unfortunately, I don't recall which 'objdump' command I used to extract it, and I don't know if SCD's ELFs contain that data as well.

TorutheRedFox commented 2 years ago

I'm fairly sure that if paths are present, debug data is present too

Clownacy commented 2 years ago

I know that, I mean that that debug data might not contain symbol-path associations. The ELFs of Sonic CD and the ELF of the game I decompiled were made almost ten years apart.

TorutheRedFox commented 2 years ago

ah

DevsArchive commented 2 years ago

"-l" displays filenames and line numbers. I ran it on R11A.ELF and it indeed has the information.

For example:

813080c0 <action>:
action():
813080c0:   94 21 ff f0     stwu    r1,-16(r1)
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:40
813080c4:   7c 08 02 a6     mflr    r0
813080c8:   90 01 00 14     stw     r0,20(r1)
813080cc:   93 e1 00 0c     stw     r31,12(r1)
813080d0:   93 c1 00 08     stw     r30,8(r1)
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:44
813080d4:   3c 60 81 36     lis     r3,-32458
813080d8:   3b e3 c2 c4     addi    r31,r3,-15676
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:45
813080dc:   3b c0 00 00     li      r30,0
813080e0:   48 00 00 40     b       81308120 <action+0x60>
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:46
813080e4:   88 1f 00 00     lbz     r0,0(r31)
813080e8:   28 00 00 00     cmplwi  r0,0
813080ec:   41 82 00 2c     beq     81308118 <action+0x58>
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:53
813080f0:   7f e3 fb 78     mr      r3,r31
813080f4:   88 9f 00 00     lbz     r4,0(r31)
813080f8:   38 04 ff ff     addi    r0,r4,-1
813080fc:   54 05 10 3a     rlwinm  r5,r0,2,0,29
81308100:   3c 80 81 35     lis     r4,-32459
81308104:   38 04 42 70     addi    r0,r4,17008
81308108:   7c 80 2a 14     add     r4,r0,r5
8130810c:   81 84 00 00     lwz     r12,0(r4)
81308110:   7d 89 03 a6     mtctr   r12
81308114:   4e 80 04 21     bctrl
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:56
81308118:   3b de 00 01     addi    r30,r30,1
8130811c:   3b ff 00 44     addi    r31,r31,68
81308120:   2c 1e 00 80     cmpwi   r30,128
81308124:   41 80 ff c0     blt     813080e4 <action+0x24>
C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:57
81308128:   83 e1 00 0c     lwz     r31,12(r1)
8130812c:   83 c1 00 08     lwz     r30,8(r1)
81308130:   80 01 00 14     lwz     r0,20(r1)
81308134:   7c 08 03 a6     mtlr    r0
81308138:   38 21 00 10     addi    r1,r1,16
8130813c:   4e 80 00 20     blr
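
To turn a listing like this into a function-to-file map, something along these lines could work. It is a minimal Python sketch that assumes the text comes from objdump run with -d and -l and follows the layout shown above (a "address <symbol>:" header per function, "path:line" markers between instructions); functions_by_source is a hypothetical helper name.

```python
import re
from collections import defaultdict
from pathlib import PureWindowsPath

FUNC_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:$")   # e.g. "813080c0 <action>:"
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s")         # e.g. "813080c4:   7c 08 ..."
SRC_RE = re.compile(r"^(.+):(\d+)$")                # e.g. "C:\...\ACTION.C:40"


def functions_by_source(listing):
    """Group function symbols by the source file named in the line markers.

    Returns a dict mapping a source file name (without directories) to a
    sorted list of the functions whose code references that file.
    """
    grouped = defaultdict(set)
    current_function = None
    for raw in listing.splitlines():
        line = raw.rstrip()
        header = FUNC_RE.match(line)
        if header:
            current_function = header.group(2)
            continue
        if INSN_RE.match(line):
            continue  # disassembled instruction, not a line marker
        marker = SRC_RE.match(line)
        if marker and current_function is not None:
            # PureWindowsPath splits on both backslashes and forward slashes.
            grouped[PureWindowsPath(marker.group(1)).name].add(current_function)
    return {name: sorted(funcs) for name, funcs in grouped.items()}


if __name__ == "__main__":
    example = "\n".join([
        "813080c0 <action>:",
        "action():",
        "813080c0:   94 21 ff f0     stwu    r1,-16(r1)",
        r"C:\project\GEMS\application\SonicCD\src\gc\main\ACTION.C:40",
        "813080c4:   7c 08 02 a6     mflr    r0",
    ])
    print(functions_by_source(example))
```

A function that picks up inlined code or macros from another file would show up under more than one file name, so the result is a guide rather than a definitive mapping.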

Clownacy commented 2 years ago

Heck yeah!

TorutheRedFox commented 2 years ago

epic

at this point we can just completely decompile the GEMS version of Sonic CD down to the line number level

DevsArchive commented 2 years ago

Not exactly to the line number, but it gives a generally good guide to how the code was set up. A Sonic CD C decomp would be interesting, but that's for a different place. Regardless, it can help out with figuring out how the original source files from Sonic CD, and to an extent, Sonic 1, were set up.

DevsArchive commented 1 year ago

I would like to add something regarding ROM sections.

I've been taking a look at Sonic Jam, and I realized that each game has been split up into different files. In Sonic 1's case, there's "AC.SN1", "ACTTBL.SN1" (also "ACTTBL_E.SN1" and "ACTTBL_N.SN1", because Jam has different difficulty settings), "DATA.SN1", and "TBL.SN1". "AC" is the game code. "DATA" holds compressed graphics, tilemaps, and stage blocks/chunks. "TBL" holds collision data, uncompressed graphics, and stage layouts (including special stages), and "ACTTBL" holds object layouts.

I then took a look at the disassembly, and I noticed that the padding between sections corresponds to how the game was split up in Jam.

"DATA" starts off with the Sega Logo graphics, and in the original ROM, you can see the padding placed before said graphics.

        rept $300
            dc.b    $FF
        endm
Nem_SegaLogo:   binclude    "artnem/Sega Logo (JP1).bin" ; large Sega logo
            even

The last piece of data in the "DATA" file is the graphics for the logo in the ending, and look at that, another piece of padding right after it in the original ROM:

Nem_EndStH: binclude    "artnem/Ending - StH Logo.bin"
        even

        if Revision=0
        rept $104
        dc.b $FF            ; why?
        endm
        else
        rept $40
        dc.b $FF
        endm
        endif

After this bit of padding is the stage collision data, which happens to be the start of the "TBL" section:


        if Revision=0
        rept $104
        dc.b $FF            ; why?
        endm
        else
        rept $40
        dc.b $FF
        endm
        endif
; ---------------------------------------------------------------------------
; Collision data
; ---------------------------------------------------------------------------
AngleMap:   binclude    "collide/Angle Map.bin"
        even
CollArray1: binclude    "collide/Collision Array (Normal).bin"
        even
        ...

The last bit of data in the "TBL" file is the graphics for the special stage ring, and in the original ROM, you can see another bit of padding placed after it:

Art_BigRing:    binclude    "artunc/Giant Ring.bin"
        even

        align   $100

And after that are the stage object layouts, aka "ACTTBL":

        align   $100

; ---------------------------------------------------------------------------
; Sprite locations index
; ---------------------------------------------------------------------------
ObjPos_Index:
        ; GHZ
        dc.w ObjPos_GHZ1-ObjPos_Index, ObjPos_Null-ObjPos_Index
        dc.w ObjPos_GHZ2-ObjPos_Index, ObjPos_Null-ObjPos_Index
        dc.w ObjPos_GHZ3-ObjPos_Index, ObjPos_Null-ObjPos_Index
        dc.w ObjPos_GHZ1-ObjPos_Index, ObjPos_Null-ObjPos_Index

The "ACTTBL" file ends with the last object layout, and, of course, in the original ROM, after that is some more padding:

ObjPos_Null:    dc.b $FF, $FF, 0, 0, 0, 0

        if Revision=0
        rept $62A
        dc.b $FF
        endm
        else
        rept $63C
        dc.b $FF
        endm
        endif

That padding is then followed by the sound driver.

So, based on this info, you can see that the original game had multiple ROM sections: the code ("AC"); compressed graphics, tilemaps, and stage blocks and chunks ("DATA"); stage collision, layouts (both regular and special stages), and uncompressed graphics ("TBL"); stage object layouts ("ACTTBL"); and then finally the sound driver and data.
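
For anyone who wants to check this against a ROM, here is a minimal Python sketch that lists long runs of a fill byte. The default minimum of $40 bytes is just the smallest rept count quoted above, and find_fill_runs is a hypothetical helper. Graphics data can contain long $FF runs of its own, and padding produced by align may not be filled with $FF, so the output is only a list of candidates.

```python
def find_fill_runs(data, fill=0xFF, min_length=0x40):
    """Return (offset, length) for every run of `fill` of at least min_length bytes."""
    runs = []
    start = None
    for offset, value in enumerate(data):
        if value == fill:
            if start is None:
                start = offset  # a new run begins here
        elif start is not None:
            if offset - start >= min_length:
                runs.append((start, offset - start))
            start = None
    # A run that reaches the end of the data still counts.
    if start is not None and len(data) - start >= min_length:
        runs.append((start, len(data) - start))
    return runs


if __name__ == "__main__":
    # Synthetic bytes standing in for a ROM: data, $104 bytes of padding, data.
    rom = bytes(16) + b"\xff" * 0x104 + bytes(8)
    for offset, length in find_fill_runs(rom):
        print(f"${offset:06X}: ${length:X} bytes of padding")
```

On a real ROM you would pass in the bytes read from the file and compare the reported offsets against the labels on either side of each rept block.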