pret / pokeemerald

Decompilation of Pokémon Emerald
2.2k stars 2.36k forks source link

Support C-Style `enum` in preproc #1984

Closed SBird1337 closed 2 months ago

SBird1337 commented 5 months ago

Gives basic support for parsing C-Style enum expressions when using preproc.

Description

In order to deal with situations where one would rather want to use an enum but can't because preproc does not know how to deal with it when compiling scripts and other assembly files together with regular C TUs.

Example: test.h:

#ifndef TEST_H_
#define TEST_H_

enum TestEnum {
    A=010,B,C,D,
    E=1, F, G=0x3, H,

    I=
    0b1100,

    J, K,

L
};

#endif

test.s:

#include "test.h"

mov r0, #A

turns into

# 1 "test.s"
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "<stdin>"
# 1 "test.s"
# 1 "test.h" 1

# 5 "test.h"
.equiv A, 8
# 5 "test.h"
.equiv B, 9
# 5 "test.h"
.equiv C, 10
# 5 "test.h"
.equiv D, 11
# 6 "test.h"
.equiv E, 1
# 6 "test.h"
.equiv F, 2
# 6 "test.h"
.equiv G, 3
# 6 "test.h"
.equiv H, 4
# 8 "test.h"
.equiv I, 12
# 11 "test.h"
.equiv J, 13
# 11 "test.h"
.equiv K, 14
# 13 "test.h"
.equiv L, 15
# 2 "test.s" 2

mov r0, #A

GNU-Style line indicators are added s.t. errors can be traced back to the original source file (e.g. if an enum field is supposed to be redefined)

Discord contact info

karathan

SBird1337 commented 5 months ago

Converting to draft because I kind of forget the enum assembly macro edge case

SBird1337 commented 5 months ago

fixed the issue and rebased.

mid-kid commented 4 months ago

my 2ct that pretty much applies to all of preproc is that papering over things with a custom preprocessor that parses "just enough" of the syntax but not all of it will cause a lot of confusion and frustration from developers down the line, who will expect the compiler to behave like a normal C compiler/assembler. Take for example the following code:

#define X(prefix) prefix##_A, prefix##_B, prefix##_C
enum {
    X(foo),
    X(bar)
};

I don't think things like this will work with the current PR as-is, and I can't begin to think what other syntax quirks may pop up when people try doing more out-there things.

I should mention that the snippet above primarily wouldn't work because for some reason, assembler files are first passed through preproc before going through the C preprocessor, instead of the other way around. I think the snippet above specifically could be made to work without too much fuss, but my point is mostly about it being hard to cover all corner cases, and adhere to the principle of least surprise when a user discovers it's not "real C".

GriffinRichards commented 4 months ago

Agreed, but I'd wager the lack of any support for C-style enums in assembly files is a bigger headache than incomplete support. We're already mixing C header files into assembly, so the door for this kind of confusion is open. These sorts of incompatibilities are basically always going to be a problem with preproc like you said, and replacing it is beyond this PR.

mid-kid commented 4 months ago

The C preprocessor has its own syntax that can be clearly discerned by the # prefix, making it fairly suitable for mixing with a bunch of languages. It avoids the confusion I'm referring to by not pretending to be something it's not, which preproc is trying to do here.

Anyway, I'm not asking to remove preproc as a whole, but make you consider that this feature might make the situation worse by making certain problems harder to debug: If the issue I pointed out in the previous message is not accointed for it's rather easy to accidentally make an enum that has different values depending on whether it's compiled for ASM or for C. I'd like to try to minimize this sort of thing, at the very least...

pkmnsnfrn commented 2 months ago

I can very confidently say that there are more users that will be served by this PR versus users that are expecting the C compiler to behave a certain way. The majority of our userbase doesn't even know what a compiler is!

mid-kid commented 2 months ago

If newbies are modifying C, often with the intention of learning C, it's especially destructive to introduce hard-to-debug footguns like this. How do you expect newbies to realize that the code they learned from google, a friend, or a book, isn't valid here? That's what I'm concerned about: This preprocessor isn't complete enough to be C, and it's not obvious that it's not C. People will expect it to be C and be confused when their game breaks.

mrgriffin commented 2 months ago

mid-kid is right that the preprocessor interacts weirdly with our assembler pipeline. But there's already some sharp edges in that interaction: I can't remember exactly the examples, but I think people get themselves into a mess using .if along with preprocessor defines.

So I think as long as the enum syntax matches the syntax that's available in C this won't be too confusing for newbies (I think all that's missing is trailing comma support as mentioned by GriffinR?). From a quick glance at the Makefile changes there's now two preproc passes so perhaps if enum processing was delayed until that second pass mid-kid's example that mixes enum and #define would work?

SBird, I can take a look at PRing these changes to your branch if you'd like?

SBird1337 commented 2 months ago

Incorporated MGriffins changes to address the following issues:

E: I generally agree that preproc is a bit of a monster in terms of hacky workarounds, but since we are living with it and the lack of enum support is causing troubles I think this makes the workflow better overall :)

GriffinRichards commented 2 months ago

This likewise fails to build for me, with only the usage information as output. I suspect it's a difference with getopt. It might be worth throwing a macOS build on the workflow at some point.