isledecomp / reccmp

Collection of decompilation tools.
GNU Affero General Public License v3.0
7 stars 4 forks source link

Allow code annotations near preprocessor directives? #21

Open disinvite opened 1 week ago

disinvite commented 1 week ago

isleapp.cpp has these two string annotations:

// STRING: ISLE 0x4101c4
#define WNDCLASS_NAME "Lego Island MainNoM App"

// STRING: ISLE 0x4101dc
#define WINDOW_TITLE "LEGO\xAE"

Should this be allowed? It's perfectly clear what is intended, but those lines are not "code" per se. Then again, we are not a compiler, so there's no requirement that these lines be excluded.

jonschz commented 1 week ago

I briefly looked into it and found out that

Personal opinion: The // STRING: <target> <address> format should be reserved for strings that (could) matter to reccmp. Comments that are intended for human readers only can (and imho, should) use a non-standard / natural language format which reccmp does not care about or lint, e.g. // string at ISLE 0x4101c4.

disinvite commented 6 days ago

Thanks for checking! I agree 100%. I think we only need to annotate very short strings that could easily be confused for pointer values or numbers.

Is it reasonable to throw a syntax error if an annotation comes before a preprocessor statement?

Once we get environment variables added to reccmp, the token-based parser will handle something like this and not alert to address reuse:

#ifdef _DEBUG
// FUNCTION: HELLO 0x1234
void TestFunction() {
    // debug version of the function
}
#else
// FUNCTION: HELLO 0x1234
void TestFunction() {
    // regular version of the function
}
#endif

Whether we want that sort of annotation syntax is perhaps another topic for discussion.

jonschz commented 6 days ago

Is it reasonable to throw a syntax error if an annotation comes before a preprocessor statement?

If there is no way that the annotation will be processed correctly, imho yes. We could either throw a hard error or log a warning that this annotation is misplaced and cannot be processed. Both would be adequate in my eyes.

madebr commented 5 days ago

If an annotation is not used or unexpected, I'd emit a warning (if the warning type is enabled): e.g.[-Wunused-annotation]: unused annotation And like compilers do, emit an error when -Werror is passed (but that's not a critical requirement)

jonschz commented 5 days ago

How about "emit warnings by default, fail on warnings when some flag is set"? I think one of our tools (decomplint?) already does that.

disinvite commented 5 days ago

How about "emit warnings by default, fail on warnings when some flag is set"? I think one of our tools (decomplint?) already does that.

Yes. The --warnfail flag will exit with nonzero status if there are any warnings.

For reference, the WIP code for the tokenizer parser is here. The current design removes all preprocessor statements before handing the tokens off to the annotation parser. We could allow #define to come through and that would make these string annotations still valid.

But if we ignore the implementation detail for a second, what should happen in general?

  1. Syntax error if the annotation precedes a preprocessor statement (i.e. we refuse to read it)
  2. Same as 1 but syntax warning (provided we can read something)
  3. Allow annotations above a #define but not other preprocessor statements. (And either a warning or error if this happens)
  4. Just ignore preprocessor statements. Syntax problems may occur as they would if the preprocessor lines were not there
jonschz commented 4 days ago

I would log a warning and ignore this particular annotation, which sounds like your option 2.