Closed xTVaser closed 2 years ago
This is a long reply, but hopefully it can get us started thinking about this. We probably have a lot of cleanup left in Jak 1, and like 1 million lines of Jak 2 to do in the future, so process improvements are definitely worth thinking about. If you didn't notice I got kind of burnt out working on the decompiler and a huge number of features are currently unimplemented or were never done right. It was never really planned to be a "full decompiler" that tried to make nice source code, but slowly evolved to this without much design. At the bottom I've put together a list of what I think are absolute requirements for a future version.
We already have some support for multiple versions with the decompiler config system. I think it is enough for totally separate games where you don't want to share anything between them, but it doesn't work that well for different versions of games that are only a tiny bit different. I'm not sure of an easy way to solve this. It also becomes a nightmare to test because all the different versions have tiny differences (jak 1 demo has the opposite endianness for VAG files, jak 2's demo has different block sizes for compressed DGOs, jak 3's demo has a different v5 link format...). In total there's >20 versions of all the games, and I've had people ask about extracting textures/models/audio from all of them.
For the common problems, there are a lot of checks I think we could implement that could catch many of them. Right now there's nothing that checks that your stack layout or label types are actually free of overlaps. Realistically some of these are very hard to reliably detect and we'll probably never do it perfectly. If we were okay with having some false positives, we could try to detect common stuff like the res-tag cast and missing args. I also think that a cleaner output would make it much more obvious when something like this happens. Ideally, casts are rare enough that whenever we see one, we look closely and understand why.
I think the decompiler should output some JSON that gets read by a GUI. So this could help solve the text parsing issue. Your scripts have been super useful, but I agree it would be nicer if we didn't have so much text-based replacement. An interactive decompiler would be cool but a lot of work. If you had a GUI that looked at the JSON, you could make it "interactive" by rerunning the decompiler whenever the user changes a setting and clicks a re-run button.
The code updating is a tricky problem. All your ideas are good, but I don't think any of them will completely solve the problem. It might be better if we try to finalize files more before moving on to the next one. I'm not a fan of the giant PRs with 200 files changed and 10,000's of lines of renaming changes. We got through the first ~1/3 of the game being able to do this, but it became harder once we got to the later code. Large refactorings are going to be annoying and introduce random bugs, no matter what we do.
Compilation speed is a tricky one. The current build system ((mi)
) is able to avoid rebuilding things if it knows that none of its dependencies have changed. But, we don't know that for GOAL code and we just tell it that every file depends on every single file before it in the build order. I guess you are still using (build-game)
because not everything is build with (mi)
yet, so we should fix that. It seems possible to detect the dependencies for things like functions/macros, but it's harder when you have macros that set global GOOS variables. This is something I'm really curious about in the original GOAL. How did they handle this?
defun
to not split across multiple lines. Also longer than 80 character lines.GenericMatcher
stuff is confusing), but we'd lose a lot of context. An s-expression matcher is currently a bit tricky to do because the decompiler sometimes cheats in its s-expression generation and will output a single symbol like :foo bar
in order to work around the pretty printer being terrible.Yeah I agree with all you've said, and a few of your points i was thinking of originally but forgot half-way through writing mine.
To respond to a few things directly:
I feel like a lot of the workflow tooling could be solved with something like a VSCode extension Potential workflow improvements when decompiling:
Potential improvements for OpenGOAL in general:
IR Output Ideas:
As a tangent, the update-gsrc flow needs to be more resilient and useful (might do this asap as its annoying):
We should experiment with reducing the burden of merge conflicts as well.
We'll want https://github.com/open-goal/jak-project/issues/497 for the future as well. Methods were far more annoying than functions to deal with.
A more intelligent type analysis is a must IMO. Our casts file is seven thousand lines long, the stack structures file is five thousand and the var_names
is >4000. Linearly scaling this and that's like over 30k lines of random configuration for Jak 2. This is WAAAAY too much work and really difficult to manage. An S-expression macro matcher would also be extremely useful, specially if we could summon it for specific contexts only to avoid false positives. The GenericMatcher
is fine if we want a little more context, I think we can reasonably make it just work a little better/add more features.
On the topic of type analysis, a way to the tell decompiler that we don't care about the specific type of a function, or that it can be "anything", would be extremely useful. This is mainly an issue for the event handlers, which there are already a lot of in Jak 1, which end up looking... terrible. There's basically no instance where the output doesn't desperately need some clean-up, which we can't do because it would be overwritten in case we wanted to re-decompile the file or refactor the output, so we have to deal with it. Their code certainly never looked like this, for example:
(let ((v1-0 arg2))
(the-as int (cond
((= v1-0 'untrigger)
(the-as int (when (= (-> arg0 type) light-eco-child)
(let* ((a1-3 (-> arg3 param 0))
(v0-0 (logxor (-> self angle-mask) (ash 1 a1-3)))
)
(set! (-> self angle-mask) v0-0)
v0-0
)
)
)
)
((= v1-0 'trigger)
(the-as int (when (not (-> self player-got-eco?))
(set! (-> self player-got-eco?) #t)
(let ((a1-5 (new 'stack-no-clear 'event-message-block)))
(set! (-> a1-5 from) self)
(set! (-> a1-5 num-params) 0)
(set! (-> a1-5 message) 'white-eco-picked-up)
(the-as int (send-event-function (ppointer->process (-> self parent)) a1-5))
)
)
)
)
((= v1-0 'beam-off)
(the-as int (go light-eco-mother-discipate))
)
)
)
)
Plus, manually cleaning up code just means another way that we can make mistakes.
I think reducing the amount of casts, macros and other kinds of cleanup that we have to do for the decompiler would pay off in the long run the most.
Edit: If we could at least fix #578 and #852 it would be very helpful. The former straight up causes the decompiler to break completely (load-game-text-info
) and the latter partially causes the thing above.
These are all good points.
I agree 100% that we need to improve the process of adding casts. We should have way fewer of them, and it should be easier to figure out. I was never really happy with how this worked out for jak 1 level code, but I was too distracted with trying to get a head start on graphics, so we wouldn't reach the end and just be stuck waiting on all the renderers (yet here we are...) You could imagine the decompiler spits out a line like:
didn't know the type of a1 here, input a type
and then you can just type in the type name. I don't know if this is a good idea exactly, but you get the point - it can be easier. The current workflow was designed around just decompiling a few files where the biggest concern was accuracy over convenience and readability. Which is something I think we did really well with. The majority of files worked perfectly and we never had to do some huge "oops decompiler bug, now we have to update 100 files"
For the example you gave, I'd argue that setting the type to object
should just do this, and all the weird the-as int
s and the v0-0
following (set! (-> self angle-mask) v0-0)
is just stuff that's not quite right with how it inserts casts (why is it trying to make that block integer anyway..). Some of it was probably an overreaction on my part to a few bugs we had early on with uint128
s returning on only some cases of a cond
in res.gc
(didn't end up mattering... the game never uses this) and with the return value of a event
handler being lost in actor-link
.
The unfortunate thing is that a lot of these fixes (better type analysis, s-expression matcher, cleaning up the event handler example you showed) would require some pretty large parts of the decompiler to be rewritten. There's a lot I got wrong in the design and I know I could do better with what I know now. That said, the decompiler
and type system is about 69k lines of C++ and there are some parts that were extremely hard to get right. I'm not completely opposed to doing any of these improvements, but it's going to take months to do. I had a lot more free time to work on this project in 2020/early 2021 than I do now.
None of this is a decision we have to make now, but we should think about it.
There are a lot of things that we're improved along the way, but as we get close to the end of the decompilation aspect -- there's less incentive to make it better. But we'll want to take a step back and improve some issues before moving onto the next games.
This is just a place to capture those things we'd like to improve.
Project Structure
Decompilation
OpenGOAL src handling