xorvoid / dis86

A disassembler and decompiler for 8086 DOS Binaries

Hi #1

Closed: xor2003 closed this issue 5 months ago.

xor2003 commented 5 months ago (Apr 8, 2024)

Hi, it seems we are doing similar things with reverse engineering: https://github.com/xor2003/libdosbox. I discovered a method to achieve full source code translation within a few weeks by comparing each translated instruction against the original in an emulator: https://github.com/xor2003/masm2c. You've taken it a step further by creating a decompiler. Perhaps it makes sense to merge our efforts. Before finding your project, I was considering reusing the angr decompiler (https://docs.angr.io/en/latest/) and https://github.com/albertan017/LLM4Decompile.

xorvoid commented 5 months ago

Super cool!

Curious what your goals are and what you’re working on?

I’m doing it to support a video game reimplementation effort.

If you’re interested in contributing to dis86, I would love to have you. There’s still a lot to do. I’m basically only adding features as I require them, so I’m fairly sure it’s a little “overfit” to my problem at the moment. All of that is solvable: I think I selected a solid architecture for the decompiler, which should give it substantial extensibility.

-xorvoid


xorvoid commented 5 months ago

I haven't looked at angr for this project. Does it even have x86-16 support?? At this stage it's a somewhat obscure platform and most tools don't have great support.

I can't imagine LLM4Decompile being a good choice here: (1) it's trained on pairs of modern compiler output and the corresponding source, so you'd have to train one with some x86-16 compiler, and (2) if, like me, you're going for correct semantic lifting, then using LLMs in general is questionable due to their statistical approach and hallucinations (I don't want to hunt down hallucinations in mystery machine code).

xorvoid commented 5 months ago

If you want to learn more about this project: https://xorvoid.com/dis86.html

LowLevelMahn commented 5 months ago (Apr 8, 2024)

Your blog statement is not correct:

"The most traditional tool used for this task is IDA Pro. Sadly they dropped x86 16-bit real mode support some time ago."

The Hex-Rays guys didn't drop 16-bit real mode support in IDA, only in the free versions. The current 8.4 is fully 16-bit real mode capable.

xorvoid commented 5 months ago

Oh, thanks! I read somewhere that it was dropped and you had to use a really old version.

I’ve never used IDA actually. I was too poor to buy it when I was younger and by the time I had the money, I had become a good enough engineer to just build it myself. Such is life.

How good is the IDA decompiler? Why don’t you use that if you’re already using IDA? (I’m sort of just asking more about your requirements)


LowLevelMahn commented 5 months ago

"How good is the IDA decompiler? Why don’t you use that if you’re already using IDA? (I’m sort of just asking more about your requirements)"

The decompiler is the best on the market, far better than every competitor (for example Ghidra, Binary Ninja, etc.). It's just a different world.

But... the decompiler only supports 32/64-bit code :( so good old seg:ofs code doesn't get analysed by it, only disassembled (very well).
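For readers unfamiliar with seg:ofs code: in 8086 real mode an address is a 16-bit segment plus a 16-bit offset, and the physical address is segment × 16 + offset, a 20-bit value that wraps at 1 MiB on an original 8086. The minimal sketch below (the function name is purely illustrative) shows the calculation and why many different seg:ofs pairs can alias the same byte, which is part of what makes far-pointer code harder for tools to analyse.

```python
def linear_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for segment:offset in 8086 real mode.

    The segment is shifted left by 4 bits (multiplied by 16) and the offset is
    added; on an original 8086 the result wraps around at 1 MiB.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF
```

For example, `linear_address(0x1234, 0x0010)` and `linear_address(0x1235, 0x0000)` both return `0x12350`, so a decompiler cannot treat the segment and offset as independent values when deciding whether two pointers refer to the same object.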

xor2003 commented 5 months ago

My goals are the same: to decompile a couple of games I love.

Actually, we have a chat for people with the same interests (https://discord.gg/uCYCyGq9). It is related to a similar project called Spice86, a disassembler (.NET) and translator. They are trying to decompile Dune I.

Angr does not have 16-bit support yet, but it looks feasible to implement a converter to its IR, since the IR does support 16-bit operations. Angr is also widely used.
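To make "a converter to an IR that supports 16-bit operations" concrete, here is a minimal sketch of lifting one 8086 instruction, `add ax, bx`, into a tiny three-address IR where every operation carries an explicit bit width. The `Op` class, the opcode names and the `run` interpreter are all hypothetical and are not angr's actual IR or API; only the result and the carry flag are modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """One statement of a toy three-address IR (hypothetical, not angr's IR)."""
    dest: str
    opcode: str
    args: Tuple[str, ...]
    width: int  # bit width of the operation; 16 for 8086 word registers


def lift_add_reg_reg(dst: str, src: str) -> List[Op]:
    """Lift the 8086 instruction `add dst, src` on 16-bit registers into toy IR.

    Only the result and the carry flag are produced here; a real lifter would
    also compute ZF, SF, OF, AF and PF, which 8086 ADD updates as well.
    """
    return [
        Op("t0", "get", (dst,), 16),
        Op("t1", "get", (src,), 16),
        Op("t2", "add", ("t0", "t1"), 16),        # result wraps modulo 2**16
        Op("cf", "carry_add", ("t0", "t1"), 16),  # carry out of bit 15
        Op(dst, "put", ("t2",), 16),
    ]


def run(ops: List[Op], regs: Dict[str, int]) -> Dict[str, int]:
    """Interpret toy IR over an environment of registers and temporaries."""
    env = dict(regs)
    for op in ops:
        mask = (1 << op.width) - 1
        if op.opcode in ("get", "put"):
            env[op.dest] = env[op.args[0]] & mask
        elif op.opcode == "add":
            env[op.dest] = (env[op.args[0]] + env[op.args[1]]) & mask
        elif op.opcode == "carry_add":
            env[op.dest] = int(env[op.args[0]] + env[op.args[1]] > mask)
        else:
            raise ValueError(f"unknown opcode {op.opcode!r}")
    return env
```

Running `run(lift_add_reg_reg("ax", "bx"), {"ax": 0xFFFF, "bx": 2})` leaves `ax == 1` and `cf == 1`. Keeping the width on every operation is the point: a lifter that silently widened to the host's integer size would lose exactly this wraparound behaviour.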

xor2003 commented 5 months ago

During the decompilation process, it’s crucial to ensure that the code isn’t broken. I’m using DOSBox emulation (libdosbox; something similar to hidra) to verify each assembly instruction: I check every instruction to confirm that the translated code behaves identically to the original binary, so nothing is broken. An example of fully working game code: https://github.com/xor2003/libdosbox/blob/libdosbox/src/custom/src_td3/tdiii_seg000.cpp
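The per-instruction check described here amounts to running the original and the translated code in lockstep and comparing CPU state after every instruction. The sketch below illustrates that idea only; it is not libdosbox's API. `State`, `Step`, `first_divergence` and the register-update functions are hypothetical stand-ins, with plain Python functions playing the role of the emulator and of the translated code, and flags omitted.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]           # hypothetical minimal CPU state: register -> value
Step = Callable[[State], State]  # executes one instruction, returns the new state


def first_divergence(reference: List[Step], translated: List[Step], start: State) -> Optional[int]:
    """Run both instruction sequences in lockstep from the same start state.

    Returns the index of the first instruction after which the two states
    differ, or None if they agree after every instruction.
    """
    if len(reference) != len(translated):
        raise ValueError("sequences must pair up one-to-one")
    ref_state, new_state = dict(start), dict(start)
    for i, (ref_step, new_step) in enumerate(zip(reference, translated)):
        ref_state = ref_step(ref_state)
        new_state = new_step(new_state)
        if ref_state != new_state:
            return i
    return None


# Stand-ins for "inc ax" and "shl ax, 1" (flags ignored).
def inc_ax(s: State) -> State:
    return {**s, "ax": (s["ax"] + 1) & 0xFFFF}


def shl_ax(s: State) -> State:
    return {**s, "ax": (s["ax"] << 1) & 0xFFFF}


def shl_ax_buggy(s: State) -> State:
    return {**s, "ax": s["ax"] << 1}  # translation bug: forgets 16-bit wraparound
```

With `start = {"ax": 0x8000}`, `first_divergence([inc_ax, shl_ax], [inc_ax, shl_ax_buggy], start)` returns `1`, pinpointing the faulty instruction, while starting from `{"ax": 1}` it returns `None`. That second case shows why the check needs realistic inputs: a translation bug can stay invisible on many states.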

To develop an effective decompiler, I believe it’s necessary to automatically generate unit tests for each original binary function to verify the decompiler’s results. This has not been done yet.
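One way such tests could be generated, sketched under stated assumptions: call the original function (in practice through an emulator) on boundary and random 16-bit inputs, record its outputs, then replay those cases against the decompiled function. Everything below is hypothetical and illustrative: `record_cases`, `failing_cases`, the `EDGES` values, and the two stand-in functions `original_add` (playing the binary's routine) and `decompiled_add` (a decompiler output that forgets the 16-bit wrap).

```python
import itertools
import random
from typing import Callable, List, Tuple

Case = Tuple[Tuple[int, ...], int]  # (arguments, expected return value)

# Boundary values where 16-bit arithmetic bugs tend to hide.
EDGES = (0x0000, 0x0001, 0x7FFF, 0x8000, 0xFFFF)


def record_cases(original: Callable[..., int], arity: int, n_random: int, seed: int = 0) -> List[Case]:
    """Build test cases by calling the original function (e.g. via an emulator).

    Every combination of boundary values is included, plus n_random random
    16-bit argument tuples drawn from a seeded generator.
    """
    rng = random.Random(seed)
    inputs = list(itertools.product(EDGES, repeat=arity))
    inputs += [tuple(rng.randrange(0x10000) for _ in range(arity)) for _ in range(n_random)]
    return [(args, original(*args)) for args in inputs]


def failing_cases(candidate: Callable[..., int], cases: List[Case]) -> List[Tuple[Tuple[int, ...], int, int]]:
    """Return (args, expected, actual) for every case the candidate gets wrong."""
    failures = []
    for args, expected in cases:
        actual = candidate(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


def original_add(a: int, b: int) -> int:
    return (a + b) & 0xFFFF  # stand-in for the original binary routine


def decompiled_add(a: int, b: int) -> int:
    return a + b  # stand-in decompiler output that forgets the 16-bit wrap
```

With `cases = record_cases(original_add, arity=2, n_random=100)`, `failing_cases(original_add, cases)` is empty, while the first failure reported for `decompiled_add` is `((1, 65535), 0, 65536)`. Including the boundary values deterministically matters: purely random inputs can easily miss the overflow case.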

Regarding LLM4Decompile: I have a small dataset of x86-16 C code.