keystone-engine / keystone

Keystone assembler framework: Core (Arm, Arm64, Hexagon, Mips, PowerPC, Sparc, SystemZ & X86) + bindings
http://www.keystone-engine.org
GNU General Public License v2.0

Wrong disassembly with inline comments in assembly #402

Open brendabrandy opened 5 years ago

brendabrandy commented 5 years ago

When I use kstool to assemble the following instruction with a comment, I get an error:

$ kstool arm64 "MOV X29, SP ;hello"
ERROR: failed on ks_asm() with count = 0, error = 'Invalid mnemonic (KS_ERR_ASM_MNEMONICFAIL)' (code = 514)

but it worked properly without a comment:

$ kstool arm64 "MOV X29, SP"
MOV X29, SP = [ fd 03 00 91 ]

Is there an expected comment syntax for Keystone, or can Keystone not distinguish between comments and the operands of an instruction? If it cannot, I'd like to submit a patch to support inline comments.

zeroSteiner commented 5 years ago

If you take a look at the C example on http://www.keystone-engine.org/, towards the bottom you'll see it mention: "You can either separate assembly instructions in this string by “;” or “\n”." So I believe this is the intended behavior. I'm not a project member, but I agree a patch to enable comments with a semicolon would be pretty useful.
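To illustrate the point above, here is a rough Python sketch of how a string might be split if ";" and "\n" are both treated as statement separators. The helper `split_statements` is hypothetical and only models the documented behavior; it is not Keystone's actual parser.

```python
import re


def split_statements(source: str) -> list[str]:
    """Split an assembly string on ';' or newlines and drop empty pieces.

    Hypothetical helper: a simplified model of the separator rule quoted
    from the Keystone website, not the library's real implementation.
    """
    pieces = re.split(r"[;\n]", source)
    return [piece.strip() for piece in pieces if piece.strip()]


# The text after ';' becomes a statement of its own.
print(split_statements("MOV X29, SP ;hello"))
```

Under this model, "hello" ends up as a separate statement, which would be consistent with the "Invalid mnemonic" error reported above.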

sskras commented 2 years ago

@zeroSteiner commented on Feb 6, 2019:

I agree a patch to enable comments with a semicolon would be pretty useful.

+1

n3rada commented 1 month ago

This is clearly a problem. Here is a Python 3 function that lets you work around it:

def clean_assembly_code(asm_code: str) -> str:
    """
    Strips semicolon comments from assembly code before it is passed to Keystone.

    Everything from the first ";" to the end of each line is dropped, along with
    any line that ends up empty. Leading indentation is preserved. Because
    Keystone also accepts ";" as an instruction separator, any instructions
    that follow a ";" on the same line are dropped as well.

    Args:
        asm_code (str): The original assembly code as a string.

    Returns:
        str: The cleaned assembly code with indentation preserved.
    """
    cleaned_lines = []

    for line in asm_code.splitlines():
        # Keep only the part before the first semicolon and trim trailing spaces
        code = line.split(";", 1)[0].rstrip()
        if code:  # Skip lines that are empty or contain only a comment
            cleaned_lines.append(code)

    # Rejoin the remaining lines with newlines; blank lines are not kept
    return "\n".join(cleaned_lines)
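For reference, a minimal usage sketch of this approach. It assumes the keystone-engine Python bindings are installed for the assembling step; `strip_semicolon_comments` and `assemble_arm64` are hypothetical names chosen for this example, and the cleaning step itself needs only the standard library.

```python
def strip_semicolon_comments(asm_code: str) -> str:
    """Drop everything after the first ';' on each line and skip blank lines."""
    kept = []
    for line in asm_code.splitlines():
        code = line.split(";", 1)[0].rstrip()
        if code:
            kept.append(code)
    return "\n".join(kept)


def assemble_arm64(asm_code: str) -> bytes:
    """Assemble cleaned AArch64 code (requires the keystone-engine package)."""
    # Imported lazily so the cleaning helper works without Keystone installed.
    from keystone import Ks, KS_ARCH_ARM64, KS_MODE_LITTLE_ENDIAN

    ks = Ks(KS_ARCH_ARM64, KS_MODE_LITTLE_ENDIAN)
    encoding, _count = ks.asm(strip_semicolon_comments(asm_code))
    return bytes(encoding)


source = """
    MOV X29, SP ;hello
    ; a full-line comment
    RET
"""
cleaned = strip_semicolon_comments(source)
print(cleaned)
```

One caveat: since ";" is also an instruction separator, input such as "MOV X0, X1; MOV X2, X3" on a single line would lose its second instruction, so this workaround only suits code that keeps one instruction per line.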