LLVM Toolchain handling of assembler_target needs to know which GNU assembler will be used

The abstract method _get_assembler_target of Toolchain is supposed to determine the "target" that will be passed to the assembler. How this "target" is actually passed to the assembler is up to each Toolchain implementation.

The LLVM toolchain implementation assumes that it will be passing the returned target (stored as self._assembler_target) to the assembler via the option -march (llvm.py:40):

self._assembler_flags.append(f"-march={self._assembler_target}")

However, there are a few problems here:

The LLVM toolchain assumes that the assembler which will be used has an -march option. That seems reasonable, but the PPC toolchain we recently added doesn't actually have this option, so it's not universal.

By default, the Toolchain chooses the assembler by looking in toolchain.conf, so we don't know for sure which assembler will be used. The LLVM toolchain does not override this default behavior, shown here (abstract.py:185):

def _assembler_path(self) -> str:
    """
    Provides path to installed assembler given the ISA.

    :raises NotImplementedError: if an assembler for that ISA does not exist
    :returns: filepath to the assembler program
    """
    if self._processor.isa == InstructionSet.M68K:
        assembler_path = "M68K_ASM_PATH"
    elif (
        self._processor.isa == InstructionSet.X86
        and self._processor.bit_width == BitWidth.BIT_64
    ):
        assembler_path = "X86_64_ASM_PATH"
    else:
        assembler_path = f"{self._processor.isa.value.upper()}_ASM_PATH"
    return get_repository_config("ASM", assembler_path)

The LLVM toolchain does not appear to really have its own assembler, so it's not a case where it should clearly just set its own assembler correctly.
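To make the lookup concrete, here is a minimal standalone sketch of the key selection done in _assembler_path. The enums and their values are stand-ins for the real OFRAK types (assumed for illustration, not copied from the repository), and the helper name assembler_config_key is hypothetical. The actual config read via get_repository_config is left out; only the choice of config key is shown.

```python
from enum import Enum


class InstructionSet(Enum):
    # Stand-in values; the real OFRAK enum may differ.
    M68K = "m68k"
    X86 = "x86"
    PPC = "ppc"
    ARM = "arm"


class BitWidth(Enum):
    # Stand-in values; the real OFRAK enum may differ.
    BIT_32 = 32
    BIT_64 = 64


def assembler_config_key(isa: InstructionSet, bit_width: BitWidth) -> str:
    """Return the toolchain.conf key naming the assembler for this processor."""
    if isa == InstructionSet.M68K:
        return "M68K_ASM_PATH"
    if isa == InstructionSet.X86 and bit_width == BitWidth.BIT_64:
        return "X86_64_ASM_PATH"
    # Every other ISA maps to "<ISA>_ASM_PATH".
    return f"{isa.value.upper()}_ASM_PATH"
```

The key depends only on the processor, never on the toolchain class, which is why the LLVM toolchain ends up with whatever assembler the config happens to point to.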

Which files would be affected? ofrak_patch_maker/toolchain/llvm_12.py

Does the proposed maintenance include non-doc string functional changes to the Python code? Yes.

A simple fix is for the LLVM toolchain to store the entire assembler arch argument in _assembler_target, instead of storing only part of the argument and constructing the full option later. The disadvantage is that a user who wants to pass in a specific assembler target has to pass the whole command-line option (e.g. "-march=armv7" instead of "armv7"). Passing arbitrary command-line options through a config is... atypical in PatchMaker.
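A rough sketch of what the simple fix could look like, assuming a heavily reduced stand-in for the toolchain class. LLVMToolchainSketch and its constructor parameters are hypothetical and do not mirror the real class in ofrak_patch_maker; only the attribute names _assembler_target and _assembler_flags come from the code quoted in this issue.

```python
from typing import List, Optional


class LLVMToolchainSketch:
    """Hypothetical, reduced stand-in for the LLVM toolchain class."""

    def __init__(
        self,
        default_target: str,
        user_assembler_target: Optional[str] = None,
    ):
        if user_assembler_target is not None:
            # The user must now supply the complete option, e.g. "-march=armv7".
            self._assembler_target = user_assembler_target
        else:
            # The toolchain builds the complete option for its own default.
            self._assembler_target = f"-march={default_target}"

        # The stored value is appended unchanged; nothing is constructed later.
        self._assembler_flags: List[str] = [self._assembler_target]
```

With this shape, an assembler without -march can still be used, but only if the user knows and passes the right full option.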

A more complex fix is to figure out whether we can give the LLVM toolchain an assembler that it knows it will use, so that the arguments can be constructed consistently.
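One possible shape for this, sketched under assumptions: if the toolchain knows which kind of assembler it will run, it can keep accepting a bare target such as "armv7" and pick the option format itself. AssemblerKind, the format table, and assembler_arch_flag are all hypothetical names, and the "-m{target}" format is an assumption about how an assembler without -march might take its target, not something confirmed for the PPC assembler in this issue.

```python
from enum import Enum


class AssemblerKind(Enum):
    # Hypothetical categories of assembler command-line conventions.
    MARCH_STYLE = "march"
    M_PREFIX_STYLE = "m-prefix"


# Assumed option formats, keyed by the kind of assembler in use.
_ARCH_FLAG_FORMATS = {
    AssemblerKind.MARCH_STYLE: "-march={target}",
    AssemblerKind.M_PREFIX_STYLE: "-m{target}",
}


def assembler_arch_flag(kind: AssemblerKind, target: str) -> str:
    """Build the full arch option from a bare target for a known assembler kind."""
    return _ARCH_FLAG_FORMATS[kind].format(target=target)
```

The user-facing config stays a bare target, and the knowledge of how to spell the option lives with the code that knows which assembler is in use.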

Are you interested in implementing it yourself? Yes, but it's not high-priority right now.

redballoonsecurity/ofrak, issue #263