riscvarchive / riscv-binutils-gdb

RISC-V backports for binutils-gdb. Development is done upstream at the FSF.
GNU General Public License v2.0

objcopy -O ihex creates confusing record types #177

Closed brucehoult closed 4 years ago

brucehoult commented 4 years ago

objcopy creates hex files that use record types 02 (segment base) and 03 (CS:IP start address) when it should probably be using record types 04 (linear address upper 16 bits) and 05 (linear start PC) respectively.

Basically, record types 02 and 03 are specific to 20-bit addresses, not 32-bit ones, so I don't see a reason to use them for RISC-V just because the numbers happen to be less than one megabyte.

The output would be better if record type 02 was changed to 04, and record type 03 to 05 and the "upper" 16 bits changed from 1000 to 0001.
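To make the suggested change concrete, here is a minimal sketch that builds both pairs of records for the entry address 0x00010054 and computes their checksums. The helper name `make_record` is made up for this illustration; it is not part of objcopy or any library.

```python
def make_record(rectype: int, offset: int, data: bytes) -> str:
    """Build one Intel HEX line: length, 16-bit offset, type, data, checksum."""
    body = bytes([len(data), (offset >> 8) & 0xFF, offset & 0xFF, rectype]) + data
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum, modulo 256
    return ":" + (body + bytes([checksum])).hex().upper()


entry = 0x00010054  # address of _start

# What objcopy emits today: segment base 1000 (scaled by 16) and a CS:IP start.
old_base = make_record(0x02, 0x0000, bytes.fromhex("1000"))
old_start = make_record(0x03, 0x0000, bytes.fromhex("10000054"))

# The proposed form: upper 16 bits of the linear address and a linear start PC.
new_base = make_record(0x04, 0x0000, (entry >> 16).to_bytes(2, "big"))
new_start = make_record(0x05, 0x0000, entry.to_bytes(4, "big"))

for line in (old_base, old_start, new_base, new_start):
    print(line)
```

The data records would not need to change, because 0x1000 * 16 and 0x0001 << 16 both give a base of 0x10000.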

Here is a sample file:

:020000021000EC
:1000540013051000970500009385C5011306E00001
:1000640093080004730000009308D0057300000097
:1000740048656C6C6F20524953432D56210A000089
:040000031000005495
:00000001FF

Interpretation:

  len addr ty data                             cs
:  02 0000 02 1000                             EC segment address (multiply by 16)
:  10 0054 00 13051000970500009385C5011306E000 01 data
:  10 0064 00 93080004730000009308D00573000000 97 data
:  10 0074 00 48656C6C6F20524953432D56210A0000 89 data
:  04 0000 03 10000054                         95 start PC CS:IP
:  00 0000 01                                  FF eof
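
The field breakdown above can be reproduced with a short parser. This is only a sketch: `Record` and `parse_record` are names invented here, and the only validation is the length field and the checksum (all bytes of a record, including the checksum, should sum to zero modulo 256).

```python
from typing import NamedTuple


class Record(NamedTuple):
    length: int
    offset: int
    rectype: int
    data: bytes
    checksum: int


def parse_record(line: str) -> Record:
    """Split one Intel HEX line into its fields and verify the checksum."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    # 5 bytes of overhead: length, two offset bytes, type, checksum.
    if len(raw) < 5 or raw[0] != len(raw) - 5:
        raise ValueError("length field does not match record size")
    if sum(raw) & 0xFF != 0:
        raise ValueError("bad checksum")
    return Record(raw[0], int.from_bytes(raw[1:3], "big"), raw[3], raw[4:-1], raw[-1])


sample = """
:020000021000EC
:1000540013051000970500009385C5011306E00001
:1000640093080004730000009308D0057300000097
:1000740048656C6C6F20524953432D56210A000089
:040000031000005495
:00000001FF
"""

records = [parse_record(line) for line in sample.split()]
for rec in records:
    print(f"{rec.length:02X} {rec.offset:04X} {rec.rectype:02X} "
          f"{rec.data.hex().upper()} {rec.checksum:02X}")
```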

objdump of the code

00010054 <_start>:
   10054:   00100513            li  a0,1
   10058:   00000597            auipc   a1,0x0
   1005c:   01c58593            addi    a1,a1,28 # 10074 <_start+0x20>
   10060:   00e00613            li  a2,14
   10064:   04000893            li  a7,64
   10068:   00000073            ecall
   1006c:   05d00893            li  a7,93
   10070:   00000073            ecall
   10074:   6548                    flw fa0,12(a0)
   10076:   6c6c                    flw fa1,92(s0)
   10078:   4952206f            j   32d0c <__global_pointer$+0x21488>
   1007c:   562d4353            0x562d4353
   10080:   0a21                    addi    s4,s4,8

jim-wilson commented 4 years ago

Not clear why this is wrong. The addresses do fit in 20 bits, and the addresses do not cross a 64K boundary. objcopy only switches to record types 4 and 5 when addresses don't fit into 20 bits. I do realize that RISC-V has a linear address space unlike the 16-bit x86, but it isn't clear that emitting records this way for RISC-V is wrong.

I would suggest submitting bug reports to FSF binutils instead of here. I'm the only one responding to bugs here, and I don't have time to fix anything that isn't causing immediate problems. Bugs reported here may be lost sometime in the future if/when this git tree dies. See https://sourceware.org/binutils/

brucehoult commented 4 years ago

Not wrong so much as inappropriate and confusing, and it gains nothing over using the linear encoding.

Also, the spec is completely silent on the use of both segmented addressing and linear addressing in the same file and the interaction between them.

After an 02 record is seen, a decoder is required to wrap the record base address + offset within the record if they cross a 64k boundary, before adding them to the scaled segment address. After an 04 record is seen, it is required only to wrap at 4G. Thus if a decoder supports both 02 and 04 records, it has to remember which kind it saw last, and implement different wrapping based on that. It's just ugly and unnecessary.
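
A rough sketch of the state such a decoder has to carry, following the wrapping rules as described above. The class `AddressTracker` and its method names are invented for this example. Two assumptions are made because the spec does not settle them: each new 02 or 04 record simply replaces the previous base, and a file with no base record is treated as linear with base 0.

```python
class AddressTracker:
    """Remembers the base and the addressing mode set by the last 02 or 04 record."""

    def __init__(self) -> None:
        # Assumption: with no base record seen yet, use linear mode with base 0.
        self.mode = "linear"
        self.base = 0

    def set_segment(self, usba: int) -> None:
        """Handle an 02 record: 16-bit segment value, scaled by 16."""
        self.mode = "segmented"
        self.base = usba << 4

    def set_linear(self, ulba: int) -> None:
        """Handle an 04 record: upper 16 bits of a 32-bit address."""
        self.mode = "linear"
        self.base = ulba << 16

    def byte_address(self, offset: int, index: int) -> int:
        """Address of byte number `index` in a data record at `offset`."""
        if self.mode == "segmented":
            # Wrap offset + index at 64K first, then add the scaled segment.
            return self.base + ((offset + index) & 0xFFFF)
        # Linear: add everything, then wrap at 4G.
        return (self.base + offset + index) & 0xFFFFFFFF


tracker = AddressTracker()
tracker.set_segment(0x1000)
print(hex(tracker.byte_address(0x0054, 0)))  # first byte of the sample file
print(hex(tracker.byte_address(0xFFFF, 1)))  # wraps inside the 64K segment

tracker.set_linear(0x0001)
print(hex(tracker.byte_address(0xFFFF, 1)))  # no 64K wrap in linear mode
```

The same offset and index produce different addresses depending on which record type came last, which is the extra state being objected to.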

If this use of 02 and 03 records is done on all proper linear addressing machines, from i386 and m68k on, then I guess a change is unlikely.

aswaterman commented 4 years ago

While I agree this behavior isn’t ideal, it doesn’t look like it’s specific to the RISC-V port. As Jim suggested, consider reposting it to the binutils mailing list as a generic issue rather than a RISC-V one.

jim-wilson commented 4 years ago

If objcopy switches from an 02 record to an 04 record, it first emits an 02 record with a 0 address. So if the reader does still try to offset an 04 address, there will be no problem. objcopy does not support any address above 4G in ihex output, so there can be no wrapping at the 4G boundary. objcopy always uses 02 and 03 records if the address fits in 20 bits. It knows nothing about segmentation. I think in general there is no support in GNU tools for x86-style segmented addresses. Using GNU tools on DOS requires an extender, for instance.

brucehoult commented 4 years ago

Sure there can be wrapping at 4 GB.

If you emit an 04 FFFF and then an 00 FFFF then according to the spec every data byte after the first will wrap back and start from 0x00000000. Or if you have 00 FFF0 and the 00 data record contains more than 16 bytes etc.

Similarly, in segmented mode, if you have an 02 FFFF and then more than one byte in the 00 data record then the address should be wrapped back to 0000 before being added to the scaled segment start address. Weirdly, the spec doesn't seem to forbid addresses greater than FFFF:000F (and friends), which are above 1 MB.
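
Worked through as arithmetic, the two cases look like this. It is a sketch of the rules as I read them, assuming the data record sits at offset FFFF in both cases; the variable names are just for illustration.

```python
MASK_64K = 0xFFFF
MASK_4G = 0xFFFFFFFF

# Linear: an 04 record with FFFF, then a data record at offset FFFF.
linear_base = 0xFFFF << 16
linear_addrs = [(linear_base + 0xFFFF + i) & MASK_4G for i in range(3)]

# Segmented: an 02 record with FFFF, then a data record at offset FFFF.
# The offset wraps at 64K before the scaled segment is added.
segment_base = 0xFFFF << 4
segmented_addrs = [segment_base + ((0xFFFF + i) & MASK_64K) for i in range(3)]

print([f"{a:08X}" for a in linear_addrs])     # FFFFFFFF, then 00000000, 00000001
print([f"{a:06X}" for a in segmented_addrs])  # 10FFEF, then 0FFFF0, 0FFFF1
```

The first segmented byte lands at 0x10FFEF, which is one of the above-1 MB addresses the spec does not appear to rule out.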

This may of course not be possible in output from objcopy, but files that do this can be constructed so the decoder has to allow for it.

jim-wilson commented 4 years ago

Like I already said, this isn't possible in objcopy.