AArch64 assembler - Githubissues

jeffpanici75 commented 4 years ago

I understand you're looking for support implementing an AArch64 inline assembler, which I'd be happy to do. Let me know how you'd like to proceed.

rtrussell commented 4 years ago

That would be very helpful. At the moment the project has two assemblers: bbasmb_arm_32.c which is a naive ARM32 assembler (with currently no support for floating-point or SIMD instructions) and bbasmb_x86_64.c (a table-based x86-64 assembler with fairly good coverage of unprivileged instructions). They should tell you all you need to know about the interface to the rest of BBC BASIC. I'm guessing that an aarch64 assembler would be somewhere in between in terms of complexity.

jeffpanici75 commented 4 years ago

Sounds good. I'll follow the style present in the existing modules, rename bbasmb_arm_64.TODO.c and put the new implementation there. Are there any guidelines you'd like me to follow for the pull request?

rtrussell commented 4 years ago

Are there any guidelines you'd like me to follow for the pull request?

Nothing that I'm aware of, but at the age of 68 I struggle to keep up with these new-fangled ways of working!

Simon-Willcocks commented 3 years ago

Hi, Jeff, just to let you know that I've been working on this for the last few days. I'm probably duplicating your work, but it's worth it to get a good understanding of the Aarch64 machine code. We've been communicating at the link below, but Richard suggested I should write something here, instead:

https://stardot.org.uk/forums/viewtopic.php?p=323516

jgharston commented 3 years ago

I've started work on an ARM64 assembler. Every now and then I stick the code at http://mdfs.net/Software/ARM/Assembler/

jgharston commented 3 years ago

I've been ploughing through arithmetic and logic, and have got to loads and stores, which was the stumbling block when I started looking at this, so before going further it's probably best to compare what we've each got to.

The thing that was the steep initial step was actually getting a full, complete, list of absolutely all instructions, and their binary encoding. Ok, two things, I'll get the comfy chair. Every other CPU I've written an assembler for it's been simple to find a summary for the entire instruction set that fits on one or two pages - the ARM32 fits on half a page of A5! I ploughed through the 8000-page ARM reference Richard linked to and got it down to about four pages. OpsSorted.txt at the link.

But the next problem is that a lot of instructions don't actually map to the binary instruction in any properly meaningful manner. So I had gone down a dead end of previous assemblers where, eg, b6-b3=operation, b2-b0=address mode, and all operations can use all address modes. I was stuck trying to build a binary representation of all the different LD and ST instructions, but different LD/ST have different address modes they can use. Even worse, LDR/STR allow different registers that none of the other LD/ST instructions use.

Effectively they are all different instructions, and it's much easier to code the assembler to treat them as different instructions, so that's the stage I've got to so far, albeit the reference codes are built up from a bitmap of LD/ST, data size, signed/unsigned, pair/unpaired.

So it would be valuable to compare where we've each got to before we plough further in different directions.

jeffpanici75 commented 3 years ago

Work has been keeping me extra busy and I haven't had a chance to start looking into this. I don't want to hold up progress others are making on this front. I'm excited to follow progress!

Simon-Willcocks commented 3 years ago

I had similar problems, and just went for a brute force approach and refined the way I dealt with each instruction as I went along. Anyway, here's my approach, it's about 5000 lines of code. I think it assembles every instruction, the question is if it does it properly!

There are instructions in there that will never be used by someone writing BASIC. I'd like to work out a way to automatically compare the output listing of the assembler with the output from a full-blown assembler. I've checked a few examples, obviously, but it's hardly been rigorous.

Do you know if there's a way to configure BASIC as a command-line program, so it's possible to take a disassembly, turn it into a BASIC program, then compare the output with the original?

bbasmb_arm_64.c.zip

rtrussell commented 3 years ago

Do you know if there's a way to configure BASIC as a command-line program, so it's possible to take a disassembly, turn it into a BASIC program, then compare the output with the original?

Yes, configuring BBC BASIC as a command-line assembler is relatively straightforward. However it involves using the mode in which the 'program counter' and the 'code origin' are separated (achieved by setting bit 2 of OPT) and I don't know whether you've implemented that (yet). I'll throw together a little program based on the 64-bit x86 edition that you can try.

Simon-Willcocks commented 3 years ago

Yes, there's definitely a problem with opt 7 and O%, I took a disassembly of some other code and hacked it into a BASIC program, but the jump and literal LDR locations were absolute and nowhere near the location of the allocated memory, so that reported many errors. I "solved" it by adding P% to the locations, because using O% just crashed BASIC.

That mod would be great. (I'm running BASIC on a PC, so I can't actually run the generated code.)

Simon-Willcocks commented 3 years ago

I used BASIC to write the assembled code to a file, then:

      od -t x4 -Ax aarch64code.bbc | less
      aarch64-none-elf-objcopy -I binary -O elf64-littleaarch64 -B aarch64 aarch64code.bbc aarch64code.o
      aarch64-none-elf-objdump --disassemble-all aarch64code.o | less

Spoiler alert: there are bugs!

Simon-Willcocks commented 3 years ago

Literally the second instruction was wrong. Affecting or using SP with an ADD (and related instructions) has to use the extended register or immediate form, not the shifted register form (which should be obvious from the use of <Wd|WSP> as opposed to simply <Wd> in the documentation).

Simon-Willcocks commented 3 years ago

simply < Wd >
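The SP restriction described above can be sketched in C. This is a minimal illustration, not the code from bbasmb_arm_64.c: the names `add_reg64`, `slot`, `REG_SP` and `REG_XZR` are hypothetical, only the 64-bit form with no shift or extend amount is handled, and the base constants reflect my reading of the A64 encodings (0x8B000000 for the shifted register form, 0x8B200000 for the extended register form with option UXTX).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operand numbering for this sketch: 0..30 are X0..X30.
   XZR and SP share encoding slot 31, so they need distinct operand ids. */
enum { REG_XZR = 31, REG_SP = 32 };

/* Map an operand id to its 5-bit register field. */
static uint32_t slot(int r) { return r == REG_SP ? 31u : (uint32_t)r; }

/* Encode 64-bit "ADD <Xd|SP>, <Xn|SP>, <Xm>" with no shift/extend amount. */
uint32_t add_reg64(int rd, int rn, int rm)
{
    assert(rd >= 0 && rd <= REG_SP);
    assert(rn >= 0 && rn <= REG_SP);
    assert(rm >= 0 && rm <= REG_XZR);            /* Rm can never be SP */

    uint32_t fields = (slot(rm) << 16) | (slot(rn) << 5) | slot(rd);

    if (rd == REG_SP || rn == REG_SP) {
        /* Extended register form: slot 31 in Rd/Rn means SP, so XZR
           cannot be expressed there at the same time. */
        assert(rd != REG_XZR && rn != REG_XZR);
        return 0x8B200000u | (3u << 13) | fields; /* option = UXTX, imm3 = 0 */
    }
    /* Shifted register form: slot 31 means XZR; shift = LSL #0. */
    return 0x8B000000u | fields;
}
```

With this sketch `add_reg64(REG_SP, REG_SP, 1)` selects the extended register form, while `add_reg64(0, 1, 2)` uses the shifted register form.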

rtrussell commented 3 years ago

I'll throw together a little program based on the 64-bit x86 edition that you can try.

Sorry I haven't done this yet, hopefully I may get around to it tomorrow.

rtrussell commented 3 years ago

Here you go. This program assembles the source file specified on the command line. Note that it doesn't work in the Windows console-mode edition of BBC BASIC; I am investigating that. But it should be OK in the GUI versions and in the console-mode editions on other platforms.

      REM Assembles the file specified on the command line

      INSTALL @lib$ + "stringlib"
      sp% = FNinstrq(@cmd$, " ", 0)
      IF sp% srcfile$ = MID$(@cmd$, sp%+1) ELSE srcfile$ = @cmd$
      p%% = 0 : REM Code origin

      REM Find base filename:
      dot% = FN_instrr(srcfile$, ".", 0)
      dir% = FN_instrr(srcfile$, "/", 0)
      IF dir% = 0 dir% = FN_instrr(srcfile$, "\", 0)
      IF dot% > dir% base$ = LEFT$(srcfile$, dot%-1) ELSE base$ = srcfile$

      REM Open input file:
      srcfile% = OPENIN(srcfile$)
      IF srcfile% = 0 ERROR 0, "Couldn't open file " + srcfile$

      REM Create output files:
      binfile$ = base$ + ".bin"
      binfile% = OPENOUT(binfile$)
      IF binfile% = 0 ERROR 0, "Couldn't create file " + binfile$
      lstfile$ = base$ + ".lst"
      lstfile% = OPENOUT(lstfile$)
      IF lstfile% = 0 ERROR 0, "Couldn't create file " + lstfile$
      CLOSE #lstfile%
      tmpfile$ = @tmp$ + "assemble.tmp.bbc"

      REM Allocate memory for assembled code:
      DIM o%% 1000000 : l%% = o%% + 1000000

      REM Two pass assembly:
      FOR pass% = 12 TO 15 STEP 3
        PTR#srcfile% = 0
        lino% = 0
        tmpfile% = OPENOUT(tmpfile$)
        PROCputline(tmpfile%, 0, "[OPT " + STR$pass%)

        WHILE NOT EOF#srcfile%
          src$ = FNgetline(srcfile%)
          lino% += 1
          IF ASCsrc$ <> &25 PROCputline(tmpfile%, lino%, src$)
        ENDWHILE

        PROCputline(tmpfile%, lino%, "]")
        BPUT #tmpfile%, 0
        CLOSE #tmpfile%

        ]^L% = l%% : ]^O% = o%% : ]^P% = p%%
        OSCLI "spool """ + lstfile$ + """"
        ON ERROR LOCAL IF FALSE THEN
          CALL tmpfile$
        ELSE
          e$ = REPORT$
          I% = INSTR(e$, " in module")
          IF I% e$ = LEFT$(e$, I%-1)
          PRINT e$ + " at line "; ERL
        ENDIF : RESTORE ERROR
        IF ERR = 17 EXIT WHILE
        *SPOOL

      NEXT pass%
      CLOSE #srcfile%

      FOR C% = 0 TO ]^P% - p%% - 1
        BPUT#binfile%, o%%?C%
      NEXT
      CLOSE #binfile%

      PRINT "Assembly completed, output in " binfile$ ", listing in " lstfile$
      END

      DEF FNgetline(F%)
      LOCAL A$
      A$ = GET$#F%
      IF A$ = "" IF PTR#F%>1 THEN PTR#F%=PTR#F%-2:IF BGET#F%<>BGET#F% A$=GET$#F%
      = A$

      DEF PROCputline(F%, L%, s$)
      BPUT #F%, LEN(s$) + 4
      BPUT #F%, L% MOD 256
      BPUT #F%, L% DIV 256
      PRINT #F%, s$
      ENDPROC

      DEF FNinstrq(A$, B$, S%)
      LOCAL I%, Q%
      REPEAT
        I% = INSTR(A$, B$, S%)
        Q% = INSTR(A$, """", S%)
        IF Q%=0 OR I%<Q% THEN = I%
        S% = INSTR(A$, """", Q%+1)+1
      UNTIL S%=1
      = 0

Simon-Willcocks commented 3 years ago

Very nice! I only just spotted the console directory...

I get an error:

      File or path not found at line 4
      4 INSTALL @lib$ + "stringlib"

Here's a slightly more readable version of what went before, with a couple of bugs fixed for good measure... bbasmb_arm_64.c.zip

In case it's not clear, generally (once I got the hang of it, at least), I grouped mnemonics by related syntax and/or function (which usually implies encoding). The code identifies what variation of syntax is being used, checks the parameters for size and value, initialises the bit pattern in instruction according to the subset of mnemonics being assembled, and finally fills in the parameters in their appropriate places.

I should probably revisit the various LDR/STR options, which were one of the first things I worked on, there's no doubt some more commonality there than I initially noticed.

For what it's worth, the code is packed with "magic numbers", but my justification for that is that if any of them changed, it wouldn't be the same processor we're assembling for. It also makes certain mistakes easier to spot; just look for duplicate patterns!

Which I've just done, and found several more mistakes! :(

rtrussell commented 3 years ago

I get an error: File or path not found at line 4

That shouldn't happen if you've installed from the binary distribution, the zip file extracts all the files (including the libraries) to the correct relative places. If you're building from source you need to make sure you run the binary in the project directory (note that the last line of the makefile is cp bbcbasic ../../).

Simon-Willcocks commented 3 years ago

So simple! Thanks, that's what I was missing. It works fine, now.

Simon-Willcocks commented 3 years ago

I was just pondering what a rigorous test would look like, and I realised: computers are fast, these days!

I just tried running seq 0 $(( 0xffffffff )), and it only takes just over a minute, even on this old PC; I can probably run every possible 32-bit value into a disassembler, put that into the assembler, and compare the output to the input, and it won't take more than an hour or so!

rtrussell commented 3 years ago

I can probably run every possible 32-bit value into a disassembler

So long as the disassembler behaves sensibly when presented with a non-existent opcode, anyway. That's one advantage of a CPU with a fixed instruction length; try generating every possible x86-64 instruction, which can be from one byte to 16 bytes in length!

Simon-Willcocks commented 3 years ago

Yes! Real life isn't as accommodating as I'd hoped, I wonder if the disassembler from objdump is usable, without having to write files and execute processes for every value...

Undefined instructions are easy, it writes, e.g.:

      0: 000251c1 .inst 0x000251c1 ; undefined
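Filtering such lines out of a test run could look like the sketch below. It assumes the objdump output always follows the single example quoted above (a `.inst` directive followed by a `0x`-prefixed word); the function name `is_undefined_line` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and stores the raw word if the listing line is a ".inst"
   fallback (an encoding the disassembler does not recognise), else 0. */
int is_undefined_line(const char *line, uint32_t *word)
{
    unsigned int value;
    const char *p;

    assert(line != NULL && word != NULL);
    p = strstr(line, ".inst");
    if (p == NULL)
        return 0;
    /* White space in the format matches any run of spaces or tabs. */
    if (sscanf(p, ".inst 0x%x", &value) != 1)
        return 0;
    *word = (uint32_t)value;
    return 1;
}
```

A test driver could call this on each disassembled line and skip the encoding without invoking the assembler at all.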

Simon-Willcocks commented 3 years ago

Oh, look. There are an awful lot of floating point and SIMD instructions...

383 variations, in fact. :(

rtrussell commented 3 years ago

There are an awful lot of floating point and SIMD instructions...

Indeed. All I would ask is that any assembler is architecturally capable of being extended to handle them at some future point. My current 32-bit ARM assembler doesn't support them, and they're not necessary for the profiling and debugging capabilities that are the principal application for assembly language (i.e. writing interrupt service routines, which you can't do in BASIC). Do you happen to know whether any version of Acorn's ARM BASIC V assembler supports them?

Simon-Willcocks commented 3 years ago

ARM BBC BASIC V has the following, I guess QADD, etc. are FP?

      HELP [
      Assembly language is contained in [] and assembled at P%. Labels follow '.'.
      Syntax:
      SWI|SVC|DBG|HVC|SMC|SMI[]
      BFC[] ,#,#
      BFI|SBFX|UBFX[] ,,#,#
      USAT|SSAT[] ,#,
      USAT16|SSAT16[] ,#,
      UXTB|UXTB16|UXTH|UXTAB|UXTAB16|UXTAH|SXTB|SXTB16|SXTH[] ,
      BKPT|HLT|UDF
      ADC|ADD|AND|BIC|EOR|ORR|RSB|RSC|SBC|SUB[][S] ,,
      MOV|MVN[][S] ,
      MOV[T|W][ ,#
      CMN|CMP|TEQ|TST[][S|P] ,
      CLZ|RBIT|REV|REVSH|REV16[] ,
      CRC32[C]<B|H|W> ,,
      QADD[8|16]|QSUB[8|16]|USAD8|USADA8|QDADD|QDSUB[] ,,
      UADD|UHADD|UQADD|UQSUB|USUB|SADD|SHADD|SSUB|SHSUB<8|16>[] ,,
      QASX|QSAX|UQASX|UQSAX|SHASX|SHSAX|SSAX|SASX|USAX|UASX|SEL[] ,,
      MUL[][S] ,,
      MLA|MLS|UMULL|UMLAL|SMULL|SMLAL[][S] ,,,
      UMAAL[] ,,,
      SMUL<W|B|T><B|T>[] ,,
      SMLA[L]<W|B|T><B|T>[] ,,,
      SM<LA[L]|LS[L]|UA|US>D[X][] ,,,
      SMM<LA|LS|UL>[R][] ,,,
      LDR|STR[][B|T|BT|SB|SBT|H|HT|SH|SHT|D] , '[ [,] '] [,][!]
      LDA|STL[][B|H] , '[ ']
      LDREX|LDAEX|STREX|STLEX[B|H|D][] , '[ ']
      LDM|STM[]DA|DB|EA|ED|FA|FD|IA|IB [!],{}[^]
      RFE<DA|DB|EA|ED|FA|FD|IA|IB> [!]
      SRS<DA|DB|EA|ED|FA|FD|IA|IB> SP[!],#
      SWP[][B] ,, '[ ']
      PLD[W]|PLI '[ [,] ']
      PKH<BT|TB>[] ,,
      PUSH|POP[]
      B[L][]
      BLX
      BX|BLX|BXJ[]
      SDIV|UDIV[] ,,
      WFE|WFI|SEV[L]|YIELD|ESB|CSDB[]
      DMB|DSB|ISB [SY|SYST|ST|LD|<ISH|NSH|OSH>[ST|LD]]
      TSB[] CSYNC
      SETEND <BE|LE>
      SETPAN #
      MRC|MCR[|2] ,,,, [,]
      MCRR|MRRC[] ,,,,
      CDP[|2] ,,,, [,]
      LDC|STC[|2][L] ,, '[ [,#] '] [,#|{expr}][!]
      CPS<ID|IE> <iflags[,#]>
      CPS #
      CLREX|ERET|SSBB|PSSBB
      MRS[] ,
      MSR[] _[c][x][s][f],|#
      ADF|MUF|SUF|RSF|DVF|RDF|POW |RPW|RMF|FML|FDV|FRD|POL[][] ,,
      MVF|MNF|ABS|RND|SQT|LOG|LGN|EXP |SIN|COS|TAN|ASN|ACS|ATN|URD|NRM[][] ,
      FLT[][] ,
      FIX[][] ,
      WFS|RFS|WFC|RFC[]
      CMF|CNF[E][] ,
      LDF|STF[] , '[ [,#] '] [,#][!]
      LFM|SFM[] ,, '[ [,#] '] [,#][!]
      LFM|SFM[]EA|FD ,, '[ '] [!]
      DCF|EQUF
      OPT|=|DCB|EQUB|DCW|EQUW|DCD|EQUD|EQUS
      ADR[] ,
      ALIGN|NOP
      where =|#|,ASL|LSL|LSR|ASR|ROR |#|RRX
      and =AL|CC|CS|EQ|GE|GT|HI|HS|LE|LS|LT|LO|MI|NE|NV|PL|VC|VS
      and =R0 to 15 or SP or LR or PC or
      and =CP0 to 15 or
      and =C0 to 15 or
      and =F0 to 7 or
      and =F0 to 7 or #, where =0,0.5,1,2,3,4,5 or 10
      and =S|D|E|P
      and =P|M|Z
      and =CPSR|SPSR
      and =A|I|F

Simon-Willcocks commented 3 years ago

I've hacked (really hacked, you don't want to know!) together a test which will take a few hours to run through every possible instruction, and came across an oddity: the disassembler from binutils' objdump ignores (probably correctly) bits that are shown as (1) in an instruction description, so I'll have to do some post-processing on the test output. As long as there's one OK result for a given assembly instruction, it's OK.

      grep 'stlxrb.*w8, w19, [x0]' /media/simon/OS/Other/All.txt
      08088013 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      08088413 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      08088813 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      08088C13 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      08089013 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      08089413 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      08089813 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      08089C13 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808A013 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808A413 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808A813 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808AC13 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808B013 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808B413 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808B813 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808BC13 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808C013 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808C413 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808C813 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808CC13 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808D013 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808D413 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808D813 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808DC13 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808E013 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808E413 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808E813 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808EC13 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808F013 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808F413 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808F813 stlxrb w8, w19, [x0] -> 0808FC13 FAILED
      0808FC13 stlxrb w8, w19, [x0] -> 0808FC13 OK
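One way to do that post-processing is to force the "should be one" bits on before comparing encodings. The sketch below is an assumption about how it could be done, not the test program discussed here: `sbo_rule`, `canonical` and `same_instruction` are hypothetical names, and the single table entry is derived from the STLXRB results quoted above, where only bits 14:10 vary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: words with (word & mask) == value belong to a class
   whose "should be one" bits are given by sbo. */
struct sbo_rule { uint32_t mask, value, sbo; };

static const struct sbo_rule rules[] = {
    /* STLXRB: bits 31:21 and bit 15 are fixed, bits 14:10 are shown as (1) */
    { 0xFFE08000u, 0x08008000u, 0x00007C00u },
};

/* Return the encoding with every "should be one" bit forced to 1. */
uint32_t canonical(uint32_t word)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        assert((rules[i].value & ~rules[i].mask) == 0); /* table sanity */
        if ((word & rules[i].mask) == rules[i].value)
            return word | rules[i].sbo;
    }
    return word;
}

/* Two encodings count as the same instruction if they canonicalise alike. */
int same_instruction(uint32_t a, uint32_t b)
{
    return canonical(a) == canonical(b);
}
```

Applied to the listing above, all 32 input words canonicalise to 0808FC13, so each of them would compare equal to the assembler's output.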

jgharston commented 3 years ago

On 04-06-2021 13:47, Simon-Willcocks wrote:

Oh, look. There are an awful lot of floating point and SIMD instructions...

383 variations, in fact. :(

Yes, that got me tearing my hair. Tootling along doing alu instructions, all common addressing modes, make a start on loads & stores, many common addressing modes.... ah! STR and LDR can change to be completely different instructions, but only after you've worked out what registers it is using. Grrrr. On any other platform they would be eg FLDR and FSTR or something.

I've made a start going through your (Simon's) code. I think most of mine can be thrown away, but I think I've got some optimisations.

I started with Richard's method of having a mnemonic opcode lookup table for the base code for each instruction that is then modified by the specific instruction, so eg

      case (various arithmetic):
        instruction=opcode[mnemonic]<<24;
        ...
        if (immediate) instruction |= something

instead of

      case blah: instruction=foo
      case blah: instruction=foo
      case blah: instruction=foo
      case blah: instruction=foo
      etc.

You can get most instructions into an 8-bit byte, where you can't you add the extra bits in the instruction-specific code.
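A table-driven encoder along these lines might look like the following sketch. It covers only the add/subtract immediate group; the names `arith_imm`, `opcode` and the `M_*` enumerators are hypothetical, and the top-byte values (0x11, 0x31, 0x51, 0x71) reflect my reading of the 32-bit immediate encodings.

```c
#include <assert.h>
#include <stdint.h>

enum mnemonic { M_ADD, M_ADDS, M_SUB, M_SUBS, M_COUNT };

/* Top byte of the 32-bit immediate form of each mnemonic. */
static const uint8_t opcode[M_COUNT] = {
    [M_ADD] = 0x11, [M_ADDS] = 0x31, [M_SUB] = 0x51, [M_SUBS] = 0x71,
};

/* Encode "<op> Rd, Rn, #imm12" with no shift of the immediate. */
uint32_t arith_imm(enum mnemonic m, int is64,
                   unsigned rd, unsigned rn, unsigned imm12)
{
    uint32_t instruction;

    assert(m < M_COUNT && rd < 32 && rn < 32 && imm12 < 4096);
    instruction = (uint32_t)opcode[m] << 24;   /* base code from the table */
    if (is64)
        instruction |= 1u << 31;               /* sf bit selects X registers */
    return instruction | (imm12 << 10) | (rn << 5) | rd;
}
```

One shared routine then serves four mnemonics, and adding another only needs a new table entry.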

I also made the reg() routine return the register width in the top two bits, so then it can be OR'd straight into the instruction, eg:

      r=reg(); instruction |= (r & BIT31) | (r & 31);       // Rd
      comma();
      r=reg(); instruction |= (r & BIT31) | ((r & 31)<<5);  // Rn
      etc.

It has the side effect that an instruction is "promoted" to the biggest register used, eg:

add w1,w2,x3 is assembled as add x1,x2,x3

I hadn't yet got to the point of making a decision as to if this should be allowed, or be an assembly error. For the latter you'd collect the 'width' bits across the instruction and then fault if they all don't match. I think the maximum number of registers usable is four, so I think I'd set the top four bits from the reg() routine and do something like:

      r=reg(); width |= (r & BIT31); instruction |= (r & BIT31) | (r & 31);        // Rd
      comma();
      r=reg(); width |= (r & BIT30); instruction |= (r & BIT31) | ((r & 31)<<5);   // Rn
      comma();
      r=reg(); width |= (r & BIT29); instruction |= (r & BIT31) | ((r & 31)<<10);  // Rm
      ...
      // %000xxx -> all 32-bit, %111xxx -> all 64-bit
      if ((width != 0) && (width != (14u<<28))) error("Mixed register widths");
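A self-contained version of that width check is sketched below. The `reg()` here is a hypothetical stand-in that parses a register name from a string rather than from the source pointer, `add_regs` is an invented name, and Rm is placed at bit 16 as in my reading of ADD (shifted register).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BIT31 0x80000000u
#define BIT30 0x40000000u
#define BIT29 0x20000000u
#define WIDTH_BITS 0xF0000000u   /* top four bits set for an X register */

/* Hypothetical stand-in for reg(): accepts "w0".."w30" or "x0".."x30". */
uint32_t reg(const char *s)
{
    long n;

    assert(s != NULL && (s[0] == 'w' || s[0] == 'x'));
    n = strtol(s + 1, NULL, 10);
    assert(n >= 0 && n <= 30);
    return (uint32_t)n | (s[0] == 'x' ? WIDTH_BITS : 0u);
}

/* Encode "ADD Rd, Rn, Rm". Returns 1 on success, 0 for mixed widths. */
int add_regs(const char *d, const char *n, const char *m, uint32_t *out)
{
    uint32_t width = 0, instruction = 0x0B000000u;  /* 32-bit base code */
    uint32_t r;

    /* Each operand records its width in its own bit. */
    r = reg(d); width |= r & BIT31; instruction |= r & 31;          /* Rd */
    r = reg(n); width |= r & BIT30; instruction |= (r & 31) << 5;   /* Rn */
    r = reg(m); width |= r & BIT29; instruction |= (r & 31) << 16;  /* Rm */

    if (width != 0 && width != (BIT31 | BIT30 | BIT29))
        return 0;                          /* "Mixed register widths" */
    if (width != 0)
        instruction |= BIT31;              /* all X registers: set sf */
    *out = instruction;
    return 1;
}
```

Here `add w1,w2,x3` is rejected instead of being promoted, while `add x1,x2,x3` and `add w1,w2,w3` assemble normally.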

-- J.G.Harston - @.*** - mdfs.net/jgh

Simon-Willcocks commented 3 years ago

I made a similar change, but I used bits 5 and 6, so you can say if (32 == reg_size( r )...

The automated test has thrown up several problems, usually having to do with me not being able to count bits. It works through every possible opcode in about 2 hours, but the disassembler apparently knows instructions I don't (saddv, anyone?), and the output files are a bit excessive.

      -rwxrwxrwx 1 simon simon 79G Jun 6 16:29 /media/simon/OS/Other/All.txt
      -rwxrwxrwx 1 simon simon 43G Jun 6 16:29 /media/simon/OS/Other/Fail.txt
      -rwxrwxrwx 1 simon simon 36G Jun 6 16:29 /media/simon/OS/Other/Pass.txt

I'm having difficulty attaching the latest version, I'll try again in a minute.

Simon-Willcocks commented 3 years ago

Half the instructions that have a bit to say it's working on a 64-bit register have them at bit 30!

(Upload still not working, sorry!)
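As an illustration of the bit 30 case, the sketch below encodes LDR with an unsigned immediate offset, where the access size lives in bits 31:30 and so the 32-bit and 64-bit forms differ only in bit 30 (the data-processing instructions use bit 31 instead). The function name `ldr_uoff` is hypothetical and the base constant 0xB9400000 is my reading of the encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "LDR Wt|Xt, [Xn, #byte_offset]" (unsigned offset form). */
uint32_t ldr_uoff(int is64, unsigned rt, unsigned rn, unsigned byte_offset)
{
    unsigned scale = is64 ? 3u : 2u;   /* offset is stored in units of the access size */
    uint32_t instruction;

    assert(rt < 32 && rn < 32);
    assert((byte_offset & ((1u << scale) - 1u)) == 0);   /* must be aligned */
    assert((byte_offset >> scale) < 4096);               /* imm12 range */

    instruction = 0xB9400000u;         /* size = 0b10: 32-bit load */
    if (is64)
        instruction |= 1u << 30;       /* size = 0b11: 64-bit load */
    return instruction | ((byte_offset >> scale) << 10) | (rn << 5) | rt;
}
```

A register routine that returns the width in a single fixed bit therefore cannot be OR'd straight into every instruction; the position has to be chosen per instruction group.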

jeffpanici75 commented 3 years ago

Not sure this will be helpful to others but thought I'd mention it. ARM releases their architecture specifications as executable code: https://developer.arm.com/-/media/developer/products/architecture/armv8-a-architecture/A64_v82A_ISA_xml_00bet3.1.tar.gz

If you unpack this tarball there's a folder with an XML file for every valid instruction encoding. The XML includes ASL for both the decoding and execution of every instruction. Note that ASL is like any other programming language: it has shared code (libraries) that are found in other directories of this archive. Every aspect of the hardware is described this way. ARM uses this for automated verification but we could use it to automate building an assembler/disassembler. ARM has an example ASL interpreter: https://github.com/ARM-software/asl-interpreter

I'm sure there are other ASL interpreters written in more popular languages but I never bothered looking because I'm OK working with OCaml.

Anyhow, I was going to machine generate most (all?) of the assembler/disassembler this way.

Simon-Willcocks commented 3 years ago

Wow, I never expected something so powerful to be publicly available! I might have a look at starting again and auto-generating the same kind of code...

On the other hand, testing is ongoing.

I worked out how to deal with encodings that don't match my encodings for the same assembly instruction: if the instruction I generate from the disassembly differs from the instruction we're looking at, I disassemble my attempt and compare the two assembly instructions. If they match, I count that as a "win" (labelled duplicate).
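The comparison just described can be sketched as a small classifier. Everything named here is hypothetical: `classify` takes the disassembler and assembler as callbacks, and `toy_dis` and `toy_as` are toy stand-ins that only understand STLXRB, included purely so the sketch runs on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum verdict { V_OK, V_DUPLICATE, V_FAILED };

typedef void (*disasm_fn)(uint32_t word, char *text, size_t size);
typedef int (*asm_fn)(const char *text, uint32_t *word);  /* 1 on success */

enum verdict classify(uint32_t original, disasm_fn dis, asm_fn as)
{
    char first[64], second[64];
    uint32_t mine;

    assert(dis != NULL && as != NULL);
    dis(original, first, sizeof first);
    if (!as(first, &mine))
        return V_FAILED;               /* assembler rejected the text */
    if (mine == original)
        return V_OK;                   /* exact round trip */
    dis(mine, second, sizeof second);  /* different bits: compare the text */
    return strcmp(first, second) == 0 ? V_DUPLICATE : V_FAILED;
}

/* Toy disassembler: prints every word as STLXRB and ignores bits 14:10. */
void toy_dis(uint32_t w, char *text, size_t size)
{
    snprintf(text, size, "stlxrb w%u, w%u, [x%u]",
             (unsigned)((w >> 16) & 31), (unsigned)(w & 31),
             (unsigned)((w >> 5) & 31));
}

/* Toy assembler: always emits bits 14:10 as ones. */
int toy_as(const char *text, uint32_t *word)
{
    unsigned rs, rt, rn;

    if (sscanf(text, "stlxrb w%u, w%u, [x%u]", &rs, &rt, &rn) != 3)
        return 0;
    if (rs > 31 || rt > 31 || rn > 31)
        return 0;
    *word = 0x0800FC00u | (rs << 16) | (rn << 5) | rt;
    return 1;
}
```

With the toy callbacks, 0808FC13 classifies as OK and 08088013 as a duplicate rather than a failure.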

Apparently "adr xzr, address" is valid assembler and "adr sp, address" isn't. Strange.

Hey, uploads are working again! bbasmb_arm_64.c.zip

Simon-Willcocks commented 3 years ago

Does anyone know what on earth the difference is between:

ldrsb wzr, [sp, xzr, sxtx #0] and ldrsb wzr, [sp, xzr, sxtx]

(Other than the S bit has to be set in the former?)

"<amount> Is the index shift amount, it must be #0, encoded in "S" as 0 if omitted, or as 1 if present." But what difference does it make?

rtrussell commented 3 years ago

But what difference does it make?

I'm guessing none. Is there some reason why you think there is, or should be, a difference?

Simon-Willcocks commented 3 years ago

There's a bit to say if it's there or not!

rtrussell commented 3 years ago

There's a bit to say if it's there or not!

Yes, but isn't this a case where the S bit is common across a number of different instructions, in most of which it does have an effect but in the case of this particular instruction variant it doesn't? Normally S indicates an optional shift of the index value: a shift of 4 bits (16) with a 128-bit load, 3 bits (8) with a 64-bit load, 2 bits (4) with a 32-bit load and 1 bit (2) with a 16-bit load. But when it's an 8-bit load, as in this case, there is no option to shift the index, even though the S bit is still present in the instruction encoding. Hence whether you set it or not, the shift is zero.

In the specific case of this instruction an assembler might typically accept either no shift value, in which case it would not set the S bit, or a shift of #0 (that being the only allowed value), in which case it would set the S bit. The effect would be identical, and setting the S bit indicative only of how the programmer coded the instruction. But more generally it would accept either no shift value (setting S to zero) or the unique allowed value of #0, #1, #2, #3 or #4 as appropriate (setting S to one).
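The rule as described in this comment can be written down in a few lines of C. This is a sketch of that description only, with hypothetical function names; an assembler may well also accept an explicit #0 for the wider accesses, and that case is not modelled here.

```c
#include <assert.h>

/* size_log2: 0 = 8-bit access, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit, 4 = 128-bit.
   has_amount: non-zero if the programmer wrote an explicit "#n".
   Returns the S bit (0 or 1), or -1 if the amount is not the allowed one. */
int index_shift_s_bit(int size_log2, int has_amount, int amount)
{
    assert(size_log2 >= 0 && size_log2 <= 4);
    if (!has_amount)
        return 0;                            /* no amount written: S = 0 */
    return amount == size_log2 ? 1 : -1;     /* only one value is allowed */
}

/* The shift actually applied to the index register for a given S bit. */
int effective_shift(int size_log2, int s_bit)
{
    assert(size_log2 >= 0 && size_log2 <= 4);
    assert(s_bit == 0 || s_bit == 1);
    return s_bit ? size_log2 : 0;
}
```

For a byte access `effective_shift(0, 0)` and `effective_shift(0, 1)` are both zero, which is why the two LDRSB spellings behave identically even though their encodings differ.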

Simon-Willcocks commented 3 years ago

Yes, it looks like it "only" hurts my testing.

Simon-Willcocks commented 3 years ago

If anyone's still paying attention, the latest version, as far as I can tell, handles all the basic instructions correctly. What's left are the FP/SIMD instructions, and instructions introduced since the version of the DDI0487 (C) I was working with until recently. For some reason, I didn't expect more instructions to be added to a processor!

At least, of the 1,533,460,215 instructions objdump thinks are valid, none of them are assembled to the wrong instruction. But there are nearly 750 instruction classes that aren't assembled, yet.

It occurs to me that, perhaps, FP and SIMD instructions are important to BASIC programmers trying to extract more speed from their programs? bbasmb_arm_64.c.zip

I don't know if I can upload my test program, because it involves GNU code, and if I release it, it might infect BBCSDL. (And it's pretty ugly!) It runs through every possible instruction in about ninety minutes.

rtrussell commented 3 years ago

It occurs to me that, perhaps, FP and SIMD instructions are important to BASIC programmers trying to extract more speed from their programs?

It's potentially the case, certainly, but since the 32-bit ARM assembler in BBCSDL doesn't currently support FP or SIMD instructions, anybody writing cross-platform code is going to be stuck anyway.

I'll try incorporating your code in an experimental build of BBCSDL; watch this space!

rtrussell commented 3 years ago

Preliminary results from trying to incorporate your code in the Console Edition build for the 'Apple Silicon' Mac (compiled using clang). It is issuing four warnings, not necessarily serious but best eliminated (a couple look like they might be an unwise assumption about the signedness of 'char', which varies between platforms): Screenshot 2021-06-14 at 22 36 25

Simon-Willcocks commented 3 years ago

You're right, I didn't spot the first two because the tests all use lower case. Just change the declaration of code to unsigned char *code. The third is a hangover from earlier, when the value was inverted (a mistake); the line can simply be deleted. The rest are because the variable is not set in the default case of the switch, which never returns anyway, and will go away by simply initialising only_32bit to 0.

I will watch this space more carefully, and press refresh from time to time! bbasmb_arm_64.c.zip

rtrussell commented 3 years ago

I will watch this space more carefully, and press refresh from time to time!

Great, thanks. Sadly your latest version is not compiling in clang, there are two warnings but more seriously it is reporting an implicit function declaration: Screenshot 2021-06-15 at 17 23 00

Simon-Willcocks commented 3 years ago

Ooh, my mistake! I used to have a function for SIMD instructions, then I thought I might as well just put it in the massive switch statement with the others, and I forgot to get rid of the other calls. Just a minute.

Simon-Willcocks commented 3 years ago

How about this? bbasmb_arm_64.c.zip

rtrussell commented 3 years ago

How about this?

That's much better. I'm still getting the two warnings (see screen shot below) which can probably be fixed by casts, but they shouldn't be affecting operation. I've noticed a couple of anomalies related to the displayed listing (compare the output from yours with the 64-bit x86 assembler, below):

If I enter [nop nothing is displayed but if instead I enter [nop: (with a trailing colon) it is. It looks like the listing is somehow being 'deferred' until the following instruction rather than being output immediately.
There's no space between the address and the opcodes, possibly because you're not allowing for an address with more than 32 bits (I would have expected it to be padded with leading zeros to 64-bits).

Simon-Willcocks commented 3 years ago

The warnings can probably be avoided by removing the "unsigned", that's just me being dumb.

I don't think my stuff has a lot to do with displaying the opcodes, and when I try it on console/linux, it shows a space, but nothing with just "[nop", strangely. I'll have a look.

Simon-Willcocks commented 3 years ago

It's probably because I changed:

                    case 0x0D:                                    
                            newlin () ;                      
                            if (*esi == 0x0D)                               
                                    break ;                                   
                    case ':':                                       
                            if (liston & BIT4)                
                                {

to:

                    case 0x0D:
                            newlin () ;
                            if (*esi == 0x0D || *esi == '\0')
                                    break ;
                    case ':':
                            if (liston & BIT4)
                                {

I can't remember why I did that.

Simon-Willcocks commented 3 years ago

It works better without the change.

rtrussell commented 3 years ago

The warnings can probably be avoided by removing the "unsigned"

Oh. Isn't the "unsigned" the one you added to eliminate the previous 'comparison always fails' warnings (caused by 'char' being equivalent to 'signed char')?

I don't think my stuff has a lot to do with displaying the opcodes

It's definitely behaving differently from the bbasmb_x86_64.c code from which you probably copied it.

and when I try it on console/linux, it shows a space

Are you sure that's not simply because the allocated memory happens to be in the bottom 4 Gbytes and therefore needs only a 32-bit address? You should be printing 16 hex digits.

Simon-Willcocks commented 3 years ago

"Oh. Isn't the "unsigned" the one you added to eliminate the previous 'comparison always fails' warnings (caused by 'char' being equivalent to 'signed char')?" Yes, but I'm sometimes stupid! It's got to be cast to the same type it's being compared to.

I copied from bbasmb_arm_32.c.

The sprintfs still have a space after the oldpc, I don't know what's going on there.

rtrussell commented 3 years ago

I copied from bbasmb_arm_32.c. The sprintfs still have a space after the oldpc, I don't know what's going on there.

Ah, if you copied the formatting of the listing from a 32-bit assembler it's bound to be wrong for a 64-bit assembler (twice as many hex digits in the address field!). bbasmb_x86_64.c has:

sprintf (t, "%016llX ", (long long) oldpc) ;

I can make the change if you would prefer me to.
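The effect of that format string can be checked in isolation. The wrapper below is a hypothetical sketch (the name `format_address` and the use of `snprintf` with an explicit buffer size are my own choices, not what bbasmb_x86_64.c does); it takes the address as an `unsigned long long` so that it matches the `%llX` conversion exactly.

```c
#include <assert.h>
#include <stdio.h>

/* Write a 16-digit, zero-padded, upper-case hex address followed by one
   space into t. Returns the number of characters written (17). */
int format_address(char *t, size_t size, unsigned long long oldpc)
{
    assert(t != NULL && size >= 18);   /* 16 digits + space + terminator */
    return snprintf(t, size, "%016llX ", oldpc);
}
```

An address below 4 Gbytes such as 0x1000 is then padded with twelve leading zeros, so the listing column has the same width wherever the code happens to be allocated.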

Simon-Willcocks commented 3 years ago

I made the changes in this version. bbasmb_arm_64.c.zip

rtrussell / BBCSDL

AArch64 assembler #4