65816 code outside bank 0

dwsJason commented 4 years ago

obj/lz4.o: asm/lz4.s:40: COMPUTE_PENDING_CALCULATIONS: Result (170791/$29b27) of a computation is out of 16-bit range. My expectation is that the assembler / linker will just take the lowest 16 bits of the address without complaining.

This is some self modifying code.

lda pDest,s
sta.w LZ4_Dst+1

LZ4_Dst LDY #$0000 ; Init Target unpacked Data offset

Maybe this isn't a bug, maybe I'm not using the assembler correctly. Any thoughts appreciated.

vhelin commented 4 years ago

My expectation is that the assembler / linker will just take the lowest 16 bits of the address without complaining.

Overflows give always an error. AND the bits you need e.g. using "& $FFFF". Sorry. :) Which of the lines is asm/lz4.s:40?

dwsJason commented 4 years ago

The sta.w LZ4_Dst+1, is line 40

Thanks so much for taking the time to consider this input.

If I don't have code in bank 0, then I need to actually AND my address field? I'm going to include the entire source file. "ASMCODE" address space is set to $20000, and has a size of $10000 (configured as a slot in the memory map)

Since the code does PHK/PLB, the bank is the same as the code. So the code that accesses addresses, can, and should use absolute addressing.

Having to add an &$FFFF to the actual address seems inconvenient. IORCA does not have this requirement, neither does Merlin, or x65, or the APW assembler, or the WDC assembler. So it does seem like a very non-standard requirement.

I'm hoping that I either misunderstand how to use your assembler, or that I did not adequately explain my issue.

`*

LZ4 Decompress by Brutal Deluxe!!!
.include common.i .include dp.s

mx %00 ; .export unpacklz4 ; .export LZ4_Unpack .slot "ASMCODE"

; .base $20 ; .org $200000

.section "ASMCODE"

*

int LZ4_Unpack(u8 pDest, u8 pPackedSource);

LZ4_Unpack:

; This syntax is not ideal .equ pDest 5 .equ pPackedSource 9

phb
phk
plb

sep #$20
lda pPackedSource+2,s    ; Pull out the src/dst banks
xba
lda pDest+2,s            ; Pull out the src/dst banks

rep #$31
tax                      ; Temp save in X

lda pDest,s
sta.w LZ4_Dst+1

lda pPackedSource+1,s    ; address of packed source + 4, is the unpacked len
sta.w upl+2

lda pPackedSource,s
adc #12
sta.w upl+1

upl lda.l >0 ; packed length adc #16 ; 16 bytes for packed buffer header adc pPackedSource,s ; start of packed buffer tay ; y has the pack data stop address

; 1st packed Byte offset
lda pPackedSource,s     ; skip 16 byte header on the source
adc #16
pha
txa
plx

jsr ASM_LZ4_Unpack
tay

; Copy the Return address
lda 1,s
sta pPackedSource,s
lda 3,s
sta pPackedSource+2,s

tsc
sec
sbc #-8
tcs
tya    ; return length  

plb
rtl

------------------------------------------------------------------------------- ASM_LZ4_Unpack STA.w LZ4_Literal_3+1 ; Uncompress a LZ4 Packed Data buffer (64 KB max) SEP #$20 ; A = Bank Src,Bank Dst STA.w LZ4_Match_5+1 ; X = Header Size = 1st Packed Byte offset STA.w LZ4_Match_5+2 ; Y = Pack Data Size XBA ; => Return in A the length of unpacked Data STA.w LZ4_ReadToken+3
STA.w LZ4_Match_1+3
STA.w LZ4_GetLength_1+3 REP #$30 STY.w LZ4_Limit+1 -- LZ4_Dst LDY #$0000 ; Init Target unpacked Data offset LZ4_ReadToken LDA.l >$AA0000,X ; Read Token Byte INX STA.w LZ4_Match_2+1 ---------------- LZ4_Literal AND #$00F0 ; >>> Process Literal Bytes <<< BEQ LZ4_Limit ; No Literal CMP #$00F0 BNE LZ4_Literal_1 JSR LZ4_GetLengthLit ; Compute Literal Length with next bytes BRA LZ4_Literal_2 LZ4_Literal_1 LSR A ; Literal Length use the 4 bit LSR A LSR A LSR A -- LZ4_Literal_2 DEC A ; Copy A+1 Bytes LZ4_Literal_3 MVN $AA,$BB ; Copy Literal Bytes from packed data buffer PHK ; X and Y are auto incremented PLB ---------------- LZ4_Limit CPX #$AAAA ; End Of Packed Data buffer ? BEQ LZ4_End ---------------- LZ4_Match TYA ; >>> Process Match Bytes <<< SEC LZ4_Match_1 SBC.l >$AA0000,X ; Match Offset INX INX STA.w LZ4_Match_4+1 -- LZ4_Match_2 LDA #$0000 ; Current Token Value AND #$000F CMP #$000F BNE LZ4_Match_3 JSR LZ4_GetLengthMat ; Compute Match Length with next bytes LZ4_Match_3 CLC ADC #$0003 ; Minimum Match Length is 4 (-1 for the MVN) -- PHX LZ4_Match_4 LDX #$AAAA ; Match Byte Offset LZ4_Match_5 MVN $BB,$BB ; Copy Match Bytes from unpacked data buffer PHK ; X and Y are auto incremented PLB PLX ---------------- BRA LZ4_ReadToken ---------------- LZ4_GetLengthLit LDA #$000F ; Compute Variable Length (Literal or Match) LZ4_GetLengthMat STA.w LZ4_GetLength_2+1 LZ4_GetLength_1 LDA.l >$AA0000,X ; Read Length Byte INX AND #$00FF CMP #$00FF BNE LZ4_GetLength_3 CLC LZ4_GetLength_2 ADC #$000F STA.w LZ4_GetLength_2+1 BRA LZ4_GetLength_1 LZ4_GetLength_3 ADC.w LZ4_GetLength_2+1 RTS ---------------- LZ4_End TYA ; A = Length of Unpack Data RTS ------------------------------------------------------------------------------- .ends ; Gotta close the section

`

vhelin commented 4 years ago

Hard to say anything about this. It's missing the includes

.include common.i .include dp.s

Without them I cannot assemble it... Could you try to simplify the case to something smaller? But by default, if WLA expects a 16-bit value, and you give it something bigger, you'll get an error. That's something I'd expect the assembler I use to do as silent ANDs sound dangerous and probable sources of errors. IORCA, Merlin, x65, APW or the WDC assembler must be more forgiving than WLA DX.

vhelin commented 4 years ago

I mean, if you try to do something like

.db 256+100

and expect that to produce 100 and the assembler is silent about the overflow, what are you doing? Does that make any sense at all? :D

I don't try to do

char c = 10000;

in ANSI C and hope it works. :D

vhelin commented 1 year ago

@Ramsis-SNES is the OP using WLA here somehow wrong, or what do you think? Should I allow truncation of 24bit values -> 16bit values perhaps via a command line switch?

Ramsis-SNES commented 1 year ago

I'm going to look into this later today. :-)

Ramsis-SNES commented 1 year ago

@Ramsis-SNES is the OP using WLA here somehow wrong, or what do you think? Should I allow truncation of 24bit values -> 16bit values perhaps via a command line switch?

I'm leaning towards a "no". Let me elaborate.

The LZ4 decompression routine provided by the OP (original source, intended for use on the Apple IIgs) uses self-modifying code. This means that the bank number will likely change depending on where in RAM the code is copied for execution. Because of this, it would make sense to write the code in such a way that it doesn't take the bank into account in the first place. So yeah, adding & $FFFF, or using indirect or even relative addressing modes, would be the most sensible approach. Alternatively, one could pre-define the offsets and use the defined labels instead of those within the code itself, which would become necessary anyway if the code was run from a different offset in RAM than where it sits in ROM.

Adding a switch to truncate 24-bit addresses to 16-bit addresses would possibly create more traps for young players (e.g., happily coding away and not knowing or losing track of which bank is accessed at any given time) than it would provide benefits for everyone. Also, where do we stop? Should we add another switch to truncate to 8-bit addresses next? ;-)

By the way:

Since the code does PHK/PLB, the bank is the same as the code. So the code that accesses addresses, can, and should use absolute addressing.

Having to add an &$FFFF to the actual address seems inconvenient. IORCA does not have this requirement, neither does Merlin, or x65, or the APW assembler, or the WDC assembler. So it does seem like a very non-standard requirement.

On page 22 of WDC's assembler/linker documentation, where it talks about absolute addresses like the OP specified, it is stated:

If the address is greater than FFFFH, then it assumes that it is a long absolute address.

So, while I don't know about the other assemblers mentioned, it seems at least doubtful that the WDC assembler would silently truncate absolute addresses to 16-bits. ;-)

vhelin commented 1 year ago

Thank you for your insight! Based on this I think I'll close this issue. I don't understand everything that is going on here so for me to fix an issue related to this, if there is an issue, should be something I can undestand. :)

It seems the OP wrote

sta.w LZ4_Dst+1

when LZ4_Dst was a 24-bit address/value? Maybe here we should have used

sta.l LZ4_Dst+1

instead?

vhelin commented 1 year ago

I have to add that I'm no 65816 programmer and 65816 is the platform I have least idea of what is going on there. I've added 65816 support to WLA DX with the help of people who actually know about 65816 programming. So all help is appreciated. :)

Ramsis-SNES commented 1 year ago

It seems the OP wrote
sta.w LZ4_Dst+1
when LZ4_Dst was a 24-bit address/value? Maybe here we should have used
sta.l LZ4_Dst+1
instead?

Yes, this might work if a) the implied/expected bank number ($02 in this case) stays the same after the code is copied to and run from RAM (which in turn might indeed be the case in a LoROM game), and b) the code isn't timing-sensitive (24-bit addressing takes more cycles than 16-bit addressing).

I have to add that I'm no 65816 programmer and 65816 is the platform I have least idea of what is going on there. I've added 65816 support to WLA DX with the help of people who actually know about 65816 programming.

Wow, I didn't know that. Regardless, I think WLA's 65816 assembler is as good as it gets! There simply is no alternative. And best of all, it's actively being developed! :D

vhelin commented 1 year ago

I have to add that I'm no 65816 programmer and 65816 is the platform I have least idea of what is going on there. I've added 65816 support to WLA DX with the help of people who actually know about 65816 programming.

Wow, I didn't know that. Regardless, I think WLA's 65816 assembler is as good as it gets! There simply is no alternative. And best of all, it's actively being developed! :D

LOL, thank you for this. :) I try to do my best, but I do need help once in a while from you guys. :)

vhelin / wla-dx

65816 code outside bank 0 #299