robin-raymond opened 1 year ago
Actually, I think this is the AFLI bug: writing to $D011 stalls the VIC's reads, which causes the first 3 characters on every line to display incorrectly. In hires mode the color becomes grey. I think the issue is that the emulator you use does not emulate this bug faithfully.
You're probably right. Is there a way to fix the AFLI bug?
From everything I've read, it's not possible on a stock C64: during the "bad lines" the VIC stalls while the CPU is allowed to finish its 3 cycles, so the VIC is unable to read the real character data. The VIC does read the first character column's pixels from memory, but not the related color attributes, so the leftmost character is somewhat corrupted (the solution is to set those pixels to 0 so they show the background color), and the next two character columns are blanked (in hires FLI you get grey, in MCI you get the background color). Effectively, stock C64s have a 3-character-wide penalty on the left side. Apparently the SuperCPU can help, but I have no details on that. Of course an emulator can "do the right thing", which could mean either fixing the bad-line bug or reproducing the stock bad-line bug exactly, cycle for cycle (as VICE and CCS64 do).
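As a rough sanity check on the pixel widths discussed in this thread, the sketch below converts corrupted character columns into a pixel width. It assumes the standard VIC-II relationship of 8 hires pixels (one character column) per CPU cycle; the function name is illustrative only.

```python
# Rough sketch: convert the number of corrupted character columns at the left
# edge of an FLI line into a pixel width. Assumes 8 hires pixels per CPU cycle,
# i.e. one character column per lost VIC-II read.
HIRES_PIXELS_PER_COLUMN = 8


def fli_bug_width(lost_columns: int) -> int:
    """Width of the FLI bug in hires pixels (multicolor pixels are 2 wide)."""
    return lost_columns * HIRES_PIXELS_PER_COLUMN


if __name__ == "__main__":
    for columns in (3, 4):
        width = fli_bug_width(columns)
        print(f"{columns} columns -> {width} hires px ({width // 2} multicolor px)")
```

With the 3-column penalty described above this gives 24 pixels; the 32-pixel width reported later in the thread corresponds to four columns.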
On a related note, one thing that has always puzzled me about FLI is the scroll value. I would think that setting the scroll value to match the 3 least significant bits of the raster line (thus invoking the bad line) would cause a different side effect: reading the same character information over and over again until the scroll value wraps back to 0 (which happens every 8 lines).
For example, suppose that on raster line 51 the scroll is set to %011 (neutral scroll, neither up nor down). The normal screen RAM/color RAM is read from the first byte array at the VIC's specified RAM address. But when raster line 52 is reached, the scroll is set to %100, pushing the screen visibly downward by exactly one pixel. Thus the line shown on raster line 52 needs to be the characters from one line above. If raster line 52 was pushed down exactly one pixel, I would expect line 51's character data to be displayed again on line 52, since that's what is set to be displayed on line 52 with that scroll value. I can understand the screen RAM working in FLI mode because the RAM banks are switched per scan line, so reading "the same line" can produce different values, but the screen address is never changed in any sample I have seen. I would expect the visual result to be the same pixel pattern (except with different coloring) 8 times in a row. Clearly this is not what happens, because FLI works in practice, but I don't understand the theory of why it works. I would understand it if the character address were changing too, but it's not.
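For reference, here is how the $D018 values used by the ported code later in this thread ($80, $90, ..., $F0, with $DD00 selecting VIC bank 3 at $C000) map onto screen RAM addresses, one 1 KB screen per line within each 8-line character row. This is a sketch of the standard VIC-II address decoding, not code from the thread; the helper names are illustrative. It also checks the listing's claim that the IRQ vectors at $FFFA-$FFFF fall in the unused tail of the last screen.

```python
# Sketch of VIC-II $D018 decoding in bitmap mode (assumes VIC bank 3 at $C000).
VIC_BANK_BASE = 0xC000    # $DD00 bits 0-1 = %00 selects bank 3 ($C000-$FFFF)
SCREEN_SIZE = 0x400       # each screen matrix slot is 1 KB...
SCREEN_BYTES_USED = 1000  # ...but only 40 x 25 = 1000 bytes are read


def screen_base(d018: int, bank_base: int = VIC_BANK_BASE) -> int:
    """Bits 4-7 of $D018 select a 1 KB screen matrix within the bank."""
    return bank_base + ((d018 >> 4) & 0x0F) * SCREEN_SIZE


def bitmap_base(d018: int, bank_base: int = VIC_BANK_BASE) -> int:
    """In bitmap mode, bit 3 of $D018 selects the lower or upper 8 KB half."""
    return bank_base + (0x2000 if d018 & 0x08 else 0x0000)


D018_VALUES = [0x80, 0x90, 0xA0, 0xB0, 0xC0, 0xD0, 0xE0, 0xF0]

if __name__ == "__main__":
    for row_line, value in enumerate(D018_VALUES):
        print(f"line {row_line}: $D018=${value:02X} -> screen ${screen_base(value):04X}")
    end = screen_base(0xF0) + SCREEN_BYTES_USED  # first byte the VIC never reads
    print(f"bitmap at ${bitmap_base(0x80):04X}; unused tail ${end:04X}-$FFFF")
```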
I noticed the C64 hires export routine rearranges the screen memory from the normal layout. I could understand that solving the issue if the scroll were offset slightly from normal, e.g. a scroll pattern of 0,3,5,7,2,4, or something similar, but that's not what the code does: it increments the scroll position by 1 on each line (until the wrap point). I must be missing something vital or subtle, and I can't find any details that explain why this isn't the case. Do you have any insight, given that you wrote the FLI example code?
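To make the "+1 per line" scroll pattern concrete, the sketch below mirrors the InitTables routine from the ported code later in this thread. Each $D011 entry places the low three bits of the line index in YSCROLL. On the VIC-II a bad line occurs when RASTER & 7 equals YSCROLL within raster lines $30-$F7, so, assuming entry x is in effect on raster line 48 + x, every displayed line becomes a bad line. The helper names are illustrative, and the model ignores the requirement that the display be enabled in line $30.

```python
# Mirror of the InitTables routine: build the per-line $D011 and $D018 tables.


def build_tables(entries: int = 256) -> tuple:
    # $38 = bitmap mode, display enabled, 25 rows; low 3 bits = YSCROLL
    d011 = [0x38 | (x & 0x07) for x in range(entries)]
    # $80 | ((x & 7) << 4): bitmap at bank+$0000, screen at bank+$2000 + (x & 7) KB
    d018 = [0x80 | ((x << 4) & 0x70) for x in range(entries)]
    return d011, d018


def is_bad_line(raster: int, yscroll: int) -> bool:
    # Bad line condition (ignoring the DEN-in-line-$30 requirement)
    return 0x30 <= raster <= 0xF7 and (raster & 0x07) == yscroll


if __name__ == "__main__":
    d011, d018 = build_tables()
    print(" ".join(f"${v:02X}" for v in d011[:8]))
    print(" ".join(f"${v:02X}" for v in d018[:8]))
    # Assumption: table entry x is in effect on raster line 48 + x
    print(all(is_bad_line(48 + x, d011[x] & 0x07) for x in range(200)))
```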
Okay, an update on this particular bug. Correct, the issue is the "FLI" bug. It is also present in multicolor FLI.
I was rewriting the drawing routines because I discovered that FLI displays have a "bad line" width of 32 pixels instead of the 24 pixels expected for a properly coded and timed FLI routine. This is because the entry point into the FLI loop is ever so slightly off, so the FLI bad-line area extends a bit too far. To diagnose this, I ported code from https://codebase64.org/doku.php?id=base:fli_displayer to support the FLI outputs from the dithertron, so I could compare against an alternative implementation that "appears" to work. I wanted to find out the EXACT raster positions for code that works on a "real" system. In the process, I discovered a timing bug related to FLI / VIC-II in the (chips?) emulator.
Now, by "real system" I mean I'm comparing against VICE/CCS64, which both have pretty accurate (though not perfect) reproductions of the VIC-II. It's possible the code I ported isn't perfect either, but with it I managed to get the proper 24-pixel bad line instead of 36. The ported code had a tiny issue related to which raster line "tweak" values to pick; that has been fixed, although there may still be differences between a real PAL/NTSC system and VICE/CCS64, the (chips?) emulator, or the sample code. That needs to be verified.
Ironically, the 8-bit workshop (chips?) emulator would make fixing and exploring the raster timing super easy, if it were accurate. VICE is clumsy here: its raster debugging isn't very interactive and is extremely cumbersome, although I think it's accurate in this area.
I have code prepared to help deal with dithering and the "FLI" bug, but before I submit it, I want to fully diagnose the code timing issue. I need to get the bad-line area down to 24 pixels wide from 36.
If anyone has good working sample FLI players (with asm source) that I can compare against, that would be helpful.
Here's the code I ported for testing (it needs some cleanup, but it works):
processor 6502
include "basicheader.dasm"
; credit to https://codebase64.org/doku.php?id=base:fli_displayer
Use8BitWorkshopEmulator equ 1
UseInitTables equ 0
#if Use8BitWorkshopEmulator
TweakD018 equ -1
TweakD011 equ 7
#else
TweakD018 equ 1
TweakD011 equ 1
#endif
Irq0AtRaster equ $2d
; temporary CopyMem storage variables in
; zero page
Src equ $02
Dest equ $04
Sys2062:
jmp Start ; entry point from basic
;-------------------------------------------------
; Start of code that must be within the
; same page boundary $nn00 -> $nnFF
; otherwise some instructions may become
; cycle inaccurate.
.align $100
.align $1
;
; Two IRQs are used to create a stable raster
; line start point free from issues caused by
; interrupts, inconsistent mid-instruction
; triggers, or other concerns.
;
; The first IRQ's job is to set up the second IRQ.
; While the first IRQ is triggered by a raster
; line, its timing is not as accurate because
; the CPU might be executing any instruction
; (2-7 clock cycles long) when it fires, whereas
; the second IRQ is triggered only during a
; 2-clock-cycle "nop" instruction, ensuring the
; second IRQ is accurate to within 0 or 1 clock
; cycle.
;
; The second IRQ further has logic to detect this
; 0 or 1 clock cycle count offset and correct the
; timing so the entry point into the raster
; routine is 100% accurate creating an accurate
; and stable raster-timed loop.
;
Irq0:
pha
lda $d019
sta $d019
inc $d012
lda #<Irq1
sta $fffe ; set up 2nd IRQ to get a stable IRQ
cli
;
; These "nop"s are not an accident, or in need
; of optimization. They allow the 2nd IRQ
; to be triggered with an off-by 0 or 1 clock
; cycle delay resulting in an "almost" stable IRQ.
;
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
; The "rti" of the first Irq0 is not needed as
; these "nop" instructions never fall-through.
; The stack is re-arranged so that the second Irq1
; (which triggers while the first Irq0 is being
; serviced) returns to the interrupt point where
; the first trigger IRQ happened bypassing the
; need for a "rti" from the first Irq0 entirely.
Irq1:
Ntsc1:
; PAL raster at 9 or 10/46
lda #$ea ; modified to NOP NOP on NTSC
lda #$80
sta $d018 ; setup first color RAM address early
lda #$38
sta $d011 ; setup first DMA access early
pla
pla
pla
lda $d019
sta $d019
lda #Irq0AtRaster
sta $d012
lda #<Irq0
sta $fffe ; switch IRQ back to first stabilizer IRQ
lda $d012 ; PAL raster at 55 or 56/46
cmp $d012 ; stabilize last jittering cycle
beq Delay ; PAL raster at 0 or 1/47; if equal, 3 cycles delay (branch taken), else 2 cycles delay
Delay:
stx SaveX+1 ; PAL raster stable at 3/47 (no more fluctuations)
ldx #$0d
Wait:
dex
bne Wait
Ntsc2:
; PAL raster at 10/48
lda #$ea ; modified to NOP NOP on NTSC
Ntsc3:
lda #$ea ; modified to NOP NOP on NTSC
;
; Following here is the main FLI loop which forces
; the VIC-II to read new color data each
; rasterline. The loop is exactly 23 clock cycles
; long so together with 40 cycles of color DMA this
; will result in the 63 clock cycles which is exactly
; the length of a PAL C64 rasterline.
;
nop
nop
L0:
; PAL raster at 61/48, 61/49, 61/50, ...
lda LookupD018+TweakD018,x
sta $d018 ; set new color RAM address
lda LookupD011+TweakD011,x
sta $d011 ; force new color DMA
inx ; FLI bug $D800 color = 8 (orange)
cpx #199 ; last rasterline?
Ntsc4:
bne L0 ; branches to l0-1 on NTSC for 2 extra cycles per rasterline
; lda $d016
; eor #$01 ; IFLI: 1 hires pixel shift every 2nd frame
; sta $d016
; lda $dd00
; eor #$02 ; IFLI: flip between banks $4000 and $C000 every frame
; sta $dd00
SaveX:
ldx #$00
pla
Nmi:
rti
;
; End of code that must be within the
; same page boundary $nn00 -> $nnFF
; otherwise some instructions may become
; cycle inaccurate.
;-------------------------------------------------
Start:
sei
jsr CopyData
jsr InitGfx
jsr InitTables
jsr NtscFix
; Patch the table, as the last line needs to
; perform the "open borders" trick. This trick
; involves an undocumented "feature" where
; extended background (color) mode is enabled
; together with multicolor bitmap mode. While
; documented as not a legal combination, it
; causes the borders to be open to writing during
; the raster scroll process (otherwise some of the
; rows would be shifted "off"). This patching
; needs to be done within the timing of the final
; scan line, otherwise the normal background is
; disturbed and the drawing is not correct. The
; screen needs to be turned off to ensure the
; background is painted during the final scene.
; Unfortunately, the final row is cut off, giving
; a height of 199 instead of 200 pixel rows.
; A fix is welcome for this issue.
lda LookupD011+199
and #$07
ora #$70
sta LookupD011+199
; The VIC chip doesn't care whether RAM or ROM is
; selected (with an exception), but the IRQ vectors
; cannot be overridden later unless RAM is banked in.
; Thus the KERNAL routines are not available while
; the picture is being displayed, and if the
; KERNAL ROM is to be used, the IRQs must first be
; uninstalled and the ROM restored before calling
; any KERNAL functions.
lda #$35 ; %x01: RAM visible at $A000-$BFFF and $E000-$FFFF.
; %1xx: I/O area visible at $D000-$DFFF (except %100, which makes all three areas RAM).
sta $01 ; disable ROMs %xxxxx101 (rest are default values)
lda #$7f
sta $dc0d ; no CIA #1 timer IRQs
lda $dc0d ; clear CIA #1 timer IRQ flags
lda #$2b
sta $d011 ; %00101011 - neutral scroll, 25 rows, screen off, bitmap mode, raster IRQ high bit zero
lda #Irq0AtRaster
sta $d012 ; interrupt at raster line 45
; Even though these IRQ vectors overwrite the
; screen color choice area of the picture data,
; this does not affect the picture in any way,
; because the color choices end at 1000 bytes,
; not 1024 bytes, leaving the extra few bytes
; unused by the VIC chip, which is fortunately
; exactly where the IRQ vectors need to be installed.
;
; However, care must be taken: if a new picture
; is loaded into this memory area, the IRQ table
; needs to be re-initialized to these default
; values, and interrupts (including NMIs) must be
; disabled during the picture copying process.
; NMIs cannot technically be disabled, but a
; trick can be used where an NMI is intentionally
; triggered without acknowledgement, thus
; preventing a second NMI from happening.
lda #<Nmi
sta $fffa
lda #>Nmi
sta $fffb ; dummy NMI to avoid crashing due to RESTORE
lda #<Irq0
sta $fffe
lda #>Irq0
sta $ffff ; Irq0 is the default interrupt handler
lda #$01
sta $d01a ; enable raster IRQs (no other IRQs)
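; A read-modify-write instruction such as "dec"
; reads the value, writes it back unmodified, then
; writes the modified value. The unmodified write
; stores back the pending flag bits, and writing a
; 1 to a $D019 bit acknowledges it, so bit 0 is cleared
dec $d019 ; clear raster IRQ flag (so it can trigger)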
cli
jmp * ; that's it, no more action needed
CopyData:
; The VIC always reads the bitmap and screen color
; choices from RAM regardless if the ram or roms
; are active (with the exception of %xxxxx0xx and
; the exception to the exception being %xxxxx000).
; The color block data always is read from
; I/O $d800 area.
; %x00: RAM visible in all three areas.
lda #$30 ; %00110000
sta $01 ; enable HIMEM RAM
; copy char memory
lda #<CharData
sta Src
lda #>CharData
sta Src+1
lda #0
sta Dest
lda #$c0
sta Dest+1
ldx #$20
jsr CopyMem
; copy screen memory
lda #<ScreenData
sta Src
lda #>ScreenData
sta Src+1
lda #0
sta Dest
lda #$e0
sta Dest+1
ldx #$20
jsr CopyMem
lda #$07 ; %x11: BASIC ROM visible at $A000-$BFFF; KERNAL ROM visible at $E000-$FFFF.
; %1xx: I/O area visible at $D000-$DFFF.
sta $01 ; enable ROM and $D000 I/O
; copy color block RAM to the VIC's color block area
lda #<ColorData
sta Src
lda #>ColorData
sta Src+1
lda #$d8
sta Dest+1
ldx #4
jsr CopyMem
rts
InitGfx:
lda #$00
sta $d015 ; disable sprites
lda XtraData+1
sta $d020 ; border
lda XtraData+0
sta $d021 ; background
lda #$18
sta $d016 ; %00011000 ; no horizontal scroll, 40 columns, multimode on
lda #$80
sta $d018 ; %10000000 ; bitmap data %0xx, 0: +$0000-$1FFF, 0-8191; screen color choices +$2000-$23FF, 8192-9215.
lda #$00
sta $dd00 ; %00, 0: Bank #3, $C000-$FFFF, 49152-65535.
rts
; The InitTables routine can be removed if your
; assembler supports a .repeat-style macro.
; The code is only included as an example of how
; to initialize the tables in the event your
; assembler does not have a suitable substitute.
InitTables:
#if UseInitTables
ldx #$00
L2:
txa
asl
asl
asl
asl
and #$70 ; color RAMs at $E000
ora #$80 ; bitmap data at $C000
sta LookupD018,x ; calculate $D018 table
txa
and #$07
ora #$38 ; bitmap
sta LookupD011,x ; calculate $D011 table
inx
bne L2
#endif
rts
NtscFix:
bit $d011
bmi *-3
bit $d011 ; wait for rasterline 256
bpl *-3
lda #$00
Test:
cmp $d012
bcs Nt
lda $d012 ; get rasterline low byte
Nt:
bit $d011
bmi Test
cmp #$20 ; PAL: $37, NTSC: $05 or $06
bcs Pal
;
; This code self-patches to support NTSC mode
; which means this code must be copied to RAM
; if the code is originally located in ROM.
; If this code must run from ROM then the code
; needs to be duplicated with a PAL and an
; NTSC version where the test routine installs
; one or the other versions for usage.
;
;
; The value "#$ea" as a literal is the opcode
; for "nop", so when the "lda" opcode of the
; instruction "lda #$ea" is patched to $ea, it
; becomes the bytes "$ea $ea" (i.e. "nop" and "nop").
;
; Such a patch changes the clock cycle count
; from a single 2-clock-cycle immediate mode
; "lda" instruction into two 2-clock-cycle
; "nop" instructions, 4 clock cycles in total.
;
lda #$ea
sta Ntsc1
sta Ntsc2
sta Ntsc3
dec Ntsc4+1
Pal:
rts
; copy data from Src to Dest
; X = number of 256-byte pages to copy
CopyMem:
ldy #0
.Loop:
lda (Src),y
sta (Dest),y
iny
bne .Loop
inc Src+1
inc Dest+1
dex
bne .Loop
rts
.align $100
; lookup table for $d011
LookupD011:
#if UseInitTables
.ds 256
#else
.repeat 256/8
.byte $38,$39,$3a,$3b,$3c,$3d,$3e,$3f
.repend
#endif
; lookup table for $d018
LookupD018:
#if UseInitTables
.ds 256
#else
.repeat 256/8
.byte $80,$90,$a0,$b0,$c0,$d0,$e0,$f0
.repend
#endif
.align $100
CharData equ .
ScreenData equ CharData+8000
ColorData equ ScreenData+$2000
XtraData equ ColorData+1000
; link a demo picture
incbin "parrot-c64.multi.fli.bin"
@sehugg Above is the proper flicker-free FLI routine, with PAL/NTSC auto-detection and stable IRQs. It maximizes the picture so that nearly the entire image is displayed, minus the final pixel row (which is blanked). The FLI bug is still present (it's not possible to fix), but at the correct width of 24 pixels (down from the 32-pixel-wide "FLI bug" produced by the workshop's current sample asm code). I have verified the code in VICE PAL and VICE NTSC, and validated that the raster timings in VICE are 100% stable.
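The PAL/NTSC auto-detection in NtscFix records the highest raster line low byte seen while raster bit 8 is set and compares it with $20. The sketch below reproduces that arithmetic, assuming the usual line counts of 312 (PAL), 263 (NTSC 6567R8) and 262 (early NTSC 6567R56A); it models the routine's logic only and is not emulator code.

```python
PAL_LINES = 312       # raster lines 0-311
NTSC_LINES = 263      # raster lines 0-262 (6567R8)
OLD_NTSC_LINES = 262  # raster lines 0-261 (6567R56A)


def max_raster_low_byte(total_lines: int) -> int:
    """Highest value $D012 reaches while $D011 bit 7 (raster bit 8) is set."""
    return (total_lines - 1) & 0xFF


def is_pal(total_lines: int, threshold: int = 0x20) -> bool:
    """Mirror of 'cmp #$20 / bcs Pal': carry is set when A >= threshold."""
    return max_raster_low_byte(total_lines) >= threshold


if __name__ == "__main__":
    for name, lines in (("PAL", PAL_LINES), ("NTSC", NTSC_LINES), ("old NTSC", OLD_NTSC_LINES)):
        low = max_raster_low_byte(lines)
        print(f"{name}: max low byte ${low:02X}, detected as PAL: {is_pal(lines)}")
```

This matches the listing's comment that PAL peaks at $37 and NTSC at $05 or $06.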
Unfortunately, the 8bitworkshop emulator's VIC raster timing differs from VICE for this sample. I'm not sure which emulator is correct, but if I had to bet, it's VICE (especially since it mirrors CCS64 behavior too). The next step is to verify the code on a real PAL/NTSC C64. I'll get around to it at some point if someone doesn't take up the task sooner. Obviously, proving VIC timing correctness in VICE against a real C64 is important before "fixing" the workshop emulator.
I tweaked the above code to display "something" in the workshop emulator when "Use8BitWorkshopEmulator equ 1", but that setting does not work in VICE, whereas "Use8BitWorkshopEmulator equ 0" works properly in VICE.
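One detail that may be worth ruling out when comparing the two Tweak settings (it was not tested in this thread): LookupD011 is page-aligned and exactly 256 bytes long, so LookupD018 also starts on a page boundary. With "TweakD018 equ -1", the loop's "lda LookupD018+TweakD018,x" uses a base address ending in $FF, and on the 6502 an absolute,X read takes one extra cycle when the indexed address crosses a page. The sketch below uses a hypothetical address of $2100 for LookupD018; the real address depends on the assembler.

```python
# Hypothetical layout: LookupD011 at $2000 (page-aligned, 256 bytes),
# so LookupD018 lands at $2100. Real addresses depend on the assembler.
LOOKUP_D018 = 0x2100


def crosses_page(base: int, index: int) -> bool:
    return (base & 0xFF00) != ((base + index) & 0xFF00)


def lda_abs_x_cycles(base: int, index: int) -> int:
    """LDA absolute,X: 4 cycles, plus 1 when base + X crosses a page boundary."""
    return 4 + (1 if crosses_page(base, index) else 0)


if __name__ == "__main__":
    # The FLI loop runs with X = 0..198 (inx / cpx #199 / bne)
    for tweak in (1, -1):
        base = LOOKUP_D018 + tweak
        cycles = sorted({lda_abs_x_cycles(base, x) for x in range(199)})
        print(f"TweakD018 = {tweak:+d}: base ${base:04X}, lda cycles seen: {cycles}")
```

If this applies, the emulator build's loop would run one cycle longer than the 23 cycles the listing's comments assume on every line after the first.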
Why this matters: the current example code built into the dithertron tools has a 32-pixel-wide FLI bug, which is not good. I could not repair that code until I had a proven, stable reference. The dithertron sample is obviously much simpler than the code above (but it's not as "proper"). Still, I'm going to try to fix the current sample on VICE, since I now have this reference code to compare the exact raster timings against, and I know it works 100% correctly (at least in VICE PAL/NTSC).
Assuming I can get multicolor FLI working, I can then shift my attention to the hires FLI bug (which should be pretty much the same code with some tweaks).
@sehugg Okay, so I can NOT make the "simple" sample code work as well as it could, because the complicated code is needed to make it work right. The issue is that the sample code needs to enter the loop at exactly raster cycle 61 of line 48, the line before the first one displayed. Because the "simple" sample loops on a "bne" until the line number is correct, the raster cycle on entering the routine can be off by up to the length of that "bne" polling loop. That's a wide variation. It works at all only because the cycle count is so far off that it triggers a "bad line" in the middle of a line, which trashes that particular line a bit (but it's not displayed, so that's okay), and the loop then settles into its own cycle count for the rest of the run. That cycle count "works" to display things, but it's not accurate enough to get the FLI bug down from 32 pixels wide to 24 pixels wide.
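The cycle budget the main loop depends on, as stated in the listing's comments, can be tallied as below. This sketch uses standard NMOS 6502 cycle counts and assumes the table reads do not cross a page and the branch target is on the same page (which is what the listing's alignment directives are for). The 40 DMA cycles per line are taken from the listing's own comment.

```python
CYCLES = {
    "lda abs,x": 4,  # +1 if the indexed address crosses a page (assumed not)
    "sta abs": 4,
    "inx": 2,
    "cpx #imm": 2,
    "bne taken": 3,  # +1 if the branch crosses a page (assumed not)
    "nop": 2,
}

# One iteration of the FLI loop at L0 on PAL
PAL_LOOP = ["lda abs,x", "sta abs", "lda abs,x", "sta abs", "inx", "cpx #imm", "bne taken"]
# On NTSC the patched branch (Ntsc4) lands on the "nop" just before L0
NTSC_LOOP = ["nop"] + PAL_LOOP

BADLINE_DMA_CYCLES = 40  # per the listing's comment


def cpu_cycles(loop: list) -> int:
    return sum(CYCLES[op] for op in loop)


def line_cycles(loop: list) -> int:
    return cpu_cycles(loop) + BADLINE_DMA_CYCLES


if __name__ == "__main__":
    print("PAL:", cpu_cycles(PAL_LOOP), "+ DMA =", line_cycles(PAL_LOOP))
    print("NTSC:", cpu_cycles(NTSC_LOOP), "+ DMA =", line_cycles(NTSC_LOOP))
```

The totals match a 63-cycle PAL line and a 65-cycle NTSC line, which is why any entry jitter carries straight through every iteration.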
The double IRQ technique in the above sample does solve the issue, but it's definitely more complicated (although I documented it as best I could to explain what's happening). The trouble is that the (chips?) workshop emulator you are using appears not to be VIC cycle accurate (at least if you assume VICE and CCS64 are correct). I truly think it's off because I can't make the raster position completely stable in the workshop emulator, but I can in VICE. I would have to prove it with real hardware, but I have to do a few "repairs" on my PAL motherboard before I can do that testing.
So, the question: do we use the simple but somewhat broken sample in the real world, or the proper implementation, which is more complex, cycle accurate, PAL + NTSC compatible, and accurate on VICE/CCS64, but has an "#ifdef" to make it work with the workshop emulator (where the workshop emulator also glitches a small amount and doesn't represent the FLI bug properly)? It's your call.
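The raster positions annotated in the listing's double-IRQ code can be checked by hand. The sketch below (PAL, 63 cycles per line) models the final "beq" jitter fix and the delay loop; it assumes no cycles are stolen by the VIC in this stretch (sprites are off and line 47 is not a bad line). The function names are illustrative.

```python
PAL_CYCLES_PER_LINE = 63


def advance(line: int, cycle: int, cycles: int) -> tuple:
    """Move a (line, cycle) raster position forward by a number of CPU cycles."""
    total = cycle + cycles
    return line + total // PAL_CYCLES_PER_LINE, total % PAL_CYCLES_PER_LINE


def beq_to_next_cycles(equal: bool) -> int:
    """'beq' to the next instruction: 3 cycles if taken (equal), 2 if not."""
    return 3 if equal else 2


def countdown_cycles(n: int) -> int:
    """'ldx #n' (2) + n x 'dex' (2) + (n-1) taken 'bne' (3) + 1 untaken 'bne' (2)."""
    return 2 + 2 * n + 3 * (n - 1) + 2


if __name__ == "__main__":
    # The beq starts at cycle 0 (both $d012 reads equal) or 1 (reads differ) of line 47
    for start, equal in ((0, True), (1, False)):
        print(advance(47, start, beq_to_next_cycles(equal)))  # (47, 3) both times
    # From 3/47: "stx SaveX+1" (4 cycles), then "ldx #$0d" and the Wait loop
    line, cycle = advance(47, 3, 4 + countdown_cycles(0x0D))
    print(line, cycle)  # 48 10, matching the listing's "10/48" comment
```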
Ultimately this sample code should be used to improve the workshop emulator, but that's beyond what I'm willing to sink my teeth into right now (at least beyond submitting a bug report [to whom?]).
Funny, I was just messing around with similar C64 raster timing stuff today ;) And yeah, I realize there are no shortcuts, at least none that we know about yet. I was just happy to get something working back then, but it's great you've taken it to the next level.
I think the code sample should be updated to reflect the reality of the actual hardware: people may download the ROM and use it on real hardware, or maybe one day 8bitworkshop will support VICE (or fix the chips bug!)
I wish the #ifdef wasn't needed, but I think as a stopgap it's fine, and anyone reading the code will understand why.
This is cool!
Ok, I'll do a PR to fix tomorrow. 👍 Thanks for responding so fast and for making an extremely thoughtful decision!
I think you can close this bug. Instead a new bug should be opened related to the issues with the emulator (which impact FLI) and this issue can be linked for context. Up to you of course!
Also, I noticed these horizontal lines at the bottom when exporting to 8bitworkshop. Do you know what's going on? They seem to appear in both multicolor and hires FLI.
Ah, I see your comment about the open borders trick. Must be another chips bug. I can make it go away by commenting out the open borders patch and also extending the loop by 2 lines (?), although I do get the last line duplicated. Weird.
Yes, it's another chips bug. You can tweak it for a nicer workaround, but with other tradeoffs, like losing more pixel rows off-screen. The code was already long, so I didn't want to overcomplicate it with too many #ifdef options, but it's up to you!
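For anyone experimenting with the last-line patch, the sketch below decodes the $D011 values involved using the standard VIC-II bit layout. The patch turns the table's last entry $3F into $77. Besides setting ECM and BMM, that value also clears RSEL (24-row mode), which may be relevant to the border behaviour, although the exact mechanism is untested here. The helper names are illustrative.

```python
D011_FIELDS = {
    "RST8": 7,  # bit 8 of the raster line (read) / raster compare (write)
    "ECM": 6,   # extended color mode
    "BMM": 5,   # bitmap mode
    "DEN": 4,   # display enable
    "RSEL": 3,  # 1 = 25 rows, 0 = 24 rows
}


def decode_d011(value: int) -> dict:
    fields = {name: (value >> bit) & 1 for name, bit in D011_FIELDS.items()}
    fields["YSCROLL"] = value & 0x07
    return fields


def patch_last_line(value: int) -> int:
    """Mirror of 'and #$07 / ora #$70' applied to LookupD011+199."""
    return (value & 0x07) | 0x70


if __name__ == "__main__":
    last = 0x38 | (199 & 0x07)       # $3F before patching
    patched = patch_last_line(last)  # $77 after patching
    for label, value in (("start", 0x2B), ("line 199", last), ("patched", patched)):
        print(f"{label}: ${value:02X} {decode_d011(value)}")
```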
Have you tested whether the C64 Hires FLI code works on real PAL hardware? On your emulator this example works properly, but on VICE and CCS64 the X offset is corrupted. I know raster / bad line / code timings are sensitive, so while I track down the issue within the emulator or VICE/CCS64, I wanted to first see what is known (or not known) to work related to this sample code.
PS: I submitted a bug fix related to C64 multicolor mode. The scroll offset register had an incorrect default value.