lgblgblgb / xemu

Emulations (running on Linux/Unix/Windows/macOS, utilizing SDL2) of some - mainly - 8 bit machines, including the Commodore LCD, Commodore 65, and the MEGA65 as well.
https://github.com/lgblgblgb/xemu/wiki
GNU General Public License v2.0
208 stars 32 forks source link

MEGA65: Xemu crashes on certain border-specific (seems to be) VIC-IV settings #301

Closed lgblgblgb closed 3 years ago

lgblgblgb commented 3 years ago

It seems there is some serious issue how Xemu/MEGA65 emulates border specific registers which causes Xemu to crash. Yesterday I accidentally did something which can be summarized with this BASIC65 command:

POKE $D05D,255

This causes Xemu to crash. Just today, a report from Endurion (thanks for reporting!) on Discord states something very similar, by this kind of code:

VIC4.SIDBDRWD = $d05c
VIC4.HOTREG   = $d05d
lda #20
jsr WaitFrame
; full border width
lda #$ff
sta VIC4.SIDBDRWD
lda VIC4.HOTREG
and #$c0
ora #$3f
sta VIC4.HOTREG

This and my observation seems to have a common cause, some problem on border calculation for some reason.

Discord link about my observation:

https://discord.com/channels/719326990221574164/781481205639020554/895720202128461855

Discord link with Endurion's observation:

https://discord.com/channels/719326990221574164/781481205639020554/895953188446961664

I mark this bug with the HIGH PRIORITY label as emulation should not crash, even if emulation is incorrect, or other issues meanwhile. Having a killer poke sounds like a fun, but it's not ;)

@hernandp if you have time, you can have a look on it (thanks!), for sure! ;)

hernandp commented 3 years ago

I will check it, thanks.

lgblgblgb commented 3 years ago

I have the idea that maybe some over/underflow happens by "extreme values" which causes then invalid memory access or something dealing with a pointer which can cause crash of Xemu then.

Comment from Endurion on Discord (https://discord.com/channels/719326990221574164/781481205639020554/896008163894116352):

"Great! Interestingly it works fine if I do ora #$07, which is enough for the screen to be fully covered. #$0f also kills it. When the left and right border overlap?"

I think this assumption makes sense. Probably the fix is just check of the situation when the borders would be "insane" already (overlap or something?) and internally use the value which is not under/over-flow yet.

I'm not sure I can deal with this right now, since I'm ill, and just well enough to have some random chat mostly to try to forget my problems :(

hernandp commented 3 years ago

Working on this, thanks.
I will report progress ASAP :)

hernandp commented 3 years ago

The problem is that border values goes completely off scale due to sideborder being in seemingly not very reasonable values. The crashing loop is:

        for (Uint32 *p = pixel_raster_start; p < pixel_raster_start + border_x_left; p++)
            *p = palette[REG_BORDER_COLOR];

Even from the point of view of the VHDL hardware spec, sideborder values exceeding 400 pixels do not make much sense as L/R borders are defined based on sideborder and display_width which is fixed at 800. I say 400 pixels because that's the limit from where the character generator is blocked by the sideborders completely. To say in other words, setting sideborder register to 399 will display a 2-pixel column in the middle off screen. 400 will cover the screen completely, which is expected since the display_width is 800. After that , any higher value seems not very useful or maybe considered as "unexpected behavior" as the left and right borders may be inverted.
Suppose a sideborder value of all bits to 1 (16383); this yields border_x_left = 16208 causing the pixel pointer address to go completely off-limits.

In the hardware, any value of the 13-bit Sideborder exceeding 400 will cause the border to cover everything. From this practical point of view I think that we can go with internally clamping sideborder calculation to maxout at 400, e.g in our calculation:

    border_x_right = FRAME_H_FRONT + TEXTURE_WIDTH - SINGLE_SIDE_BORDER;

we could state that (pseudocode) SINGLE_BORDER = SINGLE_BORDER > 400 ? 400 : SINGLE_BORDER

This could affect chargen which depends on SINGLE_BORDER but past 400 there would be no display of any char, I think.

I'm studying other possibilities and I also asked Paul why Sideborder has 13 bits if it works this way.

Regards, hernan

lgblgblgb commented 3 years ago

Or in other words, clamping at 400 thus TEXTURE_WIDTH / 2? Since you assume we have 800 pixels width texture which may be overridden (though it's probably not a good idea, but what happens if some modify vic4.h that).

By the way I am still not sure that TEXTURE_WIDTH is 800 just because to have MEGAphone emulated in the future, or is there other reason? As - IIRC - 720 pixels it should be. That's also the reason I would use constant derived values more, like TEXTURE_WIDTH / 2 instead of just a bare number 400.

Great work! Would you like to commit PR it? It would be best to test every possible values or to be sure no crash, but other than that I guess in this case it can go into next directly as a relative "short" change. Surely I can do that, but I don't want to take away "the honour" :)

Thanks!

hernandp commented 3 years ago

I will do. Of course I will clamp at TEXTURE_WIDTH/2, not magic numbers ;) Expect a PR from a next derived branch.

Thanks!

lgblgblgb commented 3 years ago

It's an interesting side-note here and I am really not sure if we may want to use texture width as 720 (at least until MEGAphone is emulated?). I am not sure if it wouldn't effect something but in theory it would decrease the needed memory operations and thus somewhat less CPU consuming by Xemu. Anyway, it's another story, and maybe not worth to mess things up now, just it came into my mind somehow.

hernandp commented 3 years ago

My inspiration was modeline definitions of the VHDL spec. All borders all calculated with that base. BUT since we are emulating and in the last word we like to offer the same experience independent from the internal details, that would be ok. But I dont know if it's worth, saved cycles/memory vs. effort? We can think about it, of course! No more crashes but still no PR because I figured out that we may get out-of-range accesses in chargen registers, since they are set adding up the sideborder, and optimally we would like to store any bizarre value as in hardware, but interpret it clamped to pixel buffer !. Also this triggered my idea to build a version of Xemu with AddressSanitizer enabled, I like to check if he's detecting any out-of-bounds array access.

lgblgblgb commented 3 years ago

Yeah, I have the suspect that it's room for other bugfixes in this regard ... Honestly I found that more useful in current situation at least to hunt for reported/detected bugs. But sure, using some stress-test (eg by also using random numbers to fill VIC-IV registers and allow it to run for hours ... seeing if crash sometimes, hehe ... it can even help to detect very odd issues which involves more registers at once to have a specific region or so to cause crash). But for sure, if you have time for that, it would be great what you also described with the address sanitizer!! Also, I shouldn't have mentioned the texture size shrink, it shouldn't count as much, and probably it's an issue for a time when we're so much bored that nothing other to do ;)

What I mean here with the "handle as reported" case, that just after I accidentally triggered this bug, it was also reported a day or so later, there without accident he wanted to play with the border, really! And it's kinda a show stopper more if a user really would like to use the code, compared to the case when it's discovered by accident, stress-test, etc but probably nobody hit that yet. True, it means it may be hit in the future! And still very important to remedy those in advance etc, better to fix the bug before discovered by users ;) Just I mean here the reality that with limited human resources, maybe better to focus on things already reported and users having problem with it already.

lgblgblgb commented 3 years ago

Though, again, really, if you have time for writing stress-tests, or anything surely, regardless of this or any other issue it would be very cool!!!!

hernandp commented 3 years ago

Though, again, really, if you have time for writing stress-tests, or anything surely, regardless of this or any other issue it would be very cool!!!!

Sure! (BTW, i successfully used AddressSanitizer to crash on pixel buffer overrun, so I will probably test this a bit more before commiting the new PR, thanks!)

lgblgblgb commented 3 years ago

Honestly I am not so good in this, I used only Valgrind several times for finding buffer overflows, memory leaks, or out-of-bound accesses some times ....

Maybe I should put those access checker stuff back, that if compiled in, tries to catch very invalid pixel accesses at least (if you remember there was such a thing back to then in vic4.c). Though for sure a proper tool is much more "fine in resolution" to actually catch the bug in time, indeed.

hernandp commented 3 years ago

The AddressSanitizer is GCC and CLang built-in now, it's enough to use -fsanitize=address both in CFLAGS and LDFLAGS :)

Of course all optimizations must be turned off , including frame pointer omission, et al.

hernandp commented 3 years ago

Documentation my friend, here: https://clang.llvm.org/docs/AddressSanitizer.html

lgblgblgb commented 3 years ago

Am I your friend, or the documentation should be my friend (ie RTFM) ;-P Take it easy, it wanted to be a joke :D But anyway, I had no idea about this, thanks for the information! I am afraid I am too oldschool my debug utility is to drop some printf here and there in the source and try then what it shows ;)

hernandp commented 3 years ago

Am I your friend, or the documentation should be my friend (ie RTFM) ;-P Take it easy, it wanted to be a joke :D But anyway, I had no idea about this, thanks for the information! I am afraid I am too oldschool my debug utility is to drop some printf here and there in the source and try then what it shows ;)

I always leave those kind of tools as a last resort. First Gdb + IDE debugger + printf.
In fact it was a loooong time ago that I used valgrind or this sanitizer thing, so we are in the same oldskool mindset.

Well, PR opened. Let's hope works well :)

lgblgblgb commented 3 years ago

Sorry I've tried to talk on Discord but maybe more appropriate to do it here, just one question:

The last if (vic4_sideborder_touched) part is by intent went outside the block it was originally part of? At least just looking the diff itself it seems. I just ask this since it seems to be an extra thing to do other than "just" the clamp of the border value to a "sane" one, the part above in the diff.

hernandp commented 3 years ago

It was a minor fix. For example if you boot up the machine and try to POKE $D05C,100 the sideborder will change, but NOT in Xemu since HOTREG is not touched. So it was different behavior. If you want, I can re do the PR with the sideborder thing and a separate one for this wrong behavior.

lgblgblgb commented 3 years ago

Noooo, it's OK, if it's ok to you. I just asked because it seemed for me an extra change, but if it's better this way, of course I believe you. Thanks!!!!!