lgblgblgb / xemu

Emulations (running on Linux/Unix/Windows/macOS, utilizing SDL2) of some - mainly - 8 bit machines, including the Commodore LCD, Commodore 65, and the MEGA65 as well.
https://github.com/lgblgblgb/xemu/wiki
GNU General Public License v2.0

MEGA65: integrate Hernan's ideas/work on VIC-IV enhancements with the possibility of re-factoring meanwhile #29

Closed lgblgblgb closed 3 years ago

lgblgblgb commented 8 years ago

Currently, the M65 emulation has a VIC-III entity with only slight modifications. In reality, VIC-IV should be handled the way it is actually implemented, i.e. the "compatibility hot registers" are only used to set up the VIC-IV registers, and the VIC-IV internals are always used. Some VIC-IV notions should also be implemented, like logical size, etc. Scaling is another (and not so easy to answer) question. Maybe SDL should be directed with RenderCopy to do that dynamically, exploiting the GPU of the PC which runs the emulator?

lgblgblgb commented 8 years ago

Waiting for issue #32, to see the performance effect and other issues that the change from full frame rendering to scanline based emulation may cause. It can help to decide how to emulate VIC-IV at a more precise level then.

lgblgblgb commented 8 years ago

Still waiting for another issue, #26, as it would be important to consolidate memory access (also used by VIC-IV) and I/O (so the CPU can access VIC-IV).

lgblgblgb commented 7 years ago

Now I would be able to move on, however it seems it's worth waiting for the MEGA65 to be "stabilized", i.e. 1080p mode will be used and other changes are possible, so I would like to wait until the more-or-less final solution is there in the MEGA65 project itself.

lgblgblgb commented 4 years ago

MEGA65 palette handling is implemented (right now in the vic branch, will be merged back to dev + stable). It's now possible to use more than 4 bits per channel colour depth, and the 4 colour banks work too. The renderers should be taught, though, to use the sprite bank for sprites, and the alternative bank for the "use alternative bank" option of the 16 bit char mode.

Still TODO: implement ROM palette. On MEGA65 this would mean fetching colours from bank 3. I'm not sure though whether that's true for everything (for sprites, ...) and for all selected banks. This should be done with more "quasi-banks", since checking at every pixel in the renderers whether the colour index is less than 16 is pricey!
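The "quasi-bank" idea could look roughly like the sketch below. It is only an illustration under assumptions: the array names, palette_write() and select_palette() are hypothetical and not taken from Xemu, and it assumes that with the ROM palette enabled, colour indices 0..15 of every bank should come from bank 3. The cost is moved to palette writes, so the renderer only picks a pointer once and never tests the colour index per pixel.

```c
#include <assert.h>
#include <stdint.h>

#define PALETTE_BANKS   4
#define BANK_ENTRIES    256
#define ROM_PAL_BANK    3    /* per the comment above: ROM colours are fetched from bank 3 */
#define ROM_PAL_ENTRIES 16   /* assumption: only indices 0..15 are overridden */

/* Hypothetical storage: the banks exactly as programmed by the emulated machine. */
static uint32_t real_banks[PALETTE_BANKS][BANK_ENTRIES];
/* Hypothetical quasi-banks: same content, but entries 0..15 mirror bank 3. */
static uint32_t quasi_banks[PALETTE_BANKS][BANK_ENTRIES];

/* Called on every palette register write; keeps the quasi-banks in sync. */
static void palette_write(int bank, int index, uint32_t colour)
{
    assert(bank >= 0 && bank < PALETTE_BANKS);
    assert(index >= 0 && index < BANK_ENTRIES);
    real_banks[bank][index] = colour;
    if (bank == ROM_PAL_BANK && index < ROM_PAL_ENTRIES) {
        /* A ROM palette colour changed: it is visible in all quasi-banks. */
        for (int b = 0; b < PALETTE_BANKS; b++)
            quasi_banks[b][index] = colour;
    } else if (index >= ROM_PAL_ENTRIES) {
        /* Entries 16..255 are identical in the real and the quasi bank. */
        quasi_banks[bank][index] = colour;
    }
    /* Writes to entries 0..15 of banks 0..2 never reach the quasi-banks. */
}

/* Called once per mode change (not per pixel) to pick the lookup table. */
static const uint32_t *select_palette(int bank, int rom_palette_enabled)
{
    assert(bank >= 0 && bank < PALETTE_BANKS);
    return rom_palette_enabled ? quasi_banks[bank] : real_banks[bank];
}
```

A renderer would then call select_palette() when the palette bank or the ROM palette setting changes and simply index the returned table for every pixel.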

lgblgblgb commented 4 years ago

#97 is basically this issue, so let's close that.

lgblgblgb commented 4 years ago

31c8be04e9f72fb92dec8be7fe55b6d83bfbc6c4

lgblgblgb commented 4 years ago

Palette handling in the dev branch now: 95c816880472ae3b85533a1c90769dd4dc6c93fd

lgblgblgb commented 4 years ago

Hernan has his own branch playing with VIC-IV. I'll probably close this issue as being too general. That does not mean it's resolved as a whole, of course!

lgblgblgb commented 4 years ago

Re-opened with new meaning :)

lgblgblgb commented 3 years ago

Decided to do #209 first, since we first need a more sane memory decoding subsystem to build on.

lgblgblgb commented 3 years ago

Branch hmw-mod is created: https://github.com/lgblgblgb/xemu/tree/hmw-mod (14d85d942edee3001dcd10321c1278431912cbd6). The plan: let's modify within that current state as much as possible to converge on the goals of the merge, and only merge at the "end" of this sub-project.

lgblgblgb commented 3 years ago

Current status: PAL/NTSC mode change only at new frame. TODO:

lgblgblgb commented 3 years ago

"merger" is now the new experimental branch to merge "a decent mainline Xemu" branch with VIC-IV things.

lgblgblgb commented 3 years ago

Great news, it's alive! ... kind of ...

It seems emulation works to some extent with fusing things together (basically the next branch under the name of merger with the VIC-IV related - and now also other! - changes from hmw-mod).

Problems remain though (aka TODO), sorted by priority:

  1. TOP PRIORITY: segfault after starting the emulator, but not always ... (see the next comment below, with details!)
  2. TOP PRIORITY: host CPU usage of the emulation is, for some reason, higher than it is with hmw (most prominent in 40MHz mode, maybe only there?). This is kind of important (not just because it's "slow" or something), since it means there is a problem hidden somewhere: basically the same code renders now as in hmw, so the performance should be very similar!

And some unsorted list:

DONE:

Testing materials

lgblgblgb commented 3 years ago

About the segfault:

(make clean first to be sure, then make DEBUG=yes in targets/mega65 for both commands, to build a "debug capable" version, which is handy for gdb since it has symbols and turns off the optimizations which make debugging hard/impossible ... however, don't forget to make clean again before a normal build, since a "debug capable" version is much slower code)

DMA: 'hack' (preliminary!! support for new-style M65 DMA) status: **ENABLED**
DMA: initializing DMA engine for chip revision 1 (initially, may be modified later!), dyn_mode=YES(M65-aware), modulo_support=DISABLED.
UARTMON: disabled, no name is specified to bind to.
SPEED: fast clock is set to 40.00MHz, 2560 CPU cycles per scanline.
CPU[65CE02]: RESET, PC=0000, BCD_behaviour=NMOS-6502
SPEED: in_hypervisor=1 force_fast=0 c128_fast=0, c65_fast=0 m65_fast=0
[New Thread 0x7fffe5744700 (LWP 30490)]
AUDIO: initialized (#2), 44100 Hz, 2 channels, 1024 buffer sample size.
AUDIO: volume is set to 100%, stereo separation is 100% [component-A is 100, component-B is 0]
MEM: UNHANDLED memory policy: 0
ETH: not enabled by config/command line
AUDIO: start
VIC: switching video standard from <UNDEF> to PAL (1MHz line cycle count is 32.000000, frame time is 20000usec)
VIC: Write $005d SIDEBORDER/HOTREG: $c0
VIC4: 16bit=1, chrcount=40, charstep=40 bytes, charscale=120, vic_ii_first_raster=0, ras_src=0, border yt=112, yb=498, xl=94, xr=702, textxpos=80, textypos=98, screen_ram=$000400, charset/bitmap=$001000, sprite=$0007f8
VIC: compare raster is now 0
VIC4: 16bit=1, chrcount=40, charstep=40 bytes, charscale=120, vic_ii_first_raster=0, ras_src=0, border yt=104, yb=504, xl=94, xr=702, textxpos=80, textypos=104, screen_ram=$000400, charset/bitmap=$001000, sprite=$0007f8
VIC4: 16bit=1, chrcount=40, charstep=40 bytes, charscale=120, vic_ii_first_raster=0, ras_src=0, border yt=104, yb=504, xl=80, xr=719, textxpos=80, textypos=104, screen_ram=$000400, charset/bitmap=$001000, sprite=$0007f8
VIC: Write $0058 CHARSTEP: $50
VIC: Write $0059 CHARSTEP: $00

Thread 1 "xmega65.native" received signal SIGSEGV, Segmentation fault.
0x0000555555575c84 in vic4_render_scanline () at vic4.c:1263
1263                    *(current_pixel++) = palette[REG_BORDER_COLOR];
(gdb) bt
#0  0x0000555555575c84 in vic4_render_scanline () at vic4.c:1263
#1  0x0000555555562958 in emulation_loop () at mega65.c:702
#2  0x0000555555562cd0 in main (argc=1, argv=0x7fffffffdee8) at mega65.c:801
(gdb) p palette
$1 = (Uint32 *) 0x555555fe0320 <vic_palettes+3072>
(gdb) p current_pixel
$2 = (Uint32 *) 0x7fffe6ffe004
(gdb) p vic_registers[0x20]
$3 = 0 '\000'
(gdb) p pixel_start
$4 = (Uint32 *) 0x7fffe6e29010

My diagnosis:

Overflow of the texture on access. Subtracting the pointer value of pixel_start from current_pixel (according to gdb, see above), the result is 1921012 bytes. Dividing that number by 800*4 (800 pixel wide texture, and 4 bytes per pixel, RGBA) gives 600.31625, which is already past the height of the texture. It's also very clear that this is the problem, as it's so close to the 600 pixel height! TODO: why is it happening??
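The arithmetic of this diagnosis can be reproduced with a few lines of C. This is just a sketch: locate_pixel() and struct texture_pos are made-up names for the illustration, the 800x600 texture size and the 4 bytes per pixel are the values quoted above, and the two pointer values are the ones printed by gdb.

```c
#include <assert.h>
#include <stdint.h>

#define TEXTURE_WIDTH   800   /* pixels, as stated in the diagnosis */
#define TEXTURE_HEIGHT  600   /* pixels */
#define BYTES_PER_PIXEL 4     /* one Uint32 per pixel */

struct texture_pos {
    long row;      /* scanline the pointer is on, 0 based */
    long column;   /* pixel within that scanline */
    int  inside;   /* non-zero if the row is still within the texture */
};

/* Convert a raw write pointer into a row/column position within the texture. */
static struct texture_pos locate_pixel(uint64_t pixel_start, uint64_t current_pixel)
{
    struct texture_pos pos;
    uint64_t pixel_index;
    assert(current_pixel >= pixel_start);
    pixel_index = (current_pixel - pixel_start) / BYTES_PER_PIXEL;
    pos.row    = (long)(pixel_index / TEXTURE_WIDTH);
    pos.column = (long)(pixel_index % TEXTURE_WIDTH);
    pos.inside = pos.row < TEXTURE_HEIGHT;
    return pos;
}
```

Calling locate_pixel(0x7fffe6e29010ULL, 0x7fffe6ffe004ULL) gives row 600, column 253: the write happened on the first scanline after the last valid one (valid rows are 0..599).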

I assume this is a problem I introduced when I moved some PAL/NTSC related stuff to the time of opening a new frame, instead of the time of writing the register which causes the PAL/NTSC change. It seems this segfault only ever happens at startup (and even then, not always) and not later, that's why I think it's related (though I haven't checked whether a later PAL/NTSC change can trigger it as well).

Annoyingly, the segfault has disappeared now. This is very bad, since it's the typical "hard to trigger" kind of problem, which is still there, just "hidden" by something. Still, I think it's very important to find and fix the cause. Please note that hmw-mod has this problem already! So maybe it's easier to debug there, as it's a much smaller "distance" from stock hmw than merger is now. By the way, just guessing: can it have any connection with the TEXTYPOS patch I backported to hmw-mod (and then to merger as well)?

I could also catch a rare segfault at exit, when I had switched the video standard just before. This also suggests that the problem is related to changing the video standard, be it the "initial" one at startup, or a later one ...

lgblgblgb commented 3 years ago

Explanation

OK, a more sane explanation, probably (since I can't express myself, not even the nth time ...).

So the problem, I think, is based on the fact that a PAL/NTSC change can only be interpreted at frame level. In the original hmw, writing register $6F causes both the "immediate" stuff to be done (i.e. calling "interpret legacy register" IIRC, etc.) and a totally new frame to be started. However this is not what we want, so I've split that into two "pieces": some remained at writing the register, but changing the behaviour of the rendering is deferred and handled in the next "open new frame" call. Probably the problem is that it does not work if those two do not happen at the same time. But they should not happen at the same time, so somehow this situation should be solved :-/

It's important to note ...

... that it's not a solution to also put (or only put) the interpret legacy stuff into the open new frame call, since (AFAIK!) that's incorrect. If hot registers are enabled, the change affecting the VIC-IV registers should happen at the time of the write, though the effect on the rendering (another video mode) should be delayed until the new frame. That holds even if it works around the problem (not sure), since there would then be an unwanted "interpret legacy register" applied in the future, unrelated to any register write.
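The split described above could be sketched like this. Everything here is hypothetical and heavily simplified, not the real vic4.c code: the function and variable names are made up, bit 7 of $6F selecting NTSC is an assumption for the example, and the "legacy interpretation" is reduced to a counter. The point is only the ordering: register side effects happen at write time, the rendering geometry changes at the next frame start, and the renderer refuses to write past the texture.

```c
#include <stdint.h>

#define PAL_VRES        576   /* visible rows in PAL, as discussed in this issue */
#define NTSC_VRES       480   /* visible rows in NTSC */
#define TEXTURE_HEIGHT  600   /* current texture height */

static uint8_t vic_regs[0x80];
static int pending_ntsc;              /* what the last register write asked for */
static int active_ntsc;               /* what the renderer currently uses */
static int visible_rows = PAL_VRES;
static int current_row;
static int legacy_interpret_calls;    /* stand-in for the real legacy handling */

static void interpret_legacy_registers(void)
{
    legacy_interpret_calls++;
}

/* Register write: only the "immediate" part happens here. */
void write_reg_6f(uint8_t value)
{
    vic_regs[0x6F] = value;
    pending_ntsc = (value & 0x80) != 0;   /* assumption: bit 7 selects NTSC */
    interpret_legacy_registers();
}

/* Frame start: the only place where the rendering geometry may change. */
void open_new_frame(void)
{
    if (pending_ntsc != active_ntsc) {
        active_ntsc = pending_ntsc;
        visible_rows = active_ntsc ? NTSC_VRES : PAL_VRES;
    }
    current_row = 0;
}

/* Returns 1 if a row was rendered, 0 if the write would leave the visible area. */
int render_scanline(void)
{
    if (current_row >= visible_rows || current_row >= TEXTURE_HEIGHT)
        return 0;   /* never touch memory beyond the texture */
    current_row++;
    return 1;
}
```

With this ordering, a write to $6F in the middle of a frame leaves visible_rows untouched until open_new_frame() runs, and the bounds check in render_scanline() turns a possible overflow into a skipped row instead of a segfault.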

hernandp commented 3 years ago

I think full_borders should expand the viewport to show MAX_RASTERS, which is in fact what's happening behind the scenes with full_borders = 0, or even in your CRT: every frame, MAX_RASTERS scanlines are traced by the beam, independent of your viewport (or monitor bezel cover ;) ).

I guess the SCREEN_HEIGHT #define has a definitely confusing name, and anyway we should not derive the texture height from it but from max_rasters instead, and set up the viewport according to our needs.

If we don't want to recreate the SDL surface on mode change, I accept having a fixed 624-height surface as a trade-off, even if that means wasting ~300k of unused surface memory when NTSC is used.
OR, we set up texture recreation as before. I don't know if there are alternatives.

lgblgblgb commented 3 years ago

What I can't see here is why we need MAX_RASTERS height. In the case of a CRT too, only the max viewable rasters can be seen; the other scanline times are retrace, blanking, front/back "porch", and need to be BLACK as far as I am aware, thus we never need those. Now with VGA mode (not CRT, though it can be a CRT VGA monitor as well, of course) MEGA65 uses a vertical resolution of 480 for NTSC and 576 for PAL, and that means the visible part. Since we can only show the visible part of course, just as the monitor can only show the visible scanlines, 480 or 576 pixels at all, 600 should even be too much, but never too few.

My point here is that in the case of (let's say) NTSC, though we have a 480 pixel "visible" VGA area, the total number of rasters is of course much higher, as it includes blanking, retrace etc. time as well. But those parts can never carry information; if you try that with real VGA, the monitor will lose sync and other odd things happen. As the mode itself says 480 pixels, you can't have more visible ones, even if "under the hood" there are more rasters, which are not used to carry visible pixel information but have other purposes, like retrace and blanking.

Surely, we must distinguish here between the notion of "visible scanline" from the viewpoint of the generated/emulated VGA signal, and what MEGA65 thinks (i.e. other than border). For example, as far as I can imagine, for a V400 mode we have 400 pixels of height worth of "useful information", some border to sum up to 480 pixels (in NTSC!), and even more that's needed for retrace/blanking. But since any real monitor can display only the "denoted" resolution of a VGA mode, even if we have more physical scanlines, those can never be displayed by monitors; in fact they must be at black voltage level in the signal. Thus we don't even need to render them, since they do not carry useful output information. For a monitor (especially a CRT) those are needed to sync the beam, which is their only purpose, but for an emulator they are totally useless: they must be skipped, while the "skipping" itself is still emulated, so that the timing at least correctly matches the emulated machine.

lgblgblgb commented 3 years ago

I'm not sure how well I could express myself, maybe not at all. An example, let's say for 640x480 VGA mode (yes, I am aware we don't use that at all, it's just an example): http://tinyvga.com/vga-timing/640x480@60Hz

As we can see, though the mode is 480 pixels high, we have 525 rasters! But the 525-480 = 45 extra rasters can never carry information, so only a 480 pixel high image can be seen by the user (we can even say that's why the mode is called "640 x 480", a 480 pixel height resolution). Surely, the same applies horizontally: measured in pixel time, there are many more pixels in reality, but they are used again for blanking and horizontal retrace, and never carry real image information that can be seen by humans.

If we say that a 600 pixel high texture is not enough in PAL, that's a problem, as the number of visible scanlines cannot be larger than 576 (as that VGA mode is used ...), thus 600 should be enough, or even too much for our needs! This is not the same as "overscan"; that's another notion, where not even the theoretically visible image area can be fully seen on TVs/monitors. A VGA monitor getting e.g. 640x480 should display 640 and 480, with no overscan or anything.
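To make the visible/total raster distinction concrete, here is a small sketch using the standard 640x480@60Hz vertical timing (480 visible lines, 10 front porch, 2 sync, 33 back porch, 525 in total). The struct and function names are made up for the example, and it assumes rasters are counted from the first visible line; the MEGA65 modes themselves use different numbers.

```c
/* Vertical timing of a video mode, all values in scanlines. */
struct vga_vertical_timing {
    int visible;       /* lines carrying picture information */
    int front_porch;   /* blanked lines after the picture */
    int sync;          /* vertical sync pulse */
    int back_porch;    /* blanked lines before the next picture */
};

/* Standard 640x480@60Hz timing, used here only as an example. */
static const struct vga_vertical_timing vga_640x480_60 = { 480, 10, 2, 33 };

/* Total rasters per frame: this is what the emulated timing has to count. */
static int total_rasters(const struct vga_vertical_timing *t)
{
    return t->visible + t->front_porch + t->sync + t->back_porch;
}

/* Only these rasters ever need pixels in the texture.
   Assumption: raster 0 is the first visible line. */
static int raster_is_visible(const struct vga_vertical_timing *t, int raster)
{
    return raster >= 0 && raster < t->visible;
}
```

An emulation loop would still step through all total_rasters() lines for timing, but would only call the renderer when raster_is_visible() is true.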

lgblgblgb commented 3 years ago

Do we have any info on where, measured in "physical" (VGA) rasters, the part belonging to the visible image (in terms of the VGA signal) begins and ends in PAL and NTSC "emulation"? That region must be exactly 480 pixels high for NTSC, or 576 for PAL (IIRC!!!!!!). Then, I guess, we can say that we can skip rendering anything if the VGA raster is outside of the area where it's legal at all to emit any video signal (AFAIK trying to push anything on the R/G/B pins of the VGA cable during retrace/blanking is a violation of the VGA standard, and most of the time it would result in monitors losing sync).

Surely, later, we can STILL apply some "narrowed" border if user wants (which then would trim vertical height even more but always within the visible area!), but in vertical direction at least, it's meaningless to even talk about any possible pixel data (even if just border!) outside of the visible raster lines.

In this way we can get away with a texture of vertical resolution 576 (again, IIRC, I always forget, to be honest ...), as it must be MAX(PAL_VRES,NTSC_VRES), and we can use the viewport to display just what we really want (also maybe narrowed down even more for a reduced border, or providing a "full border" in terms of the max resolution of the VGA signal). Though it would be fancy if we could track the border settings of VIC-IV, since the user may alter those to use all of the vertical resolution possible in a given mode, and it would be nice if Xemu detected that by changing the viewport (if reduced border is used!), so the user can always see the "meaningful" content and not a huge border, when it's not used anyway by default.
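A minimal sketch of that idea, assuming the 800 pixel texture width mentioned earlier and the 576/480 visible heights: the texture is allocated once with the larger height, and only the viewport rectangle changes with the video standard or the border setting. The names (struct viewport, make_viewport, the trim parameters) are hypothetical, not existing Xemu identifiers.

```c
#include <assert.h>

#define TEXTURE_WIDTH  800
#define PAL_VRES       576
#define NTSC_VRES      480
#define MAX(a, b)      ((a) > (b) ? (a) : (b))
/* One texture for both standards: never recreated on a PAL/NTSC change. */
#define TEXTURE_HEIGHT MAX(PAL_VRES, NTSC_VRES)

/* The part of the texture which is actually shown to the user. */
struct viewport {
    int x, y, w, h;
};

/* trim_x / trim_y: border pixels cut off on each side ("reduced border"),
   0 means the full visible area of the given video standard. */
static struct viewport make_viewport(int is_ntsc, int trim_x, int trim_y)
{
    struct viewport vp;
    int vres = is_ntsc ? NTSC_VRES : PAL_VRES;
    assert(trim_x >= 0 && 2 * trim_x < TEXTURE_WIDTH);
    assert(trim_y >= 0 && 2 * trim_y < vres);
    vp.x = trim_x;
    vp.y = trim_y;
    vp.w = TEXTURE_WIDTH - 2 * trim_x;
    vp.h = vres - 2 * trim_y;
    return vp;
}
```

The resulting rectangle has the same four fields as an SDL_Rect, so it could serve as the source rectangle of SDL_RenderCopy(); tracking the VIC-IV border registers would then just mean recomputing the trim values.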

This way, we can save CPU time, also memory.

More questions

I still can't understand two other problems:

  1. Why the image "hops" when switching from PAL to PAL (... so no change), or sometimes from NTSC to NTSC. Try it with the context UI menu under Display -> Video standard -> PAL or NTSC, selecting even just the current one.
  2. Why branch merger requires much more CPU than hmw-mod (or hmw, I guess). This is most prominent in 40MHz mode of course.

lgblgblgb commented 3 years ago

Btw, a bit off-topic here, but there is now make GPROF=yes (probably it needs make clean first, otherwise no rebuild is done ...) to get profiling info, which can then be used with gprof to get some idea of where the emulator spends the most time, etc. However, this is not so much connected here: the "merger uses more CPU than hmw-mod or hmw" thing sounds more like another issue, and this gprof is more for the future. But who knows.

hernandp commented 3 years ago

@lgblgblgb thanks!! I will do some benchmarking later. After that I will adjust the renderer to just output the visible lines, not max_rasters as it is now, thus saving precious CPU time.

lgblgblgb commented 3 years ago

And btw, SCREEN_HEIGHT and SCREEN_WIDTH should be renamed to texture_height and width or something like that. The reason for these odd names, is the old scheme (and other emulations in Xemu as well) when simply the texture is the screen dimension always. Which is not so much the case with the MEGA65 emulation any more.

lgblgblgb commented 3 years ago

Previously 40MHz was assumed for the MEGA65 fast clock. However it seems it's 40.5MHz, which is an error of about 1.25%. Let's use the correct value with the (really simple) commit above.
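The size of the difference can be checked with a short calculation. This is only a sketch: the 64 usec line time is an assumption derived from the "40.00MHz, 2560 CPU cycles per scanline" log line quoted earlier in this issue, and the helper names are made up.

```c
#define OLD_FAST_CLOCK_HZ 40000000.0   /* previously assumed fast clock */
#define NEW_FAST_CLOCK_HZ 40500000.0   /* corrected fast clock */
#define LINE_TIME_USEC    64.0         /* assumption: 2560 cycles / 40 MHz */

/* Error of the assumed clock relative to the assumed value, in percent. */
static double clock_error_percent(double assumed_hz, double actual_hz)
{
    return (actual_hz - assumed_hz) / assumed_hz * 100.0;
}

/* CPU cycles fitting into one scanline, rounded to the nearest integer. */
static int cycles_per_scanline(double clock_hz, double line_usec)
{
    return (int)(clock_hz * line_usec / 1e6 + 0.5);
}
```

Under these assumptions the old value yields 2560 cycles per scanline and the corrected one 2592, i.e. 32 more CPU cycles on every line.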

lgblgblgb commented 3 years ago

I guess soon we should have some kind of "open testing" model, i.e. provide builds and tell people that they can try them, though they are known to still have issues. Just to get as much feedback as possible, to catch problems.

On the longer term (hopefully not too long!) I guess some issues should still be addressed (not only the ones I've already mentioned throughout this novel-sized issue ...). For example I worry a lot about #230 (though even "regular" hmw is affected, and also next, which is strange ...), because it was stated "but I haven't done any VIC-IV magic too much". But certainly, just not handling this bug does not make things worse, since it hasn't worked before either ;) As for newly added features (like the 16 colour nybl mode) I am a bit unsure, since if we start to add/test new things, we can never declare merger to be out of the merger phase ... :)

In my opinion, this merger thing should at least be as usable and stable as hmw, even if it's not fully bullet-proof yet. My plan is to soon push next to stable (as there has been no new stable for ages ...) and merge merger ;) as the next sometime in the future, with the "game" we discussed before (proper commit ownership for the respective author of the given code, etc., to honour the contributions fairly).

hernandp commented 3 years ago

Hi @lgblgblgb, I agree with you that Nybl16 should maybe be integrated in another branch while we try to fix the remaining issues and achieve stable code. As far as #230 goes, I think he doesn't do any VIC-IV magic but raster magic ... and probably that's the source of the problem; in fact I was going to check next how he's doing raster compare, etc.

lgblgblgb commented 3 years ago

Just realized: it's a bit funny that we agreed not to put nybl16 into merger, for now at least, but we still do ;) Anyway, I think the change is relatively straightforward and small and does not affect anything else, so let's not turn back now. It's another question why it causes, and has already caused, so many headaches, ehhh ;)

lgblgblgb commented 3 years ago

Just thinking to try out this "discussion" feature, here: #244

lgblgblgb commented 3 years ago

New happenings, see above:

  1. I've added the initial video mode workaround: ce5bf0642b00296d0250032118a0331be0cb20c3
  2. I've added a comment on an old commit, which is very suspect for me, maybe there are more cases like this? https://github.com/lgblgblgb/xemu/commit/f90fe874cf488d698e9722adc1cb62298bc6e5ff#r51122068

@hernandp please check the 2. out, if you have time. Thanks a lot.

lgblgblgb commented 3 years ago

Auto build + deploy on travis has been put back. Also some mods/changes from next (as usual, to allow an easier "final merge" in the future).

lgblgblgb commented 3 years ago

There are merger builds now, even available on the downloading page (with some warnings ...): https://github.lgb.hu/xemu/

lgblgblgb commented 3 years ago

Above, the debug pixel read back plus cross-hair stuff as described in #256

lgblgblgb commented 3 years ago

@hernandp Please check the commit above, the "fix transparency at some places" one, and also please read the commit message (the full one, not just the first line). I tried my best to fix at least some "obvious at first sight" places of the "SDL pixel value used to decide transparency" kind of problem, and the 16 colour sprites transparency problem (vic4tests.prg now shows the moving diagonal line as it should - IIRC). It's surely possible that I made a mistake and it shouldn't be done this way; it's always better to check mods with the author of the given code :) Thanks!

lgblgblgb commented 3 years ago

@hernandp I've found a problem in vic4.c, regarding the "character set source": namely, function get_charset_effective_addr() and its caller. What it does is indeed how the C65 works, but not the MEGA65. It's a well known fact on the C64 that when the VIC-II-style charset address (together with the VIC-II bank) points to certain addresses, VIC-II will access the ROM instead of RAM, even if the ROM is not banked in. MEGA65 does something very different, however: a completely separate memory entity is used, which exists just for this very purpose. It's not even part of the main RAM (main_ram, i.e. "fast RAM"). It's called "WOM" (write-only memory, since the CPU can only write it, though VIC reads it). In every case where the above would apply on a C64, MEGA65 uses that WOM, which is initialized by Hyppo. Normally, from the user's point of view, you could say it does not make a difference. The problem, however, is that some programs (even some written by me!) directly manipulate the WOM, and those would misbehave very much. I'm not sure how to remedy this in merger, since the function which calls get_charset_effective_addr() is a bit "overloaded" to render many different modes, so exactly when this special piece of stuff is used must be taken into account ... This "wom" is in memory_mapper.c, as the char_wom array:

// Write-Only memory (WOM) for character fetch when it would be the ROM (on C64 eg)
Uint8 char_wom[0x2000];
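A rough sketch of the decision described above, not the real vic4.c logic. It relies on the C64 rule that VIC-II sees the character ROM at $1000-$1FFF within VIC-II banks 0 and 2; the function names and the small main_ram array are hypothetical, and which part of the 8K WOM is addressed is an assumption (the example simply uses the first 4K).

```c
#include <assert.h>
#include <stdint.h>

static uint8_t main_ram[0x10000];   /* hypothetical: only the first 64K, enough for the example */
static uint8_t char_wom[0x2000];    /* the WOM, same size as in memory_mapper.c */

/* C64 rule: in VIC-II banks 0 and 2, offsets $1000-$1FFF show the char ROM. */
static int charset_hits_rom_window(unsigned vic2_bank, unsigned addr_in_bank)
{
    return (vic2_bank == 0 || vic2_bank == 2)
        && addr_in_bank >= 0x1000 && addr_in_bank < 0x2000;
}

/* Fetch one character generator byte the "MEGA65 way": where a C64 would
   read the ROM, read the WOM instead; otherwise read ordinary RAM. */
static uint8_t fetch_char_byte(unsigned vic2_bank, unsigned addr_in_bank)
{
    assert(vic2_bank < 4 && addr_in_bank < 0x4000);
    if (charset_hits_rom_window(vic2_bank, addr_in_bank))
        return char_wom[addr_in_bank & 0x0FFF];   /* assumption: first 4K of the WOM */
    return main_ram[(vic2_bank << 14) + addr_in_bank];
}
```

Keeping the test in one small helper like charset_hits_rom_window() would let the "overloaded" render function ask the question only in the modes where a VIC-II-style charset address is actually in use.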

lgblgblgb commented 3 years ago

Oh, and if you have time, please have a look at the commits I made to merger. It would be too slow to always discuss first; hopefully that's not a problem, and you can modify things if what I do is not such a great idea (and as we know, the "real" merge will not be done by using merger directly, so it does not matter who commits here at least, or how messed up it gets with multiple commits maybe, hehe; this is a very serious experimental branch, indeed). Thanks.

hernandp commented 3 years ago

I find commits to merger perfectly reasonable.

Regarding WOM, yes, I took ideas from - probably - your 65 code and who knows what other sources. You know character set addresses/ROMs are hell, at least for me. So yes, I agree we need to fix it and/or make test cases in the Shallan little program ;)

lgblgblgb commented 3 years ago

#263 again: we need to pay attention to possible out-of-bounds accesses as well, see this comment there: https://github.com/lgblgblgb/xemu/issues/263#issuecomment-850326293

lgblgblgb commented 3 years ago

The commit above: trying to remedy the "not using WOM" issue (#263), plus an attempt at checking some out-of-bounds cases.

lgblgblgb commented 3 years ago

The commit above (b8e701ebd8382ee5ce7ed8bd530f35bbe32d6771) still has some potential problems. For example I am not sure:

lgblgblgb commented 3 years ago

About WOM though see #213 as well. Though indeed C65 allowed two ROM charsets (lower+upper/upper+gfx for each?) this is not supported even on MEGA65, AFAIK!!

lgblgblgb commented 3 years ago

I also tried the "K2 demo" (for C65) which uses bitplane modes, and it seems to be OK. However, as with my older VIC-IV implementation, I have no idea where the sprite pointer bytes are in VIC-III bitplane mode ... Because of this, K2 demo cannot show the sprites correctly in the third (?) part. See #139 about this issue (there is a screenshot there as well). It's also important to note that even the merger branch seems to be wrong here, as K2 demo renders garbage instead of letters in that sprite part.

lgblgblgb commented 3 years ago

The last one, for example, moves the setting of row_data_base_addr outside of the char loop, since it shouldn't change meanwhile. And things like that, e.g. do not fetch char_byte if it's not needed for the current mode, etc. Surely, for true and deep optimization, crazy things can be done. We don't want that at this stage, but rather the "light stuff", which can still be kept in view, more or less.
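The kind of "light" optimization meant here can be illustrated with a simplified sketch. Only the name row_data_base_addr comes from the comment above; the functions, their parameters and the flat screen RAM layout are hypothetical. Both versions produce the same output, the second just computes the row base once per row instead of once per character.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: the row base address is recomputed for every character. */
static void render_row_naive(uint8_t *out, const uint8_t *screen_ram,
                             size_t screen_base, int row,
                             int chars_per_row, int char_step)
{
    for (int x = 0; x < chars_per_row; x++) {
        size_t row_data_base_addr = screen_base + (size_t)row * (size_t)char_step;
        out[x] = screen_ram[row_data_base_addr + (size_t)x];
    }
}

/* After: the loop-invariant address is computed once, outside the char loop. */
static void render_row_hoisted(uint8_t *out, const uint8_t *screen_ram,
                               size_t screen_base, int row,
                               int chars_per_row, int char_step)
{
    const size_t row_data_base_addr = screen_base + (size_t)row * (size_t)char_step;
    const uint8_t *row_data = screen_ram + row_data_base_addr;
    for (int x = 0; x < chars_per_row; x++)
        out[x] = row_data[x];
}
```

A compiler at higher optimization levels may hoist such an expression on its own, but in a real renderer the base depends on register state the compiler cannot prove constant, so doing it by hand is a cheap and still readable change.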

Meanwhile I've encountered many questions, like: can the horizontal flip be done for mono/mcm bmm/char modes as well?

I also try to follow my idea to always "spam" the source with TODOs and FIXMEs, even if it's silly, just so that we can remember later. That includes the case where such a comment turns out to be invalid and should simply be deleted later without any action ;)

lgblgblgb commented 3 years ago

#244 discussion has been updated with the current situation as of now.

lgblgblgb commented 3 years ago

The commit to fix #270 is really ugly, please read here: https://github.com/lgblgblgb/xemu/commit/7c6b09f01f32681ea934f54b2ad438c31c9c5161#commitcomment-51797635

lgblgblgb commented 3 years ago

Note: it seems that increasing the number of SIDs to four (originally for #277) triggered some race conditions between the audio callback thread of SDL and the main thread of the emulation. I'm trying to "decorate" things with spinlocks to avoid that, and hit another strange bug meanwhile, which may be possible to work around ...
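For illustration, protecting state shared between the emulation thread and the audio callback with a spinlock could look like the sketch below. It uses plain C11 atomics rather than whatever Xemu actually does; the shared array and all function names are hypothetical. SDL2 itself also offers SDL_LockAudioDevice() for keeping the callback out while the main thread touches shared data.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SIDS 4

static atomic_flag sid_lock = ATOMIC_FLAG_INIT;
static int16_t sid_last_sample[NUM_SIDS];   /* hypothetical shared state */

static void spin_lock(atomic_flag *lock)
{
    /* Busy-wait until the flag was clear before we set it. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

/* Returns 1 if the lock was taken, 0 if somebody else holds it. */
static int spin_trylock(atomic_flag *lock)
{
    return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Main (emulation) thread: publish a new sample for one SID. */
static void emu_update_sid(int sid, int16_t sample)
{
    spin_lock(&sid_lock);
    sid_last_sample[sid] = sample;
    spin_unlock(&sid_lock);
}

/* Audio callback thread: mix all SIDs under the same lock. */
static int32_t audio_mix(void)
{
    int32_t sum = 0;
    spin_lock(&sid_lock);
    for (int i = 0; i < NUM_SIDS; i++)
        sum += sid_last_sample[i];
    spin_unlock(&sid_lock);
    return sum;
}
```

The critical sections are kept as short as possible on purpose: a spinlock burns host CPU while waiting, so neither side should do heavy work while holding it.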

lgblgblgb commented 3 years ago

#280 should be fixed (problem with V400 bitplane mode). It also needs to be checked that there is no problem with other modes (not bitplane!) with V400; maybe this is a common problem, not restricted to bitplane mode only?

lgblgblgb commented 3 years ago

Well, it's over ...