Closed lgblgblgb closed 3 years ago
Waiting for issue #32 to see the performance impact and any other problems that the change from full-frame rendering to scanline-based emulation may cause. That can help to decide how to emulate VIC-IV at a more precise level.
Still waiting for another issue, #26, as it would be important to consolidate memory access (memory is also accessed by VIC-IV) and I/O (so the CPU can access VIC-IV).
Now I would be able to move on; however, it seems it's worth waiting for Mega65 to be "stabilized", ie 1080p mode will be used and possibly other changes will land, so I would like to wait until the more-or-less final solution is there in the Mega65 project itself.
MEGA65 palette handling is implemented (right now in the vic branch, will be merged back to dev + stable). It's now possible to use more than 4 bits per channel of colour depth, and the 4 colour banks work too. Renderers still have to be taught to use the sprite bank for sprites, and the alternative bank for the 16 bit char mode "use alternative bank".
Still TODO: implement the ROM palette. On MEGA65 this would mean fetching colours from bank 3. I'm not sure though if that's true for everything (for sprites, ...) and for all selected banks. This should be done with more "quasi-banks", since always checking whether the colour index is less than 16 is pricey at every pixel in the renderers!
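A minimal sketch of how the "quasi-bank" idea could look, just to make it concrete. Everything here is an assumption for illustration: the array name `palettes`, the helper names, 256 entries per bank, and that the ROM colours are the first 16 entries of bank 3. It is not the code in the vic branch.

```c
#include <stdint.h>
#include <string.h>

#define PALETTE_ENTRIES 256   /* colours per bank (assumed) */
#define REAL_BANKS      4     /* the four palette banks */
#define ROM_BANK        3     /* bank the ROM colours are fetched from */
#define ROM_COLOURS     16    /* colour indices 0..15 use the ROM palette */

/* Hypothetical layout: the real banks first, then one quasi-bank per real bank. */
static uint32_t palettes[2 * REAL_BANKS][PALETTE_ENTRIES];

/* Rebuild the quasi-bank shadowing `bank`: identical colours, except that the
 * first 16 entries come from the ROM bank. Meant to be called after a palette
 * write that touches `bank` or the ROM bank. */
void rebuild_quasi_bank(int bank)
{
    uint32_t *quasi = palettes[REAL_BANKS + bank];
    memcpy(quasi, palettes[bank], sizeof palettes[bank]);
    memcpy(quasi, palettes[ROM_BANK], ROM_COLOURS * sizeof(uint32_t));
}

/* The renderer picks a pointer once (per line or per mode change), so the
 * per-pixel path stays a plain table lookup without an "index < 16" test. */
const uint32_t *select_palette(int bank, int rom_palette_enabled)
{
    return palettes[rom_palette_enabled ? REAL_BANKS + bank : bank];
}
```

The trade-off is some extra memory and a rebuild on palette writes, in exchange for no branch at every pixel.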
31c8be04e9f72fb92dec8be7fe55b6d83bfbc6c4
Palette handling in the dev branch now: 95c816880472ae3b85533a1c90769dd4dc6c93fd
Hernan has his own branch playing with VIC-IV. I will probably close this issue as being too general. That does not mean it's resolved as a whole, of course!
Re-opened with new meaning :)
Decided to do #209 first, since we need a more sane memory decoding subsystem first, to build on.
Branch hmw-mod is created; https://github.com/lgblgblgb/xemu/tree/hmw-mod 14d85d942edee3001dcd10321c1278431912cbd6 The plan: let's modify within that current state as much as possible to be convergent with the goals of the merge, and only merge at the "end" of this sub-project.
Current status: PAL/NTSC mode change only at new frame. TODO:
"merger" is now the new experimental branch to merge "a decent mainline Xemu" branch with VIC-IV things.
It seems emulation works to some extent with fusing things together (basically the `next` branch under the name of `merger`, with the VIC-IV related - and now also other! - changes from `hmw-mod`).
[...] `hmw` (in 40MHz mode it is more/most/only? prominent). This is kinda important (not just because it's "slow" or something), since it means there is a problem hidden somewhere: basically the same code renders now as the one in `hmw`, thus the performance should be very similar!
And some unsorted list:
- [...] `update_emulator`, which forces updating the screen in this mode. However, it won't work any more, as rendering is now not "everything at once" (it requires step by step), and it also involves a process to "open" and "close" a frame (ie the rendering target texture). And so on.
- `-videostd` option to force PAL or NTSC mode from the command line, for testing purposes only!
- [...] `-videostd` option of course (if it was used at all), since it would force one, which is then not possible to override via the menu (that is, after using the menu, the standard is not forced anymore).
- `-fullborders` option from `hmw` is restored (though it does not work too well because of the problems with viewport settings).
- [...] `next` branch.
- [...] `hmw` too: the double CPU speed problem :) However, interestingly, in 40MHz CPU mode there is still very much a difference in emulation CPU usage between `hmw` and `merger` now (also the real CPU usage reported by `top` is kinda different from the one reported by Xemu).
- [...] `hmw-mod` first. I would guess the problem is because I separated the actual PAL/NTSC change from the register write that requests it. I have no idea how to deal with it in a sane way :(
(`make clean` first to be sure, then `make DEBUG=yes`, in `targets/mega65` for both commands, to build a "debug capable" version, which is handy for gdb as it has symbols and turns off the optimizations which make debugging hard/impossible ... however, it shouldn't be forgotten to `make clean` before a normal build then, since a "debug capable" version is much slower code)
DMA: 'hack' (preliminary!! support for new-style M65 DMA) status: **ENABLED**
DMA: initializing DMA engine for chip revision 1 (initially, may be modified later!), dyn_mode=YES(M65-aware), modulo_support=DISABLED.
UARTMON: disabled, no name is specified to bind to.
SPEED: fast clock is set to 40.00MHz, 2560 CPU cycles per scanline.
CPU[65CE02]: RESET, PC=0000, BCD_behaviour=NMOS-6502
SPEED: in_hypervisor=1 force_fast=0 c128_fast=0, c65_fast=0 m65_fast=0
[New Thread 0x7fffe5744700 (LWP 30490)]
AUDIO: initialized (#2), 44100 Hz, 2 channels, 1024 buffer sample size.
AUDIO: volume is set to 100%, stereo separation is 100% [component-A is 100, component-B is 0]
MEM: UNHANDLED memory policy: 0
ETH: not enabled by config/command line
AUDIO: start
VIC: switching video standard from <UNDEF> to PAL (1MHz line cycle count is 32.000000, frame time is 20000usec)
VIC: Write $005d SIDEBORDER/HOTREG: $c0
VIC4: 16bit=1, chrcount=40, charstep=40 bytes, charscale=120, vic_ii_first_raster=0, ras_src=0, border yt=112, yb=498, xl=94, xr=702, textxpos=80, textypos=98, screen_ram=$000400, charset/bitmap=$001000, sprite=$0007f8
VIC: compare raster is now 0
VIC4: 16bit=1, chrcount=40, charstep=40 bytes, charscale=120, vic_ii_first_raster=0, ras_src=0, border yt=104, yb=504, xl=94, xr=702, textxpos=80, textypos=104, screen_ram=$000400, charset/bitmap=$001000, sprite=$0007f8
VIC4: 16bit=1, chrcount=40, charstep=40 bytes, charscale=120, vic_ii_first_raster=0, ras_src=0, border yt=104, yb=504, xl=80, xr=719, textxpos=80, textypos=104, screen_ram=$000400, charset/bitmap=$001000, sprite=$0007f8
VIC: Write $0058 CHARSTEP: $50
VIC: Write $0059 CHARSTEP: $00
Thread 1 "xmega65.native" received signal SIGSEGV, Segmentation fault.
0x0000555555575c84 in vic4_render_scanline () at vic4.c:1263
1263 *(current_pixel++) = palette[REG_BORDER_COLOR];
(gdb) bt
#0 0x0000555555575c84 in vic4_render_scanline () at vic4.c:1263
#1 0x0000555555562958 in emulation_loop () at mega65.c:702
#2 0x0000555555562cd0 in main (argc=1, argv=0x7fffffffdee8) at mega65.c:801
(gdb) p palette
$1 = (Uint32 *) 0x555555fe0320 <vic_palettes+3072>
(gdb) p current_pixel
$2 = (Uint32 *) 0x7fffe6ffe004
(gdb) p vic_registers[0x20]
$3 = 0 '\000'
(gdb) p pixel_start
$4 = (Uint32 *) 0x7fffe6e29010
Overflow of the texture on access. Subtracting the pointer value of `pixel_start` from `current_pixel` (according to `gdb`, see above), the result is `1921012`. Dividing that number by `800*4` (800 pixel wide texture, and the 4 is because of having 4 bytes per pixel - `RGBA`) gives 600.31625, which is already over the height of the texture, and it's also very clearly the problem, as it's so near to the 600 pixel height! TODO why is it happening??
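The arithmetic above can be written down as a small check. This is only a sketch: the function names are made up for illustration, and the 800x600 texture size and 4 bytes per pixel are taken from the comment above, not from the source tree.

```c
#include <stdint.h>

#define TEXTURE_WIDTH   800   /* pixels per row, as stated above */
#define TEXTURE_HEIGHT  600   /* rows in the texture, as stated above */
#define BYTES_PER_PIXEL 4     /* RGBA, one Uint32 per pixel */

/* Row a pixel pointer falls on, given the two raw addresses printed by gdb. */
uint64_t row_of_pixel(uint64_t pixel_start, uint64_t current_pixel)
{
    uint64_t byte_offset = current_pixel - pixel_start;
    return byte_offset / (TEXTURE_WIDTH * BYTES_PER_PIXEL);
}

/* Returns 1 while the pointer is still inside the texture, 0 otherwise. */
int pixel_in_texture(uint64_t pixel_start, uint64_t current_pixel)
{
    return current_pixel >= pixel_start &&
           row_of_pixel(pixel_start, current_pixel) < TEXTURE_HEIGHT;
}
```

With the gdb values, `row_of_pixel(0x7fffe6e29010, 0x7fffe6ffe004)` is 600, ie the first row past the end of a 600 row texture. A check like `pixel_in_texture()` in a debug build could turn the sporadic segfault into a reproducible assertion failure.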
I assume this is a problem I introduced when I moved some PAL/NTSC related stuff to the time of opening a new frame, instead of the time of writing the register which causes the PAL/NTSC change. It seems this segfault only ever happens at startup (and even then, not always) and not later, that's why I think it's related (though I haven't checked whether a later PAL/NTSC change can trigger it as well).
Annoyingly, the segfault has disappeared now. This is very bad, since it is the typical "hard to trigger" problem, which is still there, just "hidden" by something. Still, I think it's very important to find and fix the cause. Please note that `hmw-mod` has this problem already! So maybe it's easier to debug there, as it has much less "distance" from stock `hmw` than `merger` has now. By the way, just guessing: can it have any connection with the TEXTYPOS patch I backported to `hmw-mod` (then to `merger` as well)?
I could also catch a rare segfault at exit, when I had switched video standard just before. It also suggests that the problem is related to changing the video standard, be it the "initial" one at startup or a later one ...
Ok, a more sane explanation, probably (given that I can't express myself, not even at the nth time ....)
So the problem, I think, is based on the fact that a PAL/NTSC change can only be interpreted at frame level. In the original `hmw`, writing the register `$6F` causes both the "immediate" stuff to be done (ie calling "interpret legacy register" IIRC, etc) and a totally new frame to be started. However, this is not what we want, so I've split that into two "pieces": some remained at the register write, while changing the behaviour of the rendering is deferred and handled in the next "open new frame" call. Probably the problem is that it does not work if those two do not happen at the same time. But they should not happen at the same time, so somehow this situation should be solved :-/
... that it's not a solution to also put (or only put) the interpret legacy stuff into the open new frame call either, since (AFAIK!) that's incorrect. If hot registers are enabled, the change affecting the VIC-IV registers should happen at write time, though the effect on the rendering (another video mode) should be delayed until the new frame. That holds even if this would work around the problem (not sure), since then there would be an unwanted interpret legacy register step applied in the future, unrelated to any register write.
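To make the intended split easier to reason about, here is a minimal sketch of the two "pieces". All names are hypothetical, and the counter merely stands in for the "interpret legacy register" work; this is not the code in `hmw-mod` or `merger`.

```c
#include <stdbool.h>

enum video_std { STD_PAL, STD_NTSC };

static enum video_std active_std  = STD_PAL;  /* what the renderer uses right now */
static enum video_std pending_std = STD_PAL;  /* what the last register write asked for */
static int legacy_interpret_count = 0;        /* stand-in for "interpret legacy registers" */

/* Register write: the immediate side effects happen here, once per write. */
void on_video_std_register_write(enum video_std requested)
{
    legacy_interpret_count++;   /* hot-register handling is NOT deferred */
    pending_std = requested;    /* the rendering change is only recorded */
}

/* Frame open: the only place where the rendering geometry may change.
 * Returns true if the standard was switched for the frame being opened. */
bool open_new_frame(void)
{
    if (pending_std == active_std)
        return false;
    active_std = pending_std;   /* raster counts, viewport etc would be recomputed here */
    return true;
}
```

The point of the sketch is that anything the scanline renderer reads (raster count, borders, texture bounds) should depend only on `active_std`, never on `pending_std`; if some value is still updated at write time, the renderer can run a frame with mixed PAL/NTSC geometry, which would fit the overflow seen above.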
I think full_borders should expand the viewport to show MAX_RASTERS, which is in fact what's happening behind the scenes with full_borders = 0, or even in your CRT: every frame, MAX_RASTERS scanlines are traced by the beam, independent of your viewport (or monitor bezel cover ;) ).
I guess that SCREEN_HEIGHT #define has a definitely confusing name, and anyway we should not derive texture height from it but from max_rasters instead and setup the view port according to our needs.
If we don't want to recreate the SDL surface on mode change, I accept having a fixed 624-height surface as a trade-off, even if that means wasting ~300k of unused surface memory when NTSC is used.
OR, we set up texture recreation as before.
I don't know if there are alternatives.
What I can't see here is why we need MAX_RASTERS height. In the case of a CRT too, only the max viewable rasters can be seen; the other scanline times are retrace, blanking, front/back "porch", and need to be BLACK as far as I am aware, thus we never need those. Now with VGA mode (not CRT, though it can be a CRT VGA monitor as well, of course) MEGA65 uses a vertical resolution of 480 for NTSC and 576 for PAL, and that means the visible part. Since we can only show the visible part, of course, just like the monitor can show only the visible scanlines (480 or 576 pixels), 600 should even be too much, but never too few.
My point here is that in the case of (let's say) NTSC, though we have a 480 pixel "visible" VGA area, the total number of rasters is of course much higher, as it includes blanking, retrace etc time as well. But those parts can never carry information; if you try that with real VGA, the monitor will lose sync and other odd things happen. As the mode itself says 480 pixels, you can't have more visible ones, even if "under the hood" there are more rasters, which are not used to carry visible pixel information but have other purposes, like retrace and blanking.
Surely, we must distinguish here between the notion of "visible scanline" from the viewpoint of the generated/emulated VGA signals, and what MEGA65 thinks (ie other than border). For example, as far as I can imagine, for a V400 mode we have 400 pixels height worth of "useful information", some border to sum up to 480 pixels (in NTSC!), and even more that's needed for retrace/blanking. But since any real monitor can display only the "denoted" resolution of VGA modes, even if we have more physical scanlines, those can never be displayed by monitors (in fact they must be at black voltage level on the signal), thus we don't even need to render them, since they do not carry useful output information. For a monitor (especially for a CRT) those are needed to sync the beam, their only purpose, but for an emulator they are totally useless and must be skipped, though still emulated as "skipping them", so that the timing correctly matches the emulated machine at least.
I'm not sure how well I could express myself, maybe not at all. Some example, let's say for the 640x480 VGA mode (yes, I am aware we don't use that at all, just an example): http://tinyvga.com/vga-timing/640x480@60Hz
As we can see, though the mode is 480 pixels high, we have 525 rasters! But the 525-480 rasters can never carry information, so only a 480 pixel high image can be seen by the user (we can even call the mode "480 pixel height resolution, thus 640 x 480"). Surely, the same applies horizontally: measured in pixel time, there are many more pixels in reality, but again they are used for blanking and horizontal retrace, and never carry real image information that can be seen by humans.
If we say that a 600 pixel high texture is not enough in PAL, that's a problem, as the max number of visible scanlines cannot be larger than 576 (as that VGA mode is used ....), thus 600 should be enough, or even too much for our needs! This is not the same as "overscan"; that's another notion, meaning that not even the theoretically visible image area can be fully seen on TVs/monitors. A VGA monitor getting eg 640x480 should display 640 and 480, with no overscan or anything.
Do we have any info, measured in "physical" (VGA) rasters, on where the beginning and ending positions are in PAL and NTSC "emulation" for the part that belongs to the visible image in terms of the VGA signal? That region must be exactly 480 pixels high (for NTSC) or 576 (for PAL, IIRC!!!!!!). Then, I guess, we can say that we can skip rendering anything if the VGA raster is outside of the area where it's legal at all to emit any video signal (AFAIK trying to push anything on the R/G/B pins of the VGA cable during retrace/blanking is a violation of the VGA standard, and most of the time it would result in monitors losing sync).
Surely, later, we can STILL apply some "narrowed" border if user wants (which then would trim vertical height even more but always within the visible area!), but in vertical direction at least, it's meaningless to even talk about any possible pixel data (even if just border!) outside of the visible raster lines.
In this way we can get away with a texture of vertical resolution 576 (again, IIRC, I always forget, to be honest ...) as it must be `MAX(PAL_VRES,NTSC_VRES)`, and we can use the viewport to really display just what we want (also maybe narrow it down even more for reduced border, or provide "full border" in terms of the max resolution of the VGA signal). Though it would be fancy if we could track the border settings of VIC-IV, since the user may alter those to use all of the vertical resolution possible in a given mode, and it would be nice if Xemu detected that by changing the viewport (if reduced border is used!), so that the user can always see the "meaningful" content but not a huge border, when it's not used anyway by default.
This way, we can save CPU time, also memory.
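A rough sketch of what "render only the visible rasters" could look like. The struct, its field names and the helper are hypothetical; the 480 and 576 values are the ones mentioned above (the 576 being "IIRC" there too), and where the visible area actually starts in each mode is exactly the open question, so `first_visible` is a placeholder to be filled in.

```c
#define PAL_VRES        576
#define NTSC_VRES       480
#define MAX(a, b)       ((a) > (b) ? (a) : (b))
#define TEXTURE_HEIGHT  MAX(PAL_VRES, NTSC_VRES)   /* one texture size fits both modes */

struct video_mode {
    int total_rasters;    /* all physical rasters, including blanking and retrace */
    int first_visible;    /* first raster that carries picture data (placeholder) */
    int visible_rasters;  /* 480 or 576 */
};

/* Maps a physical raster to a texture row, or returns -1 when the raster is in
 * the blanking/retrace area. The caller still advances emulated time for such
 * a raster, it just renders no pixels for it. */
int texture_row_for_raster(const struct video_mode *mode, int raster)
{
    int row = raster - mode->first_visible;
    return (row >= 0 && row < mode->visible_rasters) ? row : -1;
}
```

For the 640x480 example linked above (525 rasters in total), a mode such as `{525, 35, 480}` would map rasters 35..514 to rows 0..479 and skip the rest; the 35 is only used as an illustration here.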
I still can't understand two other problems:
- `merger` requires much more CPU than `hmw-mod` (or `hmw`, I guess). This is most prominent in 40MHz mode, of course.
Btw, a bit off-topic here, but there is now `make GPROF=yes` (probably it needs `make clean` first, otherwise no rebuild is done ...) to get profiling info, which can then be used with `gprof` to get some idea where the emulator spends the most time, etc. However, this is not so much connected here; the "`merger` uses more CPU than `hmw-mod` or `hmw`" thing sounds more like another issue, and this gprof is more for the future. But who knows.
@lgblgblgb thanks!! I will do some benchmarking later. After that I will adjust the renderer to just output the visible lines, not max_rasters as it is now, thus saving precious CPU time.
And btw, SCREEN_HEIGHT and SCREEN_WIDTH should be renamed to texture_height and width or something like that. The reason for these odd names, is the old scheme (and other emulations in Xemu as well) when simply the texture is the screen dimension always. Which is not so much the case with the MEGA65 emulation any more.
Previously 40MHz was assumed for the MEGA65 fast clock. However, it seems it's 40.5MHz, which is an error of about 1.2%. Let's use the correct value with the (really simple) commit above.
I guess soon we should have some kind of "open testing" model, ie provide builds and tell people that they can try them, though they are known to still have issues. Just to get as much feedback as possible, to catch problems.
On the longer term (hopefully not too long!) I guess some issues should still be addressed (not only the ones I have mentioned throughout this novel-sized issue already ...). For example, I worry a lot about #230 (though even "regular" `hmw` is affected, but also `next` - which is strange ...) because it was stated "but I haven't done any VIC-IV magic too much". But certainly, even just not handling this bug does not make things worse, since it hasn't worked before either ;) As for new features being added (like the 16 colour nybl mode) I am a bit unsure, since if we start to add/test new things, we can never declare `merger` to be out of the merger phase ... :)
In my opinion, this `merger` thing should at least be as usable and stable as `hmw`, even if it's not fully bullet-proof yet. My plan is to soon push `next` to `stable` (as there was `stable` since ages ...) and merge `merger` ;) as the next sometime in the future, with the "game" we discussed before (proper commit owners for the respective author of the given code, etc, to honour the contribution factor fairly).
Hi @lgblgblgb, I agree with you that Nybl16 should maybe be integrated in another branch while we try to fix the remaining issues and achieve stable code. As far as #230 goes, I think he doesn't do any VIC-IV magic but raster magic... and that's probably the source of the problem; in fact I was going to check next how he's doing raster compare, etc.
Just realized: it's a bit funny that we agreed not to put nybl16 into `merger` for now at least, but we still do ;) Anyway, I think the change is relatively straightforward and small and does not affect anything else, so let's not turn back now. It's another question why it does and did cause so much headache already, ehhh ;)
Just thinking to try out this "discussion" feature, here: #244
New happenings, see above:
@hernandp please check the 2. out, if you have time. Thanks a lot.
Auto build + deploy on travis has been placed back. Some mods/changes from next (as usual, to allow easier "final merge" in the future).
There are `merger` builds now, even available on the downloading page (with some warnings ...): https://github.lgb.hu/xemu/
Above, the debug pixel read back plus cross-hair stuff as described in #256
@hernandp Please check the commit above, the "fix transparency at some places" one, and please also read the commit message (the full one, not just the first line). I tried my best to at least fix some "obvious at first sight" places of the "SDL pixel value used to decide transparency" kind of problem, and the 16 colour sprites transparency problem (vic4tests.prg now shows the moving diagonal line as it should - IIRC). It's surely possible that I made a mistake and it shouldn't be done this way; it's always better to check mods with the author of the given code :) Thanks!
@hernandp I've found a problem in vic4.c, regarding the "character set source", namely the function `get_charset_effective_addr()` and its caller. This is indeed how C65 works, but not how MEGA65 does. When the VIC-II-style charset address (together with certain VIC-II banks) selects certain addresses, it's a well-known fact on the C64 that VIC-II will access the ROM instead of RAM, even if the ROM is not banked in. However, MEGA65 does something very different: a completely separate memory entity is used, which exists just for this very purpose. It's not even part of the main RAM (`main_ram`, ie "fast RAM"). It's called "wom" (write-only memory, since the CPU can only write it, though VIC reads it). In any case where the above would apply on a C64, MEGA65 uses that WOM, which is initialized by Hyppo. Normally, from the user's point of view, it does not make a difference, you could say. However, the problem is that some programs (even some written by me!) directly manipulate the WOM, thus they would very much misbehave. I'm not sure how to remedy this in `merger`, since the function which calls `get_charset_effective_addr()` is a bit "overloaded" to render many different modes, so it must be taken into account exactly when this special piece of stuff is used ... This "wom" is in `memory_mapper.c`, as the `char_wom` array:
// Write-Only memory (WOM) for character fetch when it would be the ROM (on C64 eg)
Uint8 char_wom[0x2000];
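As a starting point for discussion only, a sketch of the source selection. The function names, the `main_ram` size and especially the way the offset into `char_wom` is computed are assumptions, not the real MEGA65 behaviour. The only hard fact used is the C64 one: VIC-II sees the character ROM at $1000-$1FFF of video banks 0 and 2.

```c
#include <stdint.h>

static uint8_t main_ram[0x20000];   /* stand-in for "fast RAM"; the size is illustrative */
static uint8_t char_wom[0x2000];    /* same size as the array quoted above */

/* C64 rule: VIC-II sees character ROM at $1000-$1FFF of video banks 0 and 2. */
static int is_c64_char_rom_window(unsigned vic2_bank, unsigned addr_in_bank)
{
    return (vic2_bank == 0 || vic2_bank == 2) &&
           addr_in_bank >= 0x1000 && addr_in_bank < 0x2000;
}

/* Hypothetical helper: where character data should be fetched from. */
const uint8_t *charset_source(unsigned vic2_bank, unsigned addr_in_bank)
{
    if (is_c64_char_rom_window(vic2_bank, addr_in_bank))
        return char_wom + (addr_in_bank - 0x1000);   /* ASSUMED indexing into the WOM */
    return main_ram + vic2_bank * 0x4000u + addr_in_bank;
}
```

Deciding this once per row (or per register change), and not inside the "overloaded" render function, would keep the per-character path free of the special case.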
Oh, and if you have time, please have a look at the commits I made to `merger`. It would be too slow to always discuss first; hopefully that's not a problem. Let me know, or modify it yourself, if what I do is not such a great idea (and as we know, the "real" merge will not be done by using `merger` directly, so it does not matter who commits here at least, or how messed up it gets with multiple commits, hehe; this is a very serious experimental branch, indeed). Thanks.
I find commits to `merger` perfectly reasonable.
Regarding WOM, yes, I took ideas from -probably- your 65 code and who knows what other sources. You know character set addresses/ROMs are hell, at least for me. So yes, I agree we need to fix it and/or make test cases in the Shallan little program ;)
The commit above: trying to remedy the not using WOM issue (#263), also some out-of-bound case checking try
The commit above (b8e701ebd8382ee5ce7ed8bd530f35bbe32d6771) still has some potential problems. For example I am not sure:
About WOM though see #213 as well. Though indeed C65 allowed two ROM charsets (lower+upper/upper+gfx for each?) this is not supported even on MEGA65, AFAIK!!
I also tried the "K2 demo" (for C65), which uses bitplane modes; it seems to be OK. However, as with my older VIC-IV implementation, I have no idea where the sprite pointer bytes are in VIC-III bitplane mode ... Because of this, the K2 demo cannot show the sprites correctly in the third (?) part. See #139 about this issue (there is a screenshot there as well). Also, it's important to note that even the `merger` branch seems to be wrong here, as the K2 demo renders garbage instead of letters in that sprite part.
The last one, for example, moves the setting of `row_data_base_addr` outside of the char loop, since it shouldn't change meanwhile. And things like that, eg do not fetch `char_byte` if it's not needed for the current mode, etc. Surely, for true and deep optimization, crazy things can be done. We don't want that at this stage, but rather the "light stuff" which can still be overseen more or less.
Meanwhile I've encountered many questions, like: can the horizontal flip be done for mono/MCM bitmap/char modes as well?
I also try to follow my idea to always "spam" the source with `TODO`s and `FIXME`s, even if it's silly, just so that we can remember later. Including the case that even the comment itself is invalid and should be deleted later without any action ;)
The commit to fix #270 is really ugly please read here: https://github.com/lgblgblgb/xemu/commit/7c6b09f01f32681ea934f54b2ad438c31c9c5161#commitcomment-51797635
Note: it seems increasing the number of SIDs to four (originally for #277) triggered some race conditions between the SDL audio callback thread and the main emulation thread. I am trying to "decorate" things with spinlocks to avoid that, hitting another strange bug meanwhile, which may be worked around ...
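For reference, a minimal sketch of the kind of spinlock "decoration" meant here, written with plain C11 atomics so it is self-contained. The `sid_regs` array and the function names are hypothetical; SDL2 also has its own `SDL_AtomicLock()` / `SDL_AtomicUnlock()` pair operating on an `SDL_SpinLock`, which could be used instead.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SIDS 4

static atomic_flag sid_lock = ATOMIC_FLAG_INIT;
static uint8_t sid_regs[NUM_SIDS][32];   /* hypothetical state shared by both threads */

static void lock_sids(void)
{
    /* Spin until the flag was previously clear, ie we are the one who set it. */
    while (atomic_flag_test_and_set_explicit(&sid_lock, memory_order_acquire))
        ;
}

static void unlock_sids(void)
{
    atomic_flag_clear_explicit(&sid_lock, memory_order_release);
}

/* Main emulation thread: a CPU write to a SID register. */
void sid_write(int sid, int reg, uint8_t value)
{
    lock_sids();
    sid_regs[sid][reg & 31] = value;
    unlock_sids();
}

/* Audio callback thread: read a register while rendering samples. */
uint8_t sid_read_for_render(int sid, int reg)
{
    lock_sids();
    uint8_t value = sid_regs[sid][reg & 31];
    unlock_sids();
    return value;
}
```

The critical sections have to stay very short, since the audio callback must never wait long; rendering a whole audio buffer while holding the lock would stall the emulation thread.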
Well, it's over ...
Currently, M65 emulation has a VIC-III entity with only slight modifications. In reality, VIC-IV should be handled as it is really implemented, ie some "compatibility hot registers" are only used to set the VIC-IV registers up, and basically the VIC-IV internals are always used. Also, some VIC-IV notions should be implemented, like logical size, etc. Scaling is another (and not so easy to answer) question. Maybe SDL should be directed with RenderCopy to do that dynamically, exploiting the GPU of the PC which runs the emulator?