mamedev / mame

MAME
https://www.mamedev.org/

[render.cpp] Unscaled font rendering 3x slower than scaled. #6886

Open oomek opened 4 years ago

oomek commented 4 years ago

I've been trying to figure out why this scaler-free part of the code: https://github.com/mamedev/mame/blob/344b8c8558fa6204bde0bc34facea0b07ca7e151/src/emu/render.cpp#L423

is 3x slower than scaled: https://github.com/mamedev/mame/blob/344b8c8558fa6204bde0bc34facea0b07ca7e151/src/emu/render.cpp#L436

When the font glyphs are rendered without a scaler, the performance hit is about 3x larger than when scaled glyphs are drawn. Does it have anything to do with the seqid not being reused, but instead being incremented on every frame until it overflows? The OpenGL backend takes the worst hit, followed by d3d and bgfx. Surprisingly, there is no performance penalty on gdi.

To reproduce launch mame with the following switches on Windows:

mame64.exe snowbros -window -nowaitvsync -nothrottle -video opengl -maximize -uifont tahoma
mame64.exe snowbros -window -nowaitvsync -nothrottle -video opengl -maximize -uifont invalidname

Then I press F11, and Tab and I note the emulation speed.

Here are the results:

Around 900% when Tahoma or another TTF font is used and rendered with scaling.

Around 300% when an invalid font name is provided, forcing the renderer to use the baked-in font, which is drawn without scaling.

This applies to all platforms. Tried on Windows, Linux, Raspberry PI.

MooglyGuy commented 4 years ago

You're almost certainly correct regarding seqid. Since it gets advanced every frame, the OSD-side texture gets invalidated every frame.

I would assume there are more callers that use the unscaled bitmap code path than just glyph drawing, however, so it would not be advised to simply remove the incrementing of the seqid without checking other use cases first.
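To illustrate the invalidation described above, here is a minimal sketch of an OSD-side texture cache keyed on seqid. `TexInfo`, `OsdTextureCache` and `uploads_for` are hypothetical names made up for this example and are not MAME's actual types; the sketch only assumes that the OSD layer re-uploads a texture whenever the seqid it sees differs from the one it saw last.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the texture info the core hands to the OSD layer.
struct TexInfo
{
    std::uint64_t id;     // identifies the texture
    std::uint32_t seqid;  // changes whenever the contents are considered new
};

// Hypothetical OSD-side cache: uploads only when the seqid differs from the last one seen.
class OsdTextureCache
{
public:
    // Returns true if the texture had to be (re)uploaded.
    bool use(const TexInfo &info)
    {
        auto it = m_seen.find(info.id);
        if (it != m_seen.end() && it->second == info.seqid)
            return false;               // cache hit, nothing to upload
        m_seen[info.id] = info.seqid;   // remember the new sequence number
        ++m_uploads;
        return true;
    }

    int uploads() const { return m_uploads; }

private:
    std::unordered_map<std::uint64_t, std::uint32_t> m_seen;
    int m_uploads = 0;
};

// Simulates drawing the same, unchanged glyph texture for a number of frames.
inline int uploads_for(int frames, bool bump_seqid_every_frame)
{
    assert(frames >= 0);
    OsdTextureCache cache;
    std::uint32_t seq = 0;
    for (int f = 0; f < frames; ++f)
    {
        if (bump_seqid_every_frame)
            ++seq;                      // models the per-frame seqid advance
        cache.use(TexInfo{42, seq});
    }
    return cache.uploads();
}
```

In this toy model, an unchanged glyph costs one upload per frame when the seqid advances every frame, and a single upload in total when it stays stable.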

That said, please try to keep in mind that trying to wring any sort of meaningful performance out of MAME's current rendering architecture will be an exercise in futility. It's 2006-era code, draws everything in immediate mode, and has no concept whatsoever of throwing video resources "over the fence", so to speak, between the core layer and the OSD (OS-dependent) layer.

There are long-term plans to completely rework the rendering paradigm in the emulator from the ground up, with the end goal of being able to run compute shaders in order to accelerate things like 3D chipsets in a pixel-accurate manner beyond the 3-4 CPU threads which we can currently use for software-rasterizing 3D chipsets. As you might imagine, this is a pretty large chunk of work, so there are no timelines, unfortunately.

oomek commented 4 years ago

Thank you for your insight. I applaud a complete rewrite of the UI code, but as you said, it's a long-term project. I have designed an antialiased bitmap font and incorporated it into MAME; it looks really nice in 240p on CRT monitors. I made a few more changes to make MAME scale it by integer factors, but I've been scratching my head over how to get around the slowdown when the font glyphs are drawn at 1:1 scale. Maybe it would be worth adding a temporary workaround for the case of 1:1 texture drawing? If you can think of anything I could try, please share.

Oh, and by the way, do you have any idea what the if (is_cmd) condition is used for? I suspect icons like the diamond are drawn that way, but I'm not sure. https://github.com/mamedev/mame/blob/bd7430d59a82bb75efa3ad4ed4dbcf8227e59e8a/src/emu/rendfont.cpp#L651

u-man74 commented 4 years ago

That "CRT" font looks beautiful. Everyone in the CRT scene will love it. But I think MooglyGuy was trying to tell you that it is not the UI that will be rewritten. It is more likely that the video modes will be rewritten so that the GPU renders everything. This might affect the UI, of course. Right now MAME renders the image, and the postprocessing (shaders, UI, etc.) is done by the GPU. This will be changed in the near future. It may be better to wait until that change is done.

MooglyGuy commented 4 years ago

Changing the built-in font doesn't involve re-writing anything.

oomek commented 4 years ago

If you want the font to be in grayscale rather than monochrome, I believe you actually have to. Also, the built-in font does not scale when the game is drawn in 480i. It should double in size. And for super resolutions like 2560x240, only the width should be scaled, by an integer factor.
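As a rough sketch of the integer scaling described above: assuming a 320x240 reference mode (an assumption made only for this illustration, as are the `IntScale` type and the `integer_font_scale` function), per-axis integer factors could be derived like this.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-axis integer scale factors for the built-in font.
struct IntScale
{
    int x;
    int y;
};

// Assumption: the font is designed for a base_w x base_h mode (320x240 here),
// and each axis is scaled independently by a whole number, never below 1.
inline IntScale integer_font_scale(int target_w, int target_h, int base_w = 320, int base_h = 240)
{
    assert(base_w > 0 && base_h > 0);
    return IntScale{std::max(1, target_w / base_w), std::max(1, target_h / base_h)};
}
```

Under that assumption, a 640x480 mode doubles the font on both axes, while a 2560x240 super resolution scales only the width (by 8) and leaves the height at 1.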

Anyway, I've managed to speed up 1:1 font rendering by modifying the following line: https://github.com/mamedev/mame/blob/1a53c842b9854f48d371e5ffda0325e61f005221/src/emu/render.cpp#L423

to:

if ((m_scaler == nullptr || (m_bitmap != nullptr && swidth == dwidth && sheight == dheight))
    && (flags ^ (PRIMFLAG_BLENDMODE(BLENDMODE_ALPHA) | PRIMFLAG_PACKABLE)))

I know it's a dirty hack that forces unscaled glyph textures through the scaler and cache generation, so I would rather fix the first condition.

u-man74 commented 4 years ago

AFAIK MAME does not support interlaced output at all. Any output is always progressive (even for an interlaced game). I have the impression that you use GroovyMAME. If that is the case, you need to try 480p and see if the same thing happens. I support the idea of grayscale, as anything without high contrast is better for readability, especially in interlaced modes, where monochrome would tend to flicker more.

MooglyGuy commented 4 years ago

If by grayscale you mean what amounts to purely an A8 bitmap, with 0 being fully transparent and 255 being fully opaque, then yeah, I'm right there with you, I think it would be a useful change to make.
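For reference, a minimal sketch of how an A8 glyph could be composited, assuming the conventional reading where 0 is fully transparent and 255 is fully opaque. `blend_a8` and `blend_row` are hypothetical helpers written for this example, not functions from MAME.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Blend one 8-bit channel: a == 0 keeps dst, a == 255 yields src.
// Uses rounded integer arithmetic: (src*a + dst*(255-a) + 127) / 255.
inline std::uint8_t blend_a8(std::uint8_t src, std::uint8_t dst, std::uint8_t a)
{
    return static_cast<std::uint8_t>((src * a + dst * (255 - a) + 127) / 255);
}

// Draw one row of a single-channel glyph: coverage[] is the A8 bitmap row,
// color is the text color, dst[] is the destination channel being drawn into.
inline void blend_row(const std::uint8_t *coverage, std::uint8_t color, std::uint8_t *dst, std::size_t n)
{
    assert(coverage != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = blend_a8(color, dst[i], coverage[i]);
}
```

With intermediate coverage values the glyph edges fade into the background, which is what distinguishes this from a 1-bit font.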

oomek commented 4 years ago

I mean more than 1-bit color. Currently glyphs are rendered without antialiasing at a height of 200px and scaled down to the appropriate height. That way the rendering is antialiased, and it is faster because it uses downscaled copies stored in a separate table. I wanted to speed up rendering when there is no scaling. Any thoughts?

oomek commented 4 years ago

> AFAIK MAME does not support interlaced output at all. Any output is always progressive (even for an interlaced game). I have the impression that you use GroovyMAME. If that is the case, you need to try 480p and see if the same thing happens. I support the idea of grayscale, as anything without high contrast is better for readability, especially in interlaced modes, where monochrome would tend to flicker more.

Yes, I use GroovyMAME on my CRT TV. Regarding 480i, that's the limit of my TV. What I said also applies to higher resolutions, but you wouldn't use a CRT-optimized bitmap font in that case, would you?

u-man74 commented 4 years ago

You need to be aware that GroovyMAME takes the progressive output and makes it interlaced. Since the source is not interlaced content, you get no benefit from using interlaced modes as far as the games are concerned. It might be better for UI stuff, but not for the games. Personally, I like everything that creates a better CRT experience, but what the devs think is more relevant and important.

oomek commented 4 years ago

The way the 480i is rendered is not relevant to the slowdowns. Please could we focus on the issue from the topic? Do you have any ideas/suggestions I could try?

MooglyGuy commented 4 years ago

@oomek Please be aware that u-man is not a MAME developer and doesn't speak for us.

At any rate, regarding the issue from the topic, have you tried simply commenting out the advancing of seqid? Are there any obvious side effects? I would try games with or without artwork, with and without SVG backgrounds, Laserdisc, and vector games. If none of them exhibit unusual behavior, it could be good to submit a pull request and let our semi-automated regression system, @Tafoid, see if anything more comprehensive breaks.

oomek commented 4 years ago

By just setting texinfo.seqid = 0 I've managed to speed up font rendering, but the game frames were not updating. Since game screens have a very high id (id << 57) I did the following:

// increment seqid if we get a screen texture
if (m_id > (1ULL << 56))
    texinfo.seqid = ++m_curseq;
else
    texinfo.seqid = 0;

This seems to work so far, but as you have mentioned I need to test it with backgrounds, artwork, overlays, lua, etc.

happppp commented 4 years ago

How about the method done over here?: https://github.com/yoshisuga/MAME4iOS/commit/a58258cdcc1f002ca01f3b9dadc47e8ec0995573

oomek commented 4 years ago

I've added: m_curseq++ in set_bitmap() and changed: texinfo.seqid = m_curseq in both places in get_scaled() and indeed this also seems to work, thanks.
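A heavily simplified model of that change, for illustration only: the sequence number advances when a new bitmap is set and is merely reported on lookup. `ToyTexture` and `seqid_changes` are hypothetical names for this sketch; the real set_bitmap() and get_scaled() in render.cpp do far more than this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for a render texture.
class ToyTexture
{
public:
    // Setting a new bitmap is the only thing that advances the sequence number.
    void set_bitmap(int contents)
    {
        m_contents = contents;
        ++m_curseq;
    }

    // Lookups report the current sequence number without touching it.
    std::uint32_t seqid() const { return m_curseq; }
    int contents() const { return m_contents; }

private:
    int m_contents = 0;
    std::uint32_t m_curseq = 0;
};

// Counts how often the reported seqid changes over a number of frames.
inline int seqid_changes(int frames, bool new_bitmap_each_frame)
{
    assert(frames >= 0);
    ToyTexture tex;
    tex.set_bitmap(0);
    std::uint32_t last = tex.seqid();
    int changes = 0;
    for (int f = 0; f < frames; ++f)
    {
        if (new_bitmap_each_frame)
            tex.set_bitmap(f + 1);      // e.g. a game screen updated every frame
        if (tex.seqid() != last)
        {
            ++changes;
            last = tex.seqid();
        }
    }
    return changes;
}
```

In this model a static glyph texture keeps the same seqid across frames, while a texture that receives a new bitmap every frame still gets a fresh seqid each time, which matches the behaviour reported above.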

MooglyGuy commented 4 years ago

Please feel free to submit a pull request containing this change.