keirf / flashfloppy-osd

On Screen Display and keyboard controller for FlashFloppy
The Unlicense

OSD output keeps disappearing in VGA mode on SPI1 #45

Open tkurbad opened 2 years ago

tkurbad commented 2 years ago

I thought I'd break this out into a new issue as I can't seem to get hold of the cause.

Problem: If an FF-OSD is connected to a flicker-fixed (i.e. 31 kHz) Amiga output via SPI1 of the blue pill, the OSD output disappears after the first mode switch. There is then either no OSD at all or only empty, but properly resized, background boxes without text.

To reproduce: (1) Connect a blue pill with the latest FF-OSD firmware via its SPI1 output to a flicker-fixed Amiga output. Configure the FF-OSD to work in either VGA or automatic sync mode. Connect the Amiga keyboard lines as well. (2) Start up the Amiga with both mouse buttons pressed to enter the early startup menu. (3) After entering early startup, press LCTRL-LALT-Del on the Amiga keyboard to switch the OSD off. Instead of 'OSD Off', you will get no text output on screen, just an empty OSD background box. (4) Press LCTRL-LALT-Del again to turn the OSD back on. All subsequent OSD output will just be empty boxes (or nothing at all).

Note that all of this ONLY occurs while on SPI1 AND in VGA mode. I could neither reproduce it in 15 kHz mode nor on SPI2.

I tried to debug this, but am at my wits' end now. All the text to be shown is properly stored in the appropriate arrays, the render_line function seemingly does what it should, and the empty OSD boxes are resized for the content they are supposed to show. The occurrence of the problem can be deferred (but not completely eliminated) by using the -O1 optimization level instead of -Os at compile time.

IMHO, this hints at either some barrier/timing problem or at some interrupt being missed.

I'd be very glad if someone could try and reproduce this as per the above steps so I can rule out a problem with my hardware setup.

Thanks in advance!

keirf commented 2 years ago

I just had a user report that OSD hangs/disappears on their system after a while when running v1.9. Previously they ran v1.8. So it could be worth a test of your changes on top of v1.8?

keirf commented 2 years ago

Another thing to try is a hard reset of the SPI peripheral during reconfigure. This can be enacted via RCC APBxRSTR registers.
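
For reference, a minimal sketch of such a reset pulse, written so that it can also be exercised against a plain variable on a host. The helper name `periph_reset_pulse` is hypothetical and not taken from the FF-OSD source. The addresses and bit positions are assumptions based on the STM32F103 reference manual (SPI1 reset in RCC_APB2RSTR bit 12, SPI2 reset in RCC_APB1RSTR bit 14) and should be verified for the actual part, particularly on the AT32F403.

```c
#include <stdint.h>

/* Assumed STM32F103 values; verify against the reference manual. */
#define RCC_APB2RSTR_ADDR 0x4002100Cu /* RCC base 0x40021000 + 0x0C */
#define RCC_APB1RSTR_ADDR 0x40021010u /* RCC base 0x40021000 + 0x10 */
#define SPI1RST_BIT 12u               /* SPI1 reset bit in APB2RSTR */
#define SPI2RST_BIT 14u               /* SPI2 reset bit in APB1RSTR */

/* Hypothetical helper: assert, then release, one peripheral reset bit.
 * All other bits in the register are left untouched. */
static void periph_reset_pulse(volatile uint32_t *rstr, unsigned int bit)
{
    *rstr |= (uint32_t)1u << bit;    /* hold the peripheral in reset */
    *rstr &= ~((uint32_t)1u << bit); /* release it again */
}
```

On target this would be called as `periph_reset_pulse((volatile uint32_t *)RCC_APB2RSTR_ADDR, SPI1RST_BIT);`. The reset returns the SPI registers to their reset values, so the peripheral has to be fully reconfigured afterwards.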

tkurbad commented 2 years ago

Interesting. I'll try bare 1.8 later and see if it works for me.

tkurbad commented 2 years ago

Tried a few more things. Resetting the SPI peripheral before applying a new mode in the setup_spiX function has no apparent effect. This was no surprise, because there isn't really a mode switch happening between turning on the Amiga and entering the early startup. Resolution and frequency stay the same AFAIK.

Looking at the (small) diff between v1.8 and v1.9, I think the culprit is in the revised handling of the timers wrt the AT32F403.

I guess I'll do some kind of a v1.8.1 first, with all the hotkey functionality of v1.9 but w/o the F403 related changes and see how this goes. If that works, I'll try and implement the proposed changes of issue #44 on top of that.

Next week I'll hopefully have much more time for all of this.

keirf commented 2 years ago

I take it the 403 changes in v1.9 are still only suspected rather than definitively blamed?

tkurbad commented 2 years ago

Yes, it's still only a suspicion so far. Debugging is awfully slow with two 3 year olds that need constant attention during daytime... ;)

And, because of that, the most obvious test, namely checking whether bare v1.8 has the issue as well, is still on my to-do list.

tkurbad commented 2 years ago

Ok, tried v1.8 now, and it behaves even worse: Whenever the Amiga output is switched off (or away from) and back on again, the next hotkey action completely stalls the OSD.

As with v1.9, this only happens while the flicker fixer is switched on, in 15 kHz mode everything is (and stays) fine. I'll have to dig deeper with more time next week. I'd like to look at the sync signals with an oscilloscope to better understand what's going on.

PS: Happy New Year! :)

tkurbad commented 2 years ago

As expected, my oscilloscope doesn't show anything suspicious. Hsync and vsync run uninterrupted from the moment I switch on the Amiga up until the Workbench appears (at least without screenmode.prefs). Nonetheless, the moment I press LCTRL-LALT-Del, the OSD boxes become (and stay) empty until the blue pill is reset. I played with barriers, optimization levels etc., but nothing really helps.

Funnily enough, none of the non-flagged hotkeys exhibit that behavior. For example, switching between 4 ROMs using U(0) and U(1) works flawlessly. The instant I use one of the 'non-standard' hotkeys, the bad thing happens.

Everything seems to hint towards the output handling via snprintf corrupting the display buffer. However, this is contradicted by the fact that I can printk all the display struct values to the serial debug terminal, and they all look fine.

Then there are the very strict timing demands of the 36 MHz SPI1 output. So, another thing that might go wrong is the SPI DMA getting screwed up by things like out-of-order access or missed/long-running interrupts. However, I'm not sure how to test for that...

Edit: Replaced snprintf for the "OSD On/Off" notify strings by a static strcpy - and sure enough it works.

I'll reverify tonight or tomorrow and if this really IS the issue, I'll prepare a PR.

tkurbad commented 2 years ago

@keirf The issue keeps reappearing and I still can't make sense of it. If I change the code it sometimes seems as if the problem might have gone away, because it's not there immediately, but after two or three power cycles of the Amiga it manifests upon first hit of LALT-LCTRL-Del. Perhaps you can think of something I'm not seeing.

Facts I might have established so far (I'm pretty sure of those):

Furthermore (I'm certain about these):

What I'm still not sure about:

Any ideas where else to look? (Sorry for all the noise about this corner case issue, btw. ;-) )

Edit: Here's a picture of what the corrupted OSD looks like. In this state, the OSD box still resizes according to display.cols and display.rows, but never shows any text.

[Image: issue]

keirf commented 2 years ago

It seems likely that SPI, or the DMA which serves it, is somehow stuck. It is SPI output activity which drives the RGB output pin. Perhaps do things like log the DMA CNDTR register (this counts down as DMA transfers occur), SPI control/status registers, at suitably interesting times. Look for differences between when the box works, versus when it doesn't.
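
A rough sketch of that kind of logging, assuming nothing about the FF-OSD headers: the struct and the function names `snapshot_take` and `snapshot_format` are hypothetical, and the register names follow the STM32F1 reference manual. The idea is to copy the registers quickly at the interesting moment and format them later, away from the timing-critical path.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical copy of the registers of interest. */
struct spi_dma_snapshot {
    uint32_t dma_ccr;   /* DMA channel configuration */
    uint32_t dma_cndtr; /* transfers remaining; counts down */
    uint32_t spi_cr1;   /* SPI control register 1 */
    uint32_t spi_cr2;   /* SPI control register 2 */
    uint32_t spi_sr;    /* SPI status register */
};

/* Read each live register once and store the values. */
static void snapshot_take(struct spi_dma_snapshot *s,
                          const volatile uint32_t *dma_ccr,
                          const volatile uint32_t *dma_cndtr,
                          const volatile uint32_t *spi_cr1,
                          const volatile uint32_t *spi_cr2,
                          const volatile uint32_t *spi_sr)
{
    s->dma_ccr = *dma_ccr;
    s->dma_cndtr = *dma_cndtr;
    s->spi_cr1 = *spi_cr1;
    s->spi_cr2 = *spi_cr2;
    s->spi_sr = *spi_sr;
}

/* Format one snapshot as a single log line; returns snprintf's result. */
static int snapshot_format(char *buf, size_t len, const char *tag,
                           const struct spi_dma_snapshot *s)
{
    return snprintf(buf, len,
                    "%s: CCR=%08lx CNDTR=%lu CR1=%04lx CR2=%04lx SR=%04lx",
                    tag,
                    (unsigned long)s->dma_ccr, (unsigned long)s->dma_cndtr,
                    (unsigned long)s->spi_cr1, (unsigned long)s->spi_cr2,
                    (unsigned long)s->spi_sr);
}
```

Taking one snapshot tagged with the working state and one with the empty-box state, at the same point in the frame, gives two lines that can be compared directly. A CNDTR that no longer changes between successive snapshots would be consistent with a stuck DMA channel.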

Cortex-M3 doesn't do much reordering; in most cases barriers aren't needed. There's a document on this somewhere... https://documentation-service.arm.com/static/5efefb97dbdee951c1cd5aaf?token=

DMA buffer corruption: Not sure that makes much sense, but perhaps DMA state machine corruption or bad state.

IRQs missed: Print dots in the IRQ handlers, or look for evidence of DMA setup in IRQ handlers via the sort of logging suggested at the start of this comment.
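
A lighter-weight variation on printing dots, sketched under the assumption that serial output from a handler running at 31 kHz line rate may itself disturb the timing: count each interrupt and check from the main loop whether the count still advances. All names here (`irq_counts`, `irq_count`, `irq_stalled`) are hypothetical and not part of the FF-OSD source.

```c
#include <stdint.h>

/* Hypothetical counters, one per interrupt source of interest. */
struct irq_counts {
    volatile uint32_t hsync;
    volatile uint32_t vsync;
    volatile uint32_t dma_done;
};

/* Call at the top of a handler, e.g. irq_count(&counts.hsync). */
static inline void irq_count(volatile uint32_t *c)
{
    *c = *c + 1;
}

/* Call periodically from the main loop. Returns 1 if the counter has not
 * advanced since the previous call, 0 otherwise, and remembers the
 * current value in *last for the next call. */
static int irq_stalled(const volatile uint32_t *c, uint32_t *last)
{
    uint32_t now = *c;
    int stalled = (now == *last);
    *last = now;
    return stalled;
}
```

If the main loop reports a stalled counter only once the boxes go empty, that points at the corresponding interrupt no longer firing or no longer being serviced.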