robhagemans / pcbasic

PC-BASIC - A free, cross-platform emulator for the GW-BASIC family of interpreters
http://www.pc-basic.org
Other
382 stars 47 forks source link

PC-BASIC 2 graphics do not update between PSET calls #166

Closed JDoucette closed 1 year ago

JDoucette commented 2 years ago

Bug report

See: https://www.youtube.com/watch?v=9J9DtrcoDXc

Problem

Steps

  1. run below program
  2. expect graphics updates to happen in real-time
  3. they do not
  4. add A$=INKEY$ inside the loop, and it works

Program

10 SCREEN 1
20 FOR Y=0 TO 199:FOR X=0 TO 319
30 PSET(X,Y),INT(RND*4)
40 NEXT X:NEXT Y

Notes

PC-BASIC version: 2.0.4 Operating system version: Windows 10

JDoucette commented 2 years ago

This also occurs in text mode.

5 SCREEN 0,0,0:WIDTH 80:CLS
10 I!=I!+.1:J#=J#+.1#:LOCATE 1,1:PRINT I!;J#;(I!-J#):GOTO 10

Simple program that adds 0.1 repeatedly and draws the results. The screen should be updating blazingly fast. But it stalls, and updates randomly.

p.s. Unrelated to the bug: This reminiscent of a very old program I made as a kid where I was amazed that adding 0.1 over and over again would not result in accurate results, and furthermore, you could watch the velocity of the accumulative error, and the acceleration changes as it would jump into positive flow, then negative flow.

robhagemans commented 2 years ago

Thanks for reporting, fixed with issue 8d1ae6f0. The fix is to call time.sleep(0) twice. I have no idea why this works or more precisely, why it is necessary.

theruler commented 1 year ago

The issue is still present when using more than one core due to, I think, python's GIL. If you set the affinity to one or two cpu the bug is gone and the screen updates fast without the A$=INKEY$ workaround. But if you leave default (ALL) CPU affinity, the update is "choppy" and have to use A$.

i76700 - 1 CPU, 4 CORES, 8 THREADS Win 11 - pcbasic 2.0.6

robhagemans commented 1 year ago

I'm sorry @theruler can you please explain what you're doing? What is "affinity"? I do not see this problem anymore on my machine (which is also multi-core) so would need more information to be able to reproduce.

theruler commented 1 year ago

Yes, sure. Tried on three modern multicore PC with windows (2x win10 and 1x win11): If I paste the code on PCBASIC 2.0.6 without the A$=INKEY$ workaround:

5 SCREEN 0,0,0:WIDTH 80:CLS
10 I!=I!+.1:J#=J#+.1#:LOCATE 1,1:PRINT I!;J#;(I!-J#):GOTO 10

I get the same results of jdoucette, the output is slow and stalls updating randomly.

To solve the issue OS side, you have to open Task Manager, "details" tab, find pcbasicw.exe process, click the right mouse button on it and select "process affinity". There you can allocate HW resurces (logic CPUs) for the process. If you set 1 CPU only, the problem is gone. On some PC you can set 2 logic CPU and still solve the issue and have a minor speed increase. 3 or more CPUs makes pcbasic "lag", I think due to Python's Global Interpreter Lock (GIL) feature.

Hope this was of help to you.

Regards Stefano

theruler commented 1 year ago

I did some timing tests with and without A$=INKEY$ and using single or multi cpu affinity. i5-1135G7@2.4GHz Windows 10 PC-BASIC 2.0.6

10 CLS:T=TIMER
20 FOR X= 1 TO 10000
30 I!=I!+.1:J#=J#+.1#:LOCATE 1,1:PRINT I!;J#;(I!-J#)
40 NEXT
50 ? TIMER-T

multi: 18 sec (choppy output) single: 35 sec (smooth output)

10 CLS:T=TIMER
20 FOR X= 1 TO 10000
30 I!=I!+.1:J#=J#+.1#:LOCATE 1,1:PRINT I!;J#;(I!-J#)
35 A$=INKEY$
40 NEXT
50 ? TIMER-T

multi: 87 sec (smooth output) single: 122 sec (smooth output)

robhagemans commented 1 year ago

Thanks @theruler for the metrics.

So this is actually a slightly different issue as the original issue occurred on single-core as well, and led to a complete block of screen updates. That has been resolved, however what we're seeing here is choppy updates on multi-core machines, at least under Windows.

You are probably right that this is related to the GIL. It looks similar to this: https://www.rfk.id.au/blog/entry/a-gil-adventure-threading2/ . The analysis in that case is that when the threads execute on different cores, they race to acquire the GIL and this leads to slower performance on multiple cores than on single core. In-depth explanation in David Beazley's slides here http://www.dabeaz.com/python/UnderstandingGIL.pdf

However:

The root cause of the issue I think is that (given the GIL in CPython) the display thread simply has too much CPU work to do to keep up with the engine thread, which unfortunately takes rather big changes to address and is something I'd defer until we have better test coverage.

It may be possible to work around the issue by explicitly setting the affinity from the code - however for some reason os.sched_setaffinity is only available on Unix so this likely requires linking this thing in kernel32.dll: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-setprocessaffinitymask

robhagemans commented 1 year ago

Further update

For reference, on Linux, these are "smooth":

python2 ./run-pcbasic.py update.bas 
taskset -c 0 python3 ./run-pcbasic.py update.bas

while this is "choppy":

python3 ./run-pcbasic.py update.bas

Here, update.bas is the two-line program above at https://github.com/robhagemans/pcbasic/issues/166#issuecomment-1225707565

theruler commented 1 year ago

I'm assuming you're running this on Python 3 - if it's the packaged Windows version it is Python 3.

correct.

I am not concerned about speed per se, what I think pc-basic should achieve is the same speed as an old 386, on all system independently. adding A$=INKEY$, surprisingly, make high computational program behave similarly on different machines.

theruler commented 1 year ago

Further tests OS side (windows 10). It's interesting how different behave python 3 and python 2. I've used the same program here Single CPU affinity, with A$=INKEY$ (first half of the chart) and without (second half). the darkest area is Windows KERNEL TIME.

windist - PC-basic 2.0.6 - Python 3: kernel time p3 ------- 122 sec ---- 35 sec

pure python PC-basic 2.0.6 - Python 3: (python3 ./run-pcbasic.py update.bas) purepython3 ------- 167 sec ------ 12 sec Here is an interesting behaviour: the first part shows all kernel time and the execution is very slow, the worst performance actually. In the second part, the one without the A$=INKEY$ workaround, as you noticed the first 7 seconds kernel time is not present, then it kicks in for the rest of the run. The program in the first part is VERY slow, then it speeds up to full speed, exactly when kernel time kicks in.

PC-basic 1.2.15 - Python 2: kernel time p2 --------- 55 sec ------- 12 sec

quote from microsoft technet:

The kernel is that part of the operating system that acts on behalf of all the applications and services running on the system. The kernel is the only part that of the operating system that has access to the physical resources of the computer. Access to the kernel is handled through very specific, highly privileged, API calls.

The most common culprit of run-away kernel time is a poorly written device driver. Another could be a failing hardware device that is causing a flood of hardware interrupts - hardware interrupts are handled only by the kernel.

In a system that is not really running any applications, the kernel continues to perform its functions, which includes all sorts of 'house-keeping' tasks to ensure the system is running correctly. Therefore, if no user processes are running, kernel time will be a higher percentage of overall usage. But as the number of user processes kick in, you will see the percentage of kernel versus user time go down.

Lastly, if you have to ask what kernel time is, you are most likely not the person to 'debug' it. There are very few people who work in kernel development and troubleshooting.

theruler commented 1 year ago

You've fixed it! with your changes single and multi cpu speed are almost identical and the CPU load is back to normal. multi 24.39 sec single single 23.75 sec

The output by the way features pseudo-random accelerations. You can tell looking at the counter running.

I think it is worth paste your code comment in eventcycle.py:

    # what I think is happening here is that the sdl2 interface thread,
    # in its loop to process a single queue item, calls C library functions
    # which do not need the GIL. so it releases it. this allows the engine thread to pick up
    # and produce more work for the interface thread. sleep(0) does not wait but it does release
    # the GIL back, so we need a few sleep(0) calls to allow the interface to get through
    # individual items.
    # this would not be a problem if the interface thread did not need the GIL at all
    # (perhaps with numba, nuitka, cython, pypy or jython)
    # but note that the video queue is a Python object so may require the GIL
    # or if it held the GIL for a full cycle
    # wait to for the queue to drain if it excceds a threshold value
    # this allows the interface to catch up with video updates

I've tried to run the code deleting the lib folder where SDL2.DLL and SDL2_gfx.DLL are, and it run anyways. Does this mean that SDL is not loaded or pcbasic is just using SDL in other PATH folder?

here are the warnings:

c:\Intel\pcbasic-master>python run-pcbasic.py [09:49:05.0646] WARNING: Failed to load library sdl2: could not find any library for SDL2 (looked in paths: ('C:\Intel\pcbasic-master\pcbasic\lib',)) [09:49:05.0646] WARNING: Failed to load library sdl2_gfx: could not find any library for SDL2_gfx (looked in paths: ('C:\Intel\pcbasic-master\pcbasic\lib',)) [09:49:05.0646] INFO: Could not initialise video plugin sdl2: Module sdl2 not found [09:49:05.0646] WARNING: The pygame interface is deprecated, please use the graphical interface instead. [09:49:05.0646] INFO: Could not initialise video plugin pygame: Module pygame not found [09:49:05.0662] WARNING: Failed to load library sdl2: could not find any library for SDL2 (looked in paths: ('C:\Intel\pcbasic-master\pcbasic\lib',)) [09:49:05.0662] WARNING: Failed to load library sdl2_gfx: could not find any library for SDL2_gfx (looked in paths: ('C:\Intel\pcbasic-master\pcbasic\lib',)) [09:49:05.0662] INFO: Could not initialise audio plugin ansi: Module sdl2 not found

robhagemans commented 1 year ago

By the looks of that log, PC-BASIC would have ran in text mode without using SDL2. You'd notice though, it looks rather different, running in the command prompt window rather than opening a window of its own.

theruler commented 1 year ago

@robhagemans do you have any eta of a new version that would include this fix?

robhagemans commented 1 year ago

Hi, I'm afraid I can't give an ETA, this is purely a project I work on in spare time.

I've been meaning to spend a bit of time on it for a while but have been busy with other projects and of course the day job

On Mon, 13 Mar 2023, 14:57 Stefano Crespi, @.***> wrote:

@robhagemans https://github.com/robhagemans do you have any eta of a new version that would include this fix?

— Reply to this email directly, view it on GitHub https://github.com/robhagemans/pcbasic/issues/166#issuecomment-1466308587, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB25RESINSAATY2OUBHOOOTW34YVDANCNFSM5H7IUQWA . You are receiving this because you were mentioned.Message ID: @.***>