joncampbell123 / dosbox-x

DOSBox-X fork of the DOSBox project
GNU General Public License v2.0

Grand Prix 2 does not run optimally #3211

Open grapeli opened 2 years ago

grapeli commented 2 years ago

Code of Conduct & Contributing Guidelines

Have you checked that no other similar bug report(s) already exists?

What operating system(s) this bug have occurred on?

linux

What version(s) of DOSBox-X have this bug?

newest and older

Describe the bug

Grand Prix 2 does not run optimally under dosbox-x.

Expected behavior

It will work better.

Steps to reproduce the behaviour

The first problem: real time differs significantly from in-game time. The video recording runs 01:38.80, while the lap time is 01:24.832. Difference: about -13.97s. Huge.

https://user-images.githubusercontent.com/452325/148699591-008b6518-d2ae-45ee-a45c-6f9176a20e10.mp4

What it looks like in dosbox-staging: video time is 01:21.08, lap time 01:25.647. Here the game is accelerated, by about +4.57s (slight compared to dosbox-x).

https://user-images.githubusercontent.com/452325/148699599-8ad2efc7-1e90-4942-8f1f-bac1d87c9f2f.mp4
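The differences quoted above can be recomputed from the two timestamps of each run. A minimal Python sketch, assuming times are written as mm:ss.fff; the helper names `to_seconds` and `drift` are illustrative only and not part of any tool used in this report:

```python
def to_seconds(stamp):
    """Convert an 'mm:ss.fff' timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def drift(video, lap):
    """Return (real minus in-game seconds, percent relative to in-game time).

    Positive values mean the game ran slower than real time,
    negative values mean it ran faster.
    """
    real = to_seconds(video)
    game = to_seconds(lap)
    delta = real - game
    return delta, 100.0 * delta / game


if __name__ == "__main__":
    runs = [
        ("dosbox-x", "01:38.80", "01:24.832"),
        ("dosbox-staging", "01:21.08", "01:25.647"),
    ]
    for name, video, lap in runs:
        delta, percent = drift(video, lap)
        print(f"{name}: {delta:+.2f} s ({percent:+.1f}%)")
```

With the figures from the two recordings, the dosbox-x run comes out roughly 14 s slower than real time and the dosbox-staging run a few seconds faster.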

Second point. GP2 could run faster in dosbox-x (related to the previous issue). Already at the start, the CPU load shown in GP2 is 61%. When there is a lot of smoke, it jumps to 123%.

https://user-images.githubusercontent.com/452325/148699616-4d4e2a2e-f602-4721-8542-584eab98a461.mp4

What it looks like under Linux perf:

5.26%  dosbox-x        libm-2.33.so             [.] __fmod_finite
5.03%  dosbox-x        dosbox-x                 [.] VGA_Draw_Xlat32_Linear_Line
2.80%  dosbox-x        dosbox-x                 [.] mem_writed_checked
2.60%  dosbox-x        dosbox-x                 [.] CPU_Core_Dyn_X86_Run
2.21%  dosbox-x        dosbox-x                 [.] mem_readd_checked
2.20%  dosbox-x        dosbox-x                 [.] CPU_Core_Normal_Run
2.12%  dosbox-x        dosbox-x                 [.] RENDER_StartLineHandler
1.70%  dosbox-x        dosbox-x                 [.] MakeCodePage
0.94%  dosbox-x        dosbox-x                 [.] Normal1x_32_32_R
0.93%  dosbox-x        libc-2.33.so             [.] __memcmp_sse4_1
0.90%  dosbox-x        dosbox-x                 [.] PIT_Block::read_counter
0.87%  dosbox-x        dosbox-x                 [.] counter_output
0.85%  dosbox-x        i965_dri.so              [.] util_format_b8g8r8a8_unorm_unpack_rgba_8unorm
0.76%  dosbox-x        i965_dri.so              [.] linear_to_xtiled_faster
0.74%  dosbox-x        dosbox-x                 [.] IO_ReadB
0.65%  dosbox-x        dosbox-x                 [.] dyn_helper_idivd
0.64%  dosbox-x        dosbox-x                 [.] dyn_helper_divd
0.04%  dosbox-x        libc-2.33.so             [.] __poll
0.47%  dosbox-x        dosbox-x                 [.] read_p61
0.00%  dosbox-x        libc-2.33.so             [.] __GI___ioctl
0.45%  dosbox-x        libm-2.33.so             [.] fmodf32x
0.42%  dosbox-x        [JIT] tid 29310          [.] 0x0000794abab24e5e <----GP2 code the greater the load with this code, the better
0.42%  dosbox-x        dosbox-x                 [.] PIC_RunQueue
0.35%  dosbox-x        [JIT] tid 29310          [.] 0x0000794aba61bc84 <----GP2 code
0.33%  dosbox-x        [JIT] tid 29310          [.] 0x0000794aba585630 <----GP2 code
0.33%  dosbox-x        dosbox-x                 [.] vga_read_p3da

Why is fmod from libm so CPU intensive? Well, thanks to these functions:

PIT_Block::read_counter+0x2a1 timer.cpp:257
PIT_Block::track_time+0x13d timer.cpp:152

Something in this code is not working very well. The bulk of the CPU load should come from the dosbox-x code itself.

It looks completely different under dosbox-staging. There is no overload from libm which translates into much smoother and better performance. These values are from perf top.

3.06%  dosbox                      [.] mem_writed_checked
1.82%  dosbox                      [.] CPU_Core_Dyn_X86_Run
1.78%  dosbox                      [.] mem_readd_checked
1.39%  dosbox                      [.] read_byte_from_port
1.24%  dosbox                      [.] MakeCodePage
1.07%  dosbox                      [.] Normal1x_8_32_L
0.92%  dosbox                      [.] dyn_helper_idivd
0.81%  libc-2.33.so                [.] _int_free
0.77%  [JIT] tid 28348             [.] 0x000072712e6f96c8      <----GP2 code (better saturation)
0.71%  [JIT] tid 28348             [.] 0x000072712e6f9643
0.66%  dosbox                      [.] read_latch
0.63%  libc-2.33.so                [.] malloc
0.55%  dosbox                      [.] read_p61
0.52%  dosbox                      [.] counter_latch
0.48%  dosbox                      [.] RENDER_StartLineHandler
0.44%  dosbox                      [.] write_byte_to_port
0.38%  dosbox                      [.] dyn_io_readB
0.31%  libc-2.33.so                [.] cfree@GLIBC_2.2.5
0.31%  dosbox                      [.] dyn_helper_divd
0.27%  [JIT] tid 28348             [.] 0x000072712e6f969a
0.25%  [JIT] tid 28348             [.] 0x000072712e0ef98d
0.23%  [JIT] tid 28348             [.] 0x000072712e6f9600
0.20%  [JIT] tid 28348             [.] 0x000072712e6f96ce

https://user-images.githubusercontent.com/452325/148699708-667411f7-017c-4148-af6e-6c9e12fff599.mp4

Third point. GP2 does not work properly with aspect=false. I think it should.

Used configuration

No response

Emulator log

No response

Additional context

No response

grapeli commented 2 years ago

Time compression (video time, lap time, difference):

dosemu2 (without kvm, freedos): 1:13.82, 1:25.898, -12.08s (processor occupancy: 0%, sometimes 5%, at most 10%)
qemu (kvm, freedos): 1:15.64, 1:24.412, -8.772s (processor occupancy: max 23%)

grapeli commented 2 years ago

Only in VGA mode (started as gp2.exe video:vga) does GP2 run on dosbox-x the way I remember it from the old days on an AMD K6-2 with a Voodoo Banshee, and very similarly to dosbox-staging. It is then dynamic and fast enough. The difference between real time and in-game time is identical to dosbox-staging: a slight acceleration of 5.5-6%. SVGA mode is quite different: there the game slows down by almost 16%, so a one-hour race stretches to an hour and 10 minutes.
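The "hour and 10 minutes" figure follows directly from the 16% slowdown. A small sketch of the arithmetic (the function name is hypothetical, and a constant slowdown over the whole race is assumed):

```python
def stretched_minutes(game_minutes, slowdown_percent):
    """Real-world duration of a race that lasts game_minutes in game time.

    A positive slowdown_percent means the game runs slower than real time;
    a negative value models the slight acceleration seen in VGA mode.
    """
    return game_minutes * (1.0 + slowdown_percent / 100.0)


if __name__ == "__main__":
    print(f"SVGA, 16% slowdown: {stretched_minutes(60, 16):.1f} min")
    print(f"VGA, 6% acceleration: {stretched_minutes(60, -6):.1f} min")
```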

rderooy commented 2 years ago

Try to set synchronize time=true in the [dosbox] section of your config. Assuming the game uses the system clock, it should prevent it from drifting.

grapeli commented 2 years ago

Try to set synchronize time=true in the [dosbox] section of your config. Assuming the game uses the system clock, it should prevent it from drifting.

I've already checked various settings. It also doesn't change anything. There is still a slowdown of 16%. The difference is in the graphics emulation layer (it is different from dosbox-SVN and -staging).

After exiting GP2, the time spent in the game is displayed.

You played Grand Prix 2 for 54 minutes 27 seconds

This is exactly how it should be.

grapeli commented 2 years ago

I checked it on completely different equipment. In the cloud under google cloud shell (GCS).

The effect is very similar. In GP2 with SVGA, the game is slowing down - in this case by about 12%. The processor is:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 49
model name      : AMD EPYC 7B12
stepping        : 0
microcode       : 0x1000065
cpu MHz         : 2249.998
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save umip rdpid
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 4499.99
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

Despite its lower clock speed (2.25GHz), it is roughly 2-2.5x faster than my old Intel Westmere clocked at 2.4GHz (2.6GHz with turbo).

Linux perf is different. libm with __fmod_finite does not occupy the first position.

1.08%  dosbox-x                [.] mem_writed_checked
1.07%  dosbox-x                [.] VGA_Draw_Xlat32_Linear_Line
1.04%  dosbox-x                [.] mem_readd_checked
0.98%  dosbox-x                [.] CPU_Core_Dyn_X86_Run
0.84%  libc-2.28.so            [.] __memcmp_avx2_movbe
0.62%  libm-2.28.so            [.] __fmod_finite
0.56%  dosbox-x                [.] MakeCodePage
0.53%  dosbox-x                [.] counter_output
0.50%  libc-2.28.so            [.] __memmove_avx_unaligned_erms
0.47%  perf-2055.map           [.] 0x00007fa4b4e8dcf0
0.42%  perf-2055.map           [.] 0x00007fa4b4e8dcd5
0.38%  dosbox-x                [.] RENDER_StartLineHandler
0.27%  perf-2055.map           [.] 0x00007fa4b4e8de5d
0.26%  perf-2055.map           [.] 0x00007fa4b4e8e078
0.24%  perf-2055.map           [.] 0x00007fa49af7b081
0.22%  perf-2055.map           [.] 0x00007fa4b4e8e232
0.22%  libm-2.28.so            [.] __fmod
0.20%  perf-2055.map           [.] 0x00007fa49af7b0ba
0.18%  dosbox-x                [.] read_p61

grapeli commented 2 years ago

Dosbox-x log data.

vga

454074       PIT:PIT 0 Timer at 64.0050 Hz mode 3
501251       INT10:Set Video Mode 13
501251       VGA:Blinking 0
501251       VGA:Blinking 8
501259       VGA:VGA refresh rate is now, 70.086

svga

454725       PIT:PIT 0 Timer at 64.0050 Hz mode 3
454736       INT10:Set Video Mode 101
454736       VGA:Blinking 0
454736       VGA:Blinking 8
454736 ERROR MOUSE:Unhandled videomode 69 on reset
454760       VGA:VGA refresh rate is now, 70.007
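Both logs show the same PIT 0 rate of 64.0050 Hz. Assuming the standard PC timer input clock of 1,193,182 Hz (a known hardware constant, not something stated in the log), the divisor behind that rate can be recovered as follows. The function names are illustrative:

```python
PIT_CLOCK_HZ = 1_193_182  # nominal input clock of the PC interval timer


def pit_divisor(rate_hz):
    """Nearest integer divisor that yields approximately rate_hz."""
    return round(PIT_CLOCK_HZ / rate_hz)


def pit_rate(divisor):
    """Interrupt rate in Hz produced by a given divisor."""
    return PIT_CLOCK_HZ / divisor


if __name__ == "__main__":
    divisor = pit_divisor(64.0050)
    print(divisor, f"{pit_rate(divisor):.4f} Hz")
```

Under that assumption the logged rate corresponds to a divisor of 18642, so the timer programming itself is identical in VGA and SVGA mode.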

Data collected with gallium mesa. About three minutes (a lot of data).

GALLIUM_HUD_DUMP_DIR=../data GALLIUM_HUD="simple,fps+cpu+frametime" MESA_LOADER_DRIVER_OVERRIDE=crocus dosbox-x

build / mode        fps     cpu     frametime (ms)
dosbox-x svga       34.85   15.73   29.05
dosbox-x vga        47.97   16.92   20.95
dosbox-svn svga     41.43   16.96   24.24

With the crocus driver, dosbox runs slower (definitely weaker performance in benchmarks) than with the i965 driver. But this data can only be collected with crocus. With i965, fps data can be printed to standard output using LIBGL_SHOW_FPS=1.

grapeli commented 2 years ago

A few more observations. I found a lot of interesting information on the GP2 website at this address. The most important of them are:

Seems like GP2 uses the frame rate you have chosen as a minimum and the estimated as a maximum.

  1. for a good approximation, always set the desired framerate about 2 frames per second lower than the framerate that GP2 "recommends". GP2 is typically too optimistic with this estimate, and you end up with slightly sluggish gameplay.

From my side: don't run GP2 under dosbox-x with cycles=max unless you have good hardware (I don't). At startup GP2 estimates the performance of the CPU and graphics card, and these values are used to determine the frame rate. They can be read from the gp2log.txt file by starting the game with gp2.exe log:on. With cycles=max these indexes come out far too low under dosbox-x. Run with a fixed value of cycles=50000 or greater.

After starting GP2, from the dosbox-x menu, select CPU -> Edit cycles and change to max. If GP2 doesn't run fast enough, don't rely on auto-tuning for graphics. Personally, I prefer these settings.

[image: gp2_000]

In my case, the key was not to use the "Use estimate" button. For me, the frame rate must always be set 2 frames lower than the estimate. That is acceptable anyway, since the chosen value acts as a minimum.

With these settings finally GP2 works satisfactorily under dosbox-x in SVGA mode. Average fps jumped to 47 from 35 (measured in the 5 minute race). The game is slightly accelerated by about 5% (much better than the 16% slowdown).

I still don't know why GP2 performs worse under dosbox-x than under dosbox-staging and -SVN. It is not obvious to me, because in my case GP2 uses only about 45-75% CPU with dosbox-x anyway. Contrary to popular belief, dosbox-x is not much slower than the other two, at least in my case. On my hardware, dosbox-x is at most about 15% slower in basic benchmarks (pcpbench, quake, doom).

The useful keys are:
O - processor occupancy
Alt-d - turn off all the trackside objects

rderooy commented 2 years ago

I did some testing, here. I had the game set for SVGA and max everything for Graphics detail with 21.3 frames/sec.

[image: gp2_000]

In the Silverstone race I used an external stop-watch to time my race, which went poorly (spun out early on). I ended up with a paltry 5m24.776 for the 3 laps. According to my stop-watch it took 5m12, so a difference of roughly 12sec over 5 minutes.

I pressed o at various times during the race and never had more than 33% processor occupancy.

This is on an AMD Ryzen 5600X with Radeon RX590.

Edit: forgot to mention, my display is running at 75Hz, so dosbox-x does not need to drop frames.

grapeli commented 2 years ago

Thanks for the tests. The page I mentioned earlier recommends setting the frame rate 2 lower than the estimated value. Rendering objects in the mirrors is rather distracting (though with such hardware you can afford it).

For me, under dosemu2, the processor occupancy in GP2 does not exceed 8% (Monaco) at the maximum settings. That is not very playable, though, because the game runs about 20-30% too fast. My processor is an Intel Westmere 2.4GHz (2010) with OpenGL 2.1 compatible Intel graphics. I don't even want to know how many times slower it is than yours.

The main problem stems from the initial GP2 benchmark. For me, with cycles=max and the settings visible in the previous screenshot, it is 19.2 (it's a bit too low, it must be greater than 25.6). For comparison, on this hardware under Windows 10 and dosbox-x it is about 8.

These are the gp2log.txt values:

SVN (cycles=max), estimated average frame rate >25.6
Speed (c.f. DX2-66): 914
Video speed: 2397

dosbox-staging (cycles=max), estimated average frame rate >25.6
Speed (c.f. DX2-66): 744
Video speed: 1804

dosbox-x, estimated average frame rate 19.2 (cycles=max), >25.6 with cycles=50000
cycles                 max    50000   180000   200000
Speed (c.f. DX2-66)    289      425     1529     1699
Video speed: 2204

edit: dosbox-x (cycles=max) windows 10
Speed (c.f. DX2-66): 159
Video speed: 1039

[image: gp2 dosbox-x win10]

dosemu2 (linux)
Speed (c.f. DX2-66): 12067
Video speed: 75841

edit2: I corrected some data. For example, dosbox-SVN had been run with scaler=none and its window lost focus during the benchmark, so its result was underestimated by a factor of two. Dosbox-staging had been less well optimized than the other two. The dosbox-x benchmark is hugely out of proportion, completely inadequate: about 3x lower than -SVN and 2.5x lower than -staging.
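The "3x" and "2.5x" figures can be checked against the cycles=max Speed indexes listed above. A short sketch using only those reported values (the dictionary and function names are illustrative):

```python
# Speed (c.f. DX2-66) indexes at cycles=max, as reported in gp2log.txt above.
speed_index = {
    "dosbox-SVN": 914,
    "dosbox-staging": 744,
    "dosbox-x": 289,
}


def times_faster_than(reference, results):
    """How many times higher each index is than the reference build's index."""
    base = results[reference]
    return {name: value / base for name, value in results.items() if name != reference}


if __name__ == "__main__":
    for name, factor in times_faster_than("dosbox-x", speed_index).items():
        print(f"{name}: {factor:.2f}x the dosbox-x index")
```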

grapeli commented 2 years ago

With all the graphic details set, the performance in the GP2 approaches the upper limit. Playable, but the power reserve is far too small. The host CPU load is in the range of 70-95%.

https://user-images.githubusercontent.com/452325/149621111-0d577062-81f5-49a3-b48b-29b2c24ed56a.mp4

grapeli commented 2 years ago

I close.

The two main points are mainly related to the performance of the hardware on which we run GP2 or the inappropriate adjustment of the GP2 configuration in relation to the performance of the hardware.

This game is quite a challenge for emulators.

These individual remarks still make sense, especially the one about the GP2 benchmark being underestimated with cycles=max. Perhaps it's best to use fixed cycles close to the maximum dosbox-x can reach on your hardware, possibly starting at 90000 or more and switching to max later. The estimated average frame rate should be greater than 25.6.

GP2 requires really strong hardware. This applies especially to moments in the game such as - the start (generally the first lap), a large amount of smoke, the load also depends on the track on which we compete, etc.

It is good to read the information on the GP2 page mentioned earlier.

grapeli commented 2 years ago

The cause of the invalid GP2 benchmark at cycles=max or auto is the dosbox-x window losing focus. The reading is then unreliable, which has an adverse effect on the average frame rate estimate.

In my case, the dosbox-x benchmark is usually: Speed (c.f. DX2-66): 289, although once in 10 attempts the indication is correct and amounts to: Speed (c.f. DX2-66): 596. And that's actually correct.

@rderooy You could check it on your own:

gp2.exe log:safe nointro nomusic

The dosbox-x window's loss of focus is easy to reproduce. At startup, press F12-f (fullscreen) or another combination such as F12-p, exactly at this point.

[image: gp2_000]

Focus is lost. Stopping.

[image: gp2_001]

Running GP2 with fullscreen=true doesn't help, because that happens somewhere "inside" dosbox-x with cycles=max or auto only.

edit: You can regain focus relatively quickly by running dosbox-x as shown below. The mouse pointer must be outside the smaller 320x200 window (the window briefly takes this size) but within the final dosbox-x window size (640x480). This workaround gives a more reliable benchmark (almost 2x higher).

xdotool mousemove 460 280 && dosbox-x -fastlaunch -set windowposition=0,0 -set showmenu=false -set char9=false -set scaler=none -set doublescan=false -set aspect=false -c "gp2.exe log:safe nointro nomusic"

grapeli commented 2 years ago

This is the same case as with the Dimension demo.

#2715

Dosbox-x then slows down unexpectedly. This occurs when the brightness changes.

https://user-images.githubusercontent.com/452325/150016264-9bac262e-a954-4060-b92c-a32f5f2d33e9.mp4

As in that case, changing the focus of the dosbox-x window helps here too. https://github.com/joncampbell123/dosbox-x/issues/2715#issuecomment-890313583

This may be related to the code in the VGA_ChainedVGA_Slow_Handler::writed function?

rderooy commented 2 years ago

DOSBox-X and staging are fairly close to each other for CPU speed, but DOSBox-X records much higher video speeds.

Also on exiting the game back to DOS, I notice some brief graphical glitches in DOSBox-Staging and DOSBox ECE that I don't see in DOSBox-X.

DOSBox-X SDL1

Starting the game in fullscreen with cycles=max

Starting the game in fullscreen with DOSBox-X SDL1 results in a messed up display where the left 50% of the game screen is stretched to fit the screen (this may be SDL1 getting confused about my dual-screen setup). And the keyboard input is going to the terminal where I started the game from.

Speed (c.f. DX2-66): 552
Video speed: 4959

Starting the game in windowed mode with cycles=max

Speed (c.f. DX2-66): 555
Video speed: 3883

DOSBox-X SDL2

Starting the game in fullscreen with cycles=auto

Speed (c.f. DX2-66): 556
Video speed: 4629

Starting the game in fullscreen with cycles=max

Speed (c.f. DX2-66): 550
Video speed: 4130

Starting the game in windowed mode with cycles=auto

Speed (c.f. DX2-66): 567
Video speed: 3917

Starting the game in windowed mode with cycles=max

Speed (c.f. DX2-66): 563
Video speed: 3221

DOSBox-Staging

Starting the game in windowed mode with cycles=auto

Speed (c.f. DX2-66): 579
Video speed: 1728

Starting the game in windowed mode with cycles=max

Speed (c.f. DX2-66): 529
Video speed: 1368

Starting the game in fullscreen mode with cycles=auto

Speed (c.f. DX2-66): 558
Video speed: 1569

Starting the game in fullscreen mode with cycles=max

Speed (c.f. DX2-66): 603
Video speed: 1563

DOSBox ECE r4367

Starting the game in windowed mode with cycles=auto

Speed (c.f. DX2-66): 1022
Video speed: 2923

Starting the game in windowed mode with cycles=max

Speed (c.f. DX2-66): 977
Video speed: 2388

grapeli commented 2 years ago

@rderooy These benchmark results look strange. What matters is how they compare. Mine were run with roughly equal settings and similar optimization: all three were built with PGO+LTO.

I suspect you were running dosbox-staging like this:

dosbox-staging -set windowresolution=default -set scaler=none -set glshader=sharp

If you run it like this, the benchmark will be maybe 30% higher:

dosbox-staging -set windowresolution=640x400 -set aspect=false -set scaler=none -set glshader=none

Maybe dosbox-ECE was run with settings closer to the latter example. Maybe dosbox-x was running glshader=none as well? Maybe one binary is better optimized and the other less.

There are a lot of these nuances that affect the result (significantly). I try to take these details into account. Otherwise the results may not be reliable.

edit: It makes no sense to test GP2 with both cycles=max and auto. They are one and the same for this game. For games that need a lot of power, cycles="max 105%" can be used.

grapeli commented 2 years ago

@rderooy This way of getting dosbox-x back to full speed by changing the window focus works automatically for me only under a very unusual WM, notion. You can try to repeat it manually under any other. In one terminal, type:

tail -f GP2LOG.TXT | grep '^Speed'

In the second, run dosbox-x with GP2:

dosbox-x -fastlaunch -set "dosbox quit warning=false" -set char9=false -set aspect=false -set scaler=none -set doublescan=false -c "gp2.exe log:safe" -c exit 2>/dev/null

The settings dosbox-x is run with are very important in this case; do not modify them. This applies to scaler, char9, doublescan and aspect.

After GP2 starts, exactly when the sequence that brightens the graphics appears, very quickly move the pointer just outside the dosbox-x window and immediately back into the window area (you can do it twice). I can assure you that within 4-5 attempts you will see a benchmark result 30-40% higher (may depend on hardware?).
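For comparing many attempts, the benchmark values can also be extracted from the log with a few lines of Python instead of grep. This sketch assumes the log contains lines shaped like the ones quoted in this thread ("Speed (c.f. DX2-66): N" and "Video speed: N"); the rest of the GP2LOG.TXT format is not known here, and the function name is hypothetical:

```python
import re

SPEED_LINE = re.compile(r"^Speed \(c\.f\. DX2-66\):\s*(\d+)")
VIDEO_LINE = re.compile(r"^Video speed:\s*(\d+)")


def parse_gp2log(text):
    """Return the last 'speed' and 'video' indexes found in the log text."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()  # tolerate stray whitespace from DOS line endings
        match = SPEED_LINE.match(line)
        if match:
            result["speed"] = int(match.group(1))
            continue
        match = VIDEO_LINE.match(line)
        if match:
            result["video"] = int(match.group(1))
    return result


if __name__ == "__main__":
    sample = "Speed (c.f. DX2-66): 289\r\nVideo speed: 2204\r\n"
    print(parse_gp2log(sample))
```

To use it on a real run, read the file first, for example with `open("GP2LOG.TXT", errors="replace").read()`.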

grapeli commented 2 years ago

@rderooy Changing the dosbox-x window focus is best done when the window resizes from 320x200 to 640x480. It is best to move the pointer outside the window about three or four times.

To understand this strange phenomenon, it's best to run the Dimension demo under dosbox-x. The effect is more visible there: the sequence that brightens the graphics takes a very long time, whereas in GP2 it is very short (almost imperceptible). https://files.scene.org/view/mirrors/hornet/demos/1998/d/dimen.zip

grapeli commented 2 years ago

This GP2 benchmark anomaly looks something like this. The recording uses the old dosbox-x 0.83.15 SDL1 without OpenGL, because it was easier for me to reproduce and present the problem with that version.

https://user-images.githubusercontent.com/452325/150165370-1cdff23c-28ab-4430-a3e0-f59fd98ebc90.mp4

I am very familiar with dosbox-x's overall performance versus dosbox-SVN or -staging. It is not 2-3x slower than them. For me it is slower by up to a dozen percent or so. Of course it depends on what is being tested and how.

rderooy commented 2 years ago

I did not have time to do any testing with GP2, but I noticed that when I start dosbox-x with cycles=max that it uses roughly 55% of a CPU core, and also it seems to be constantly changing core on which it's running.

So I used CPU affinity to pin the process to a single core https://man7.org/linux/man-pages/man1/taskset.1.html

With that and cycles=max I'm getting over 90% cpu core utilization. Also when I check the actual CPU clock speed, it seems like it is now actually using boost frequencies.

Edit: this was tested on an Intel i7-10610U
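The pinning described above was done with taskset. As a rough illustration of what pinning means, the same kind of restriction can be applied from Python on Linux through `os.sched_setaffinity`. This is only a sketch of the mechanism, not how the test above was run, and the helper name is made up:

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict process pid (0 = the calling process) to a single CPU.

    Picks the lowest-numbered CPU the process is currently allowed to use
    and returns its index. Linux only.
    """
    allowed = sorted(os.sched_getaffinity(pid))
    target = allowed[0]
    os.sched_setaffinity(pid, {target})
    return target


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    cpu = pin_to_one_cpu()
    print(f"now pinned to CPU {cpu}: {os.sched_getaffinity(0)}")
```

A pinned process no longer migrates between cores, which matches the observation that utilization of the single core rises and boost frequencies are reached.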

grapeli commented 2 years ago

I am only interested in the initial benchmark performed in GP2. This is a kind of calibration to estimate average frame rate. It has an impact on the behavior of the game if it is incorrectly estimated. You can read its value in Main Menu -> Options -> Graphics Detail Level. Dosbox-x slows down while performing this calibration. For this reason, I compared this behavior in different dosbox versions. With the same settings. I did not choose them to play with the best look, full screen, etc. Quite the opposite. This kind of test was not done for me to play GP2.

Dosbox-x, apart from this underestimation of the benchmark, is doing fine in GP2 on my hardware. GP2 performs a "bit worse" than under -SVN and -staging. I would estimate the difference at around 15-30%. I checked the CPU load very carefully during the 5-minute runs.

As for the gameplay itself. The lower the CPU usage when playing GP2 under dosbox (with cycles=max), the better. If the CPU usage is approaching 95-100% and at the same time in GP2 you have Processor occupancy values exceeding 100%, it means that the game will slow down (very bad symptom, too little power for full emulation).

https://www.grandprix2.de/wissen/01wissen.htm

Processor Occupancy (PO)

When you push the 'O' key during a replay or when you are playing GP2, the game will show you a percentage value that indicates how busy your processor is. This value is called Processor Occupancy (PO) and ranges from 0 to 100% .... and HIGHER. Basically, a value less than 100% means that you are driving in "realtime": your processor is fast enough to calculate the desired framerate with the selected graphical options (objects around the track, textures, smoke..). So if you drive a lap in 1m30 (simulated time), you will have been driving (close to) 1m30 in reality also. When you would have a constant PO-value of 200% during driving, the simulated time would still be 1m30, but if you watch your clock on your wrist, you'll notice that you drove around 3min in reality! You drove in slowmotion! (clearly recognisable in the slowdown of the action on screen) No, this has nothing to do with Einstein's Theory of Relativity: your processor simply tries to calculate all the frames, but since it is not powerful enough it needs twice as much time as desired. Note that it does not really work like that the other way round: if your processor is so powerful that it has only 50% occupancy, you will not play faster than realtime. Also note that the "watch your clock"-experiment above is a theoretical example.

Slowmotion driving --Slowmotion driving is the result of the internals of the GP2 game engine. Instead of dropping frames when the CPU isn't capable of giving you the configured framerate and graphics details , the engine stretches gametime. This helps in getting better laptimes, and is of course against the spirit of hotlap leagues. --This is the most criticised "feature" of GP2: the game-engine doesn't drop frames when the processor can't handle it. Instead, it stretches gametime! This leaves the possibility to drive in slowmotion.

When you set the framerate and/or graphics detail to a level that your computer cannot deliver in realtime, the game slows down and gives you more time to react. For example it becomes much easier to read your speed in corners. It's clear that this is against the spirit of hotlapping! Ultimately, hotlapping is all about testing your reactions...in realtime. Driving in slow-motion gives you an advantage over other people. Tests in practice have shown that people can be up to 0.8 seconds faster on a track like Adelaide, only by forcing a PO of 160-170 instead of under 100!! It is very difficult, even impossible to spot this on the basis of a replay. When you watch a replay of someone else's lap on your computer, the occupancy values reflect the occupancy of your own processor. So it does not give information about the PO when he was driving on his system. The only indications are things like speed of gearchange or other actions that seem to be executed supernaturally fast when replayed in realtime.
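The quoted description of Processor Occupancy amounts to a simple rule: below 100% the game runs in real time, above 100% game time is stretched in proportion. A simplified model of that rule, assuming a constant PO over the whole lap (which the quoted text itself calls a theoretical case); the function name is illustrative:

```python
def real_seconds(game_seconds, occupancy_percent):
    """Wall-clock time needed for game_seconds of simulated time.

    Occupancy above 100% stretches time proportionally; occupancy below
    100% does not make the game run faster than real time.
    """
    stretch = max(1.0, occupancy_percent / 100.0)
    return game_seconds * stretch


if __name__ == "__main__":
    lap = 90.0  # a 1m30 lap in simulated time
    for po in (50, 100, 123, 200):
        print(f"PO {po:3d}%: {real_seconds(lap, po):.1f} s of real time")
```

This model only covers the slowdown side. It does not explain the cases reported earlier in this thread where the game ran faster than real time.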

grapeli commented 2 years ago

CPU usage comparison. Taken during a quickrace at Monza, Italy. Three laps. Similar times. From the first corner to the finish in first position. After I had passed the finish line, I pressed pause. Then I froze the cgroup container.

start
systemd-run -G --user dosbox-x -conf gp2.conf

freeze
systemctl --user freeze run-r6dfa9d61b77946beada94cf3825ffb3d.service

reading the cpu consumption
systemctl --user status run-r6dfa9d61b77946beada94cf3825ffb3d.service

svn====  CPU: 2min 24.584s  2min 17.198s  2min 17.602s  2min 16.779s
staging= CPU: 2min 34.187s
x======  CPU: 3min 01.906s  2min 59.494s  2min 55.736s  2min 48.295s  2min 50.742s  2min 57.362s  2min 53.710s

It's really hard to measure this exactly. This is only a rough comparison.
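To put the rough comparison into one number per build, the CPU times above can be averaged. A sketch that parses the "Nmin S.SSSs" values as printed by systemctl status and compares the means (the helper and variable names are made up for this example):

```python
import re
from statistics import mean


def cpu_seconds(text):
    """Convert a value such as '2min 24.584s' to seconds."""
    match = re.fullmatch(r"(\d+)min\s+([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unexpected CPU time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


runs = {
    "svn": ["2min 24.584s", "2min 17.198s", "2min 17.602s", "2min 16.779s"],
    "staging": ["2min 34.187s"],
    "x": ["3min 01.906s", "2min 59.494s", "2min 55.736s", "2min 48.295s",
          "2min 50.742s", "2min 57.362s", "2min 53.710s"],
}

averages = {name: mean([cpu_seconds(t) for t in times]) for name, times in runs.items()}

if __name__ == "__main__":
    for name, seconds in averages.items():
        print(f"{name}: {seconds:.1f} s average CPU time")
    print(f"x / svn: {averages['x'] / averages['svn']:.2f}")
```

On these samples dosbox-x needs roughly a quarter more CPU time than dosbox-SVN for the same race, keeping in mind that the sample sizes differ and there is only one staging run.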

maron2000 commented 2 years ago

I didn't check very deeply, but on my host, CPU usage only goes up to around 30%. If the fmod function is consuming CPU time, you might want to test replacing it with the fma() function, as I did in this PR: https://github.com/joncampbell123/dosbox-x/pull/3370. gcc optimizes the fma function if your CPU has FMA instructions.
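For readers unfamiliar with the idea: a floating-point remainder can be approximated with a divide, a truncation, and a multiply-subtract, and that last step is the kind of operation an FMA instruction performs in one go. The sketch below only illustrates this general technique in Python. It is not the code from the PR, the function name is invented, and unlike `math.fmod` it is not exact for very large quotients:

```python
import math


def fmod_mul_sub(x, y):
    """Approximate fmod(x, y) as x - trunc(x / y) * y.

    The result has the sign of x, like math.fmod. Rounding in the division
    and in the multiply-subtract can make it differ slightly from the exact
    remainder when x / y is very large.
    """
    quotient = math.trunc(x / y)
    return x - quotient * y


if __name__ == "__main__":
    for x, y in [(10.5, 3.0), (-10.5, 3.0), (1000.25, 0.5)]:
        print(x, y, fmod_mul_sub(x, y), math.fmod(x, y))
```

Whether such a replacement is actually faster depends on the CPU and the compiler, so it needs to be measured on the target hardware.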

grapeli commented 2 years ago

@maron2000 Unfortunately, my processor has no fma. Not for me.

You can monitor the CPU load in GP2 itself. Hold down the 'O' key during the game. The load is variable. It depends on the graphics settings, number of displayed objects, amount of smoke, track, etc. In the game itself, it should not exceed 100%, except occasionally.

Choose Quickrace in Monaco.

maron2000 commented 2 years ago

I put test code in my repository that replaces some fmod() calls with fma(). The Processor Occupancy that you see when you press O seems to be significantly lower. However, the lap time still lags somewhat. On my laptop, with CPU timing set to 300,000 (Athlon) and the frame rate option set to 23.5fps, the lap time difference between the game and reality is 1sec. It may be worth trying on your PC. You can find the code here: 0df35d5b8b243ac48bb239bcbf1b229851e7c4ce

Edit: The graphics setting are SVGA with maximum quality setting.

grapeli commented 2 years ago

Unfortunately, your fix is not working well on my hardware. The difference in CPU time is 13.5 seconds: that is how much more dosbox-x with your fix consumed on this run.

Quickrace Monza (3 laps), leading from the entrance to the first corner to the finish line.

                             race time  | CPU time
dosbox-x-0.83.24             4m 17.660s | CPU: 2min 56.464s
dosbox-x-0.83.24+your fix    4m 18.959s | CPU: 3min 9.980s

It shows in perf.

dosbox-x-0.83.24

3.95%  dosbox-x                    [.] mem_writed_checked
3.95%  dosbox-x                    [.] VGA_Draw_Xlat32_Linear_Line
3.02%  dosbox-x                    [.] CPU_Core_Dyn_X86_Run
2.08%  dosbox-x                    [.] MakeCodePage
1.94%  libm.so.6                   [.] __fmod_finite    <---------------------------
1.93%  dosbox-x                    [.] mem_readd_checked
1.51%  dosbox-x                    [.] RENDER_StartLineHandler
0.77%  libc.so.6                   [.] __memcmp_sse4_1
0.67%  dosbox-x                    [.] counter_output
0.66%  dosbox-x                    [.] Normal1x_32_32_R
0.64%  i965_dri.so                 [.] util_format_b8g8r8a8_unorm_unpack_rgba_8unorm
0.60%  dosbox-x                    [.] dyn_helper_idivd
0.56%  i965_dri.so                 [.] linear_to_xtiled_faster.lto_priv.0
0.54%  [JIT] tid 29717             [.] 0x0000776ffa99e3d1
0.53%  [JIT] tid 29717             [.] 0x0000776ffa99e349
0.41%  [JIT] tid 29717             [.] 0x0000776ffa6c3e80
0.40%  [JIT] tid 29717             [.] 0x0000776ffa6c3cc0
0.39%  dosbox-x                    [.] mem_writeb_checked
0.38%  [JIT] tid 29717             [.] 0x0000776ffa6c3b60
0.37%  [JIT] tid 29717             [.] 0x0000776ffa6c3cdf
0.35%  [JIT] tid 29717             [.] 0x0000776ffa6c3b7f
0.33%  [JIT] tid 29717             [.] 0x0000776ffa6c3e9f
0.30%  dosbox-x                    [.] IO_ReadB
0.29%  [JIT] tid 29717             [.] 0x0000776ffa803da9
0.28%  [JIT] tid 29717             [.] 0x0000776ffa803ddf
0.24%  dosbox-x                    [.] dyn_helper_divd

dosbox-x-0.83.24+your fix

3.76%  dosbox-x                  [.] VGA_Draw_Xlat32_Linear_Line
3.46%  dosbox-x                  [.] mem_writed_checked
2.68%  dosbox-x                  [.] CPU_Core_Dyn_X86_Run
2.17%  libm.so.6                 [.] __fmod_finite  <--------------------------------
2.14%  libm.so.6                 [.] feclearexcept  <--------------------------------
1.85%  dosbox-x                  [.] mem_readd_checked
1.73%  libm.so.6                 [.] sincos         <--------------------------------
1.52%  dosbox-x                  [.] MakeCodePage
1.32%  dosbox-x                  [.] RENDER_StartLineHandler
0.90%  libc.so.6                 [.] __memcmp_sse4_1
0.84%  dosbox-x                  [.] counter_output
0.68%  i965_dri.so               [.] util_format_b8g8r8a8_unorm_unpack_rgba_8unorm
0.66%  dosbox-x                  [.] Normal1x_32_32_R
0.63%  dosbox-x                  [.] dyn_helper_idivd
0.54%  i965_dri.so               [.] linear_to_xtiled_faster.lto_priv.0
0.52%  [JIT] tid 29654           [.] 0x000073ac056b1c79
0.50%  [JIT] tid 29654           [.] 0x000073ac056b1d01
0.34%  dosbox-x                  [.] my_fmod_fma    <----------------------------------
0.31%  dosbox-x                  [.] mem_writeb_checked
0.30%  dosbox-x                  [.] IO_ReadB
0.29%  dosbox-x                  [.] dyn_helper_divd

maron2000 commented 2 years ago

@grapeli Thanks for testing. I also observe a time difference between the game and reality, but for me it is only several seconds per lap. I'll see if I can find something more...

grapeli commented 2 years ago

@maron2000 For comparison, here is dosbox-SVN. One crucial note: this binary was built with PGO + LTO, whereas the preceding dosbox-x results come from normal builds. For dosbox-x, a normal build accounts for at least roughly 10 seconds of additional CPU time.

           | race time  | CPU time
dosbox-SVN | 4m 17.520s | 2min 22.816s

Perf. This is only the top lines, not the complete output from linux perf. You can see a huge difference under dosbox-SVN: the JIT code of the GP2 game itself dominates the profile.

5.02%  dosbox                      [.] mem_writed_checked
3.33%  dosbox                      [.] CPU_Core_Dyn_X86_Run
2.22%  dosbox                      [.] mem_readd_checked
0.71%  dosbox                      [.] Normal1x_8_32_L
0.68%  [JIT] tid 31574             [.] 0x000000000662a50f
0.64%  [JIT] tid 31574             [.] 0x0000000006351a40
0.63%  [JIT] tid 31574             [.] 0x0000000006351ba0
0.62%  [JIT] tid 31574             [.] 0x0000000006351d50
0.62%  dosbox                      [.] dyn_helper_idivd
0.60%  [JIT] tid 31574             [.] 0x000000000662a48e
0.57%  [JIT] tid 31574             [.] 0x0000000006351a5c
0.56%  [JIT] tid 31574             [.] 0x0000000006351d6c
0.54%  dosbox                      [.] RENDER_StartLineHandler
0.54%  [JIT] tid 31574             [.] 0x0000000006351bbc
0.49%  [JIT] tid 31574             [.] 0x00000000064a200c
0.48%  dosbox                      [.] mem_writeb_checked
0.44%  [JIT] tid 31574             [.] 0x00000000064a1fc2
0.37%  [JIT] tid 31574             [.] 0x00000000064a1f7f
0.34%  [JIT] tid 31574             [.] 0x00000000064a1fdb
0.33%  [JIT] tid 31574             [.] 0x00000000064a2112
0.30%  dosbox                      [.] dyn_io_readB
0.28%  [JIT] tid 31574             [.] 0x0000000006351d6d
0.28%  [JIT] tid 31574             [.] 0x0000000006351bbd
0.27%  [JIT] tid 31574             [.] 0x0000000005f790cd
0.26%  [JIT] tid 31574             [.] 0x0000000005f790c3
0.24%  [JIT] tid 31574             [.] 0x0000000005fa7058
0.20%  [JIT] tid 31574             [.] 0x000000000605028a
0.20%  [JIT] tid 31574             [.] 0x000000000662a4e3
0.19%  dosbox                      [.] counter_latch
0.19%  dosbox                      [.] dyn_helper_divd

grapeli commented 2 years ago

@maron2000 The difference between real time and in-game time is fine under dosbox-x. It varies from track to track. At Monza it is about +4-5% (acceleration). At Interlagos (Brazil) it is smaller.

On a real PC on Interlagos it is around +2%. https://www.youtube.com/watch?v=znu4CizLqqE

You can find other extreme cases where the deceleration or acceleration is greater than 20%.
https://www.youtube.com/watch?v=uYyScvV-lGE #GP2 slowdown -83.3% 174.35 (real) vs 95.109 (game)
https://www.youtube.com/watch?v=jXOC8v1B0lw #GP2 too fast +23.2%
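
For reference, the -83.3% figure can be reproduced from the two times quoted with the first video. A small Python sketch; the function name `clock_deviation_percent` is hypothetical, and the sign convention (negative for slowdown, relative to the in-game time) is an assumption chosen because it matches the quoted number:

```python
def clock_deviation_percent(real_seconds, game_seconds):
    """Signed deviation of real time from the in-game lap time, in percent.

    Negative: the lap took longer in reality than the game clock shows
    (slowdown). Positive: it took less (acceleration). The convention is
    an assumption that reproduces the "-83.3%" quoted above.
    """
    if game_seconds <= 0:
        raise ValueError("game_seconds must be positive")
    return (game_seconds - real_seconds) / game_seconds * 100.0

# 174.35 s of real time for a lap the game timed at 95.109 s.
slowdown = clock_deviation_percent(real_seconds=174.35, game_seconds=95.109)
print(f"{slowdown:+.1f}%")
```

The same function can be applied to a video length and the lap time shown in the cockpit to compare emulators track by track.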

grapeli commented 2 years ago

This huge slowdown of 83% (second video from YouTube) happens because this person did not adjust the graphics settings to the capabilities of the hardware (emulation). Processor Occupancy in GP2 averages around 183% there.

The significant acceleration is probably due to imperfect emulation (???). I see a similar ~20% acceleration in dosemu2.

Edit: GP2 is very demanding at the maximum graphics settings. I played it a long time ago on an AMD K6-2 300 MHz with a 3DFx Banshee. As far as I remember, even this configuration could not avoid slowdowns on the Monaco track (with max graphics settings).

Grand Prix 2 processor comparison https://www.youtube.com/watch?v=MAc4Xi_Qfos

brunocastello commented 2 years ago

GP2 is not a 3Dfx game. It's not more demanding than GP3 or GP4. Check your cycles speed. Something like:

core=auto
cputype=auto
cycles=auto 7800 70% limit 26800

Should be fine. I used to play GP2 just fine with DOSBox-X years ago before moving to QEMU. The game even runs fine on my iPhone 12 Mini with DOSPad, a port of DOSBox for iOS.

grapeli commented 2 years ago

@brunocastello I mentioned 3DFx Banshee in terms of decent 2D quality and performance under DOS.

GP2 is one of the most demanding DOS games. I have posted a link to the video. On a Pentium II 366 MHz configuration, GP2 slows down at the start on the Monaco track. That hardware configuration is a Win98SE one, not a DOS one. https://youtu.be/MAc4Xi_Qfos?t=278

I also ran it in a browser-based x86 emulator, https://copy.sh/v86/. It worked well in VGA, but not very well in SVGA (no sound).

brunocastello commented 2 years ago

I could make a video of GP2 running on my iPhone if you want to see the performance. I tested Monaco right now.

PO (Processor Occupancy) is 70% for most of the track, with occasional spikes up to 100% and 130% in specific places. But gameplay-wise it is perfectly playable. I probably tweaked the graphics a bit for better performance, and I am using a custom 1994 carset with updated cars, helmets and liveries. But yeah, I am using VGA instead of SVGA, because I don't think there is a huge difference in graphics between them. The only difference is performance (SVGA is a hell of a lot slower, yeah).

I'll update my comment soon with two screenshots of my graphics config (for both SVGA and VGA).

brunocastello commented 2 years ago

IMG_1351

IMG_1352

These are the settings I'm using.

M-Hamano16 commented 10 months ago

I'm very sorry if this is not the correct place to ask this, or if I am doing something wrong, but I found this page while looking for information on how to run GP2 with dosbox-x, as I was motivated to try something new. For years I have been using DOSBox ECE and it has worked well, but I liked the idea of finding a new DOSBox setup that has savestates. Today I experienced the same thing, with a lap taking much longer in real time than the clock shown in the cockpit of the game. My test track was a Monaco version that is a little more populated with objects than the original "vanilla" Monaco that ships with the game. In my DOSBox ECE, the real-life time is very close to what I see in the cockpit. In my dosbox-x setup, I was getting laps that took about 1m40s in real life, while the cockpit clock showed about 1m24s.