We would need some help to calculate the ARM cycles correctly; it is not very straightforward.
Recently I have added some coarse cycle estimation to the debugger. Maybe that helps.
I think what is doable (if that's not what the debugger is already doing) is counting the number of instructions executed and multiplying it by a fudge factor to account for the wait states from accessing memory. The accuracy will depend on the type of CPU load executed, but I think it should be good enough for what @andrew-davie describes.
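Something like this minimal sketch, where the function name and the 1.5 factor are purely assumed placeholders that would need tuning against real hardware:

```cpp
#include <cstdint>

// Assumed fudge factor covering average wait states per instruction;
// this is a made-up starting value, not a measured one.
constexpr double CYCLES_PER_INSTRUCTION = 1.5;

// Hypothetical helper: estimate total ARM cycles from an instruction count.
uint64_t estimateArmCycles(uint64_t instructionCount) {
  return static_cast<uint64_t>(instructionCount * CYCLES_PER_INSTRUCTION);
}
```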
Proper cycle counting would basically amount to writing a new ARM core.
@DirtyHairy Please have a look at 3c188fbb. Does that fit?
I am simply counting fetches, reads and writes.
👍 That's even more detailed than what I had in mind. We can take the sum of those, divide by the ARM clock, and make the cart put NOP on the bus until the corresponding time has elapsed in the timeframe of the VCS. This will not be exact (there's caching and prefetching), but we can multiply by a factor if it is off too far.
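As a rough sketch of that conversion (all names and constants here are illustrative assumptions, not Stella's actual API):

```cpp
#include <cstdint>

constexpr double ARM_CLOCK_HZ = 60e6;        // assuming the slower Encore clock
constexpr double VCS_CLOCK_HZ = 1.193182e6;  // NTSC 6507 clock
constexpr double CORRECTION   = 1.0;         // tuning factor if the estimate is off

// Sum the counted bus activity, convert the elapsed ARM time into the number
// of 6507 cycles for which the cart would keep NOP on the bus.
uint64_t nopCyclesFor(uint64_t fetches, uint64_t reads, uint64_t writes) {
  const double armSeconds =
      static_cast<double>(fetches + reads + writes) / ARM_CLOCK_HZ;
  return static_cast<uint64_t>(armSeconds * VCS_CLOCK_HZ * CORRECTION);
}
```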
IIRC Harmony runs at 70 MHz and Encore at 60 MHz. So probably we should assume 60 MHz.
Well, you don't want things running overtime either. Whatever the default, how about a switch/option to change to the other and/or put it in the game's profile settings?
If the timing were exact enough, then yes. But first we would have to find out how precise we can get.
We should not suggest precision where there is none.
I don't think we will hit 10% accuracy, so the point is probably moot.
BTW: This is somewhat a duplicate of #757.
Yes, basically the same thing. Counting instructions/cycles is how we would approximate how long the ARM CPU is taking.
Then we should IMO mark this as a duplicate and continue discussing there.
Fine with me.
See #757
I'm working on an engine that does some fairly hefty ARM calculations in the VB and overscan. So much so that I've had to split the ARM processing into smaller "chunks" which are processed via a scheduler. I've run into an issue where my engine runs TOTALLY differently on Stella than on real hardware. The problem, as I understand it, is that Stella has no real concept of ARM processing time.
In particular, I have 6507 code which calls a "chunk" of ARM processing in the timer loops in overscan and VB. Specifically, I check how much INTIM is available, and if it is (still) above some threshold (say, ~150 CPU cycles), I do another ARM call. Those ARM calls are designed to take roughly 60-100 CPU cycles, so I can typically fit a few of them into the spare time at the end of VB and overscan, after all the "must-do-every-frame" work has completed.
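The real code is 6507 assembly, but the control flow is roughly this (a hypothetical C++ rendering; every name in it is made up for illustration):

```cpp
int cyclesUntilTimerExpires();  // would read INTIM and scale to CPU cycles
void runArmChunk();             // dispatches the next queued ARM "chunk"

constexpr int kThresholdCycles = 150;  // assumed minimum headroom per chunk

void runSpareTimeChunks() {
  // Each chunk costs roughly 60-100 CPU cycles on real hardware, so keep
  // dispatching chunks while the timer still leaves enough headroom.
  while (cyclesUntilTimerExpires() > kThresholdCycles)
    runArmChunk();
}
```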
Now this all works fine on hardware. But on Stella the ARM code returns "instantly", so to speak, so the Stella version ends up running MANY more of the "ARM chunks" in any particular frame, which of course makes the engine behave totally differently and incorrectly. I understand the reason why things are as they are.
I think this situation is a good example of why it would be good for Stella to somehow incorporate some "understanding" of how long ARM code takes. It's done correctly in Gopher2600, and my engine runs the same there as on hardware (but slower, of course!). This is not a showstopper for me; I can fudge things so Stella sort of behaves, but it's a definite emulation difference that will show up in any binaries I release. That is, Stella will emulate those incorrectly. I'm hoping that some consideration might be given to improving this.