lronaldo / cpctelera

Astonishingly fast Amstrad CPC game engine for C developers
http://lronaldo.github.io/cpctelera/
GNU Lesser General Public License v3.0
222 stars 53 forks source link

New optimized cpct_getScreenPtr.asm #138

Closed joaquinferrero closed 2 years ago

joaquinferrero commented 3 years ago

Optimization proposal for the cpct_getScreenPtr routine.

The optimization focuses on the fact that we have to do 3 shifts to the left of the value of the rows and another 3 shifts to the left of the value of the lines, so using a 16-bit register we can shift both values simultaneously.

The new method saves 6 bytes and 6 µs (24 CPU cycles).

Has been checked against all the cpctelera examples that use this routine (master branch), plus a test program for access to all bytes on the screen.

The entry or exit conditions have not been changed. The entry, exit and modified registers are the same.

nestornillo commented 2 years ago

Nice optimization on this function, clever use of 16bit sum! While I was trying to understand your code I found it can be optimized just a little more: The two "srl a" can be replaced with two "rrca". (The previous "and #F8" ensures that less significant bits are 0). That saves 2 bytes and 2 us, so timings would be:

;; Required memory: ;; C-bindings - 24 bytes ;; ASM-bindings - 20 bytes ;; ;; Time Measures: ;; (start code) ;; Case | microSecs (us) | CPU Cycles ;; ----------------------------------------- ;; Any | 45 | 180 ;; ----------------------------------------- ;; Asm saving | -13 | -52 ;; ----------------------------------------- ;; (end code)

(btw, notice that CPU Cycles are just microSecs * 4)

Also, the 'details' paragraph of function's documentation talks about 'lines inside a group' and 'groups of lines'. When you refer to them in your source comments you use 'lines' for 'groups of lines', and 'rows' for 'lines inside a group'. In my opinion, it would be easier to understand if you just swap 'lines' for 'rows' in source comments.

joaquinferrero commented 2 years ago

Thanks, @nestornillo. I will add your changes.

One thing with the subject of lines and rows ... In the Details section the "rows" are not named, only the term "lines" is used as a synonym for each of the "lines" pixel on the screen. No reference to "rows" (of characters or group of 8 "lines") at any time.

This should be stated in the style doc, but the decision to put "lines" in the comments is because I see that it mostly refers to screen pixel "lines". So I used the term "row" for grouping of 8 lines.

I'll go over the comments and add a line of explanation.

lronaldo commented 2 years ago

Thank you very much, @joaquinferrero. Your optimization is extremely valuable as this is a widely used function. It's a really impressive optimization. Congratulations on such a great achievement :)

Thanks also to @nestornillo for reviewing and adding another improvement. This new version of the function is going to improve many games thanks to both of you :)

I will add both of you to the credits at the top of the comments and will also do a review of the documentation to try to include a deeper explanation of everything to help readers understanding the code :)

You are awesome, guys :)