nicocvn / cppreg

A C++11 header-only library for MMIO registers
https://nicocvn.github.io/cppreg/

Very minor Performance issue in comparison to CMSIS #5

Closed hak8or closed 6 years ago

hak8or commented 6 years ago

While working on #1 with the following code, I spotted the following:

Differences

Potential Cause

I feel the first and second differences are related: in CMSIS the compiler is told that MODER and BSRR are offsets from a common base pointer, while in cppreg they look totally unrelated. As a result, the compiler has to "rebuild" the address twice for cppreg, once from immediates for MODER and once from a hardcoded address stored in the .text section. Furthermore, the two-instruction difference also seems to be due to masking and applying the value.
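The addressing difference described above can be sketched on a host machine. This is an illustrative model, not the code from this issue: `FakeGpio`, `cmsis_style`, `separate_pointers`, and `addressing_example` are hypothetical names, the struct layout is a placeholder, and the "peripheral" lives in ordinary RAM rather than at an MMIO address. In firmware, the second form would correspond to two independent absolute addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a GPIO block. The layout is illustrative and
// is not copied from any device header.
struct FakeGpio {
    volatile std::uint32_t MODER;        // byte offset 0x00
    volatile std::uint32_t reserved[5];  // placeholder for other registers
    volatile std::uint32_t BSRR;         // byte offset 0x18
};
static_assert(offsetof(FakeGpio, BSRR) == 0x18, "unexpected layout");

// CMSIS style: both registers are members of one struct, so the compiler
// sees a single base pointer plus two small constant offsets.
inline void cmsis_style(FakeGpio* gpio, unsigned pin) {
    gpio->MODER = (gpio->MODER & ~(3u << (pin * 2))) | (1u << (pin * 2));
    gpio->BSRR = 1u << pin;
}

// Per-register style: each register is reached through its own pointer,
// so nothing in the code says the two addresses are related.
inline void separate_pointers(volatile std::uint32_t* moder,
                              volatile std::uint32_t* bsrr,
                              unsigned pin) {
    *moder = (*moder & ~(3u << (pin * 2))) | (1u << (pin * 2));
    *bsrr = 1u << pin;
}

// Both forms perform the same register updates; only the addressing differs.
inline void addressing_example() {
    FakeGpio a = {};
    FakeGpio b = {};
    cmsis_style(&a, 3);
    separate_pointers(&b.MODER, &b.BSRR, 3);
    assert(a.MODER == b.MODER && a.BSRR == b.BSRR);
}
```

The two functions are functionally equivalent. Any difference would show up only in how the compiler materializes the addresses, which depends on the target and optimization level.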

Solution

I view this as being caused by the architecture limitations of Cortex M and the design of cppreg.

Real World Implications

To be frank, this difference in assembly is small from a performance standpoint, very small. Ideally the reason for the discrepancy can be verified, with cppreg adopting the smaller of the two sequences. That said, writing to a register is very rarely a bottleneck unless you are bit-banging, in which case you should probably start seriously considering assembly instead.

sendyne-nicocvn commented 6 years ago

This is interesting. I made a few modifications (see here).

The cppreg version is now only larger by one instruction compared to the CMSIS code. Here is a thought:

When we use the template form for the MODER write call, it calls the regular write function. Technically, because in that case the value, the offset, and the mask are all compile-time constants, part of the write implementation could be simplified (this is done in the super_write of the modified code). That seems to be where the simplification occurs.
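A minimal sketch of that idea follows. It is not cppreg's actual API and not the `super_write` from the linked code: `write_runtime`, `write_const`, and `write_example` are hypothetical names chosen for illustration. When the value is a template parameter, the shifted field bits can be computed and range-checked at compile time, leaving only the read, mask, and OR for run time.

```cpp
#include <cassert>
#include <cstdint>

// Run-time form: the value is a function argument, so it must be shifted
// and masked when the call executes.
template <std::uint32_t Mask, unsigned Offset>
inline void write_runtime(volatile std::uint32_t& reg, std::uint32_t value) {
    reg = (reg & ~Mask) | ((value << Offset) & Mask);
}

// Template form: value, offset, and mask are compile-time constants, so
// the field bits fold into one constant and overflow is a compile error.
template <std::uint32_t Mask, unsigned Offset, std::uint32_t Value>
inline void write_const(volatile std::uint32_t& reg) {
    static_assert(((Value << Offset) & ~Mask) == 0,
                  "value does not fit in the field");
    constexpr std::uint32_t bits = Value << Offset;
    reg = (reg & ~Mask) | bits;
}

// Both forms leave the register in the same state.
inline void write_example() {
    volatile std::uint32_t a = 0xFFFFFFFFu;
    volatile std::uint32_t b = 0xFFFFFFFFu;
    write_runtime<0x3000u, 12>(a, 1u);
    write_const<0x3000u, 12, 1u>(b);
    assert(a == b);
}
```

An optimizer may already fold the run-time form when it is inlined with a constant argument, so whether the dedicated template form saves an instruction should be checked in the generated assembly for the target compiler.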

Two things:

  1. It will not be much work to implement such a faster write when the template form is used, and it seems it could bring some additional performance.
  2. Regarding the offset between registers, we could probably implement an abstract peripheral type, or rather a cluster of register types, but that is a bit more work and revision.
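One possible shape for such a cluster, shown only as a sketch: registers are described by their byte offset and resolved against a shared base. `ClusterRegister`, `GpioCluster`, and `cluster_example` are hypothetical names, not part of cppreg, and the offsets are placeholders. The base is passed at run time here so the example can run against ordinary memory. In firmware it would more likely be a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register descriptor: only the byte offset from the cluster
// base is part of the type.
template <std::size_t Offset>
struct ClusterRegister {
    static volatile std::uint32_t& at(std::uintptr_t base) {
        return *reinterpret_cast<volatile std::uint32_t*>(base + Offset);
    }
};

// Hypothetical cluster grouping registers that share one base address.
struct GpioCluster {
    using Moder = ClusterRegister<0x00>;
    using Bsrr = ClusterRegister<0x18>;
};

// Host-side usage: a plain array stands in for the peripheral memory.
inline void cluster_example() {
    std::uint32_t mem[8] = {};
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(mem);
    GpioCluster::Moder::at(base) = 0x40u;
    GpioCluster::Bsrr::at(base) = 0x08u;
    assert(mem[0] == 0x40u);  // offset 0x00 is word 0
    assert(mem[6] == 0x08u);  // offset 0x18 is word 6
}
```

Expressing every register as base plus offset is what lets the compiler reuse one base register for several accesses, which is the relationship CMSIS conveys through its peripheral structs.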

sendyne-nicocvn commented 6 years ago

So after careful checking, it seems that the only remaining difference in the implementation given in the previous comment is related to what @hak8or mentioned about registers being related through an offset.

I think this is great news because this means we obtain quasi-identical performance to a CMSIS implementation. We could create an issue for a register cluster implementation as mentioned above.

hak8or commented 6 years ago

Awesome, in that case:

  1. May be worthwhile to implement
  2. Would be fantastic, I will create the issue. It sounds like a decent number of changes to the API though, so it may be worthwhile to group the various potential API changes together before starting.

In that case, I say we close this after implementing your point 1, noting that the grouping will be done in #7.

sendyne-nicocvn commented 6 years ago

Agreed. I will however create an issue for the new access policies implementation (already available but needs to be merged).