Replace the algorithm for counting leading zeros (in the case of no builtin function)
by one that searches the number in larger chunks first.
This can speed up the counting by up to 10x in the case of small numbers (lots of zeros).
Both tester as well as benchmark_branchfree complete successfully with the new approach. It has also been confirmed explicitly that the results of the old and the new approach agree for all numbers less than 10,000,000.
The process of searching larger chunks first before going bitwise can probably be extended even more to speed it up even further, but this way appears to be a good compromise between legibility and speed.
Replace the algorithm for counting leading zeros (in the case of no builtin function) by one that searches the number in larger chunks first. This can speed up the counting by up to 10x in the case of small numbers (lots of zeros).
Both tester as well as benchmark_branchfree complete successfully with the new approach. It has also been confirmed explicitly that the results of the old and the new approach agree for all numbers less than 10,000,000.
The process of searching larger chunks first before going bitwise can probably be extended even more to speed it up even further, but this way appears to be a good compromise between legibility and speed.