Open rainwoodman opened 6 years ago
I didn't make it for float32 yet - the FLOPS are not the biggest computation time (still mostly struggle with incoherent memory accesses) but it would be good if you really want to squeeze the memory. @manodeep was also interested in fixed point. I'm not sure if it would be better to do these as compile options, forks or even c++ templates, so open to suggestions.
Adding an upcast in python would at least fix the crash for now -- before any structural changes to the code or introducing templates.
Ok I merged in this to at least fix the bug.
Current the python API fails with 32 bit pos input during CTypes. It is desirable to avoid upcasting in memory tight simulations. Is this simply because the 32bit interface is not exposed?