Will add the reordering code needed for Romein's work distribution strategy directly to the GPU backend library. The Python code was only a temporary placeholder, but it is proving to be the major bottleneck of the whole GPU pipeline. The reindexing loop and prefix scan have to be written in C++.
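
A minimal sketch of what the C++ side of the prefix scan and reindexing loop might look like. It assumes the reordering amounts to grouping items by a precomputed bin index, counting-sort style; that is an assumption for illustration, not a description of the actual backend. The names `Reordering`, `reorder_by_bin`, and `bin_of_item` are hypothetical and do not come from the GPU backend library.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical result type (not part of the backend library).
struct Reordering {
    // offsets has num_bins + 1 entries; bin b owns the slots
    // [offsets[b], offsets[b + 1]) of the reordered sequence.
    std::vector<std::size_t> offsets;
    // order[k] is the original index of the item placed at slot k.
    std::vector<std::size_t> order;
};

// Groups item indices by bin. bin_of_item[i] is the (assumed precomputed)
// bin index of item i and must be smaller than num_bins.
Reordering reorder_by_bin(const std::vector<std::uint32_t>& bin_of_item,
                          std::size_t num_bins) {
    // Pass 1: count how many items fall into each bin.
    std::vector<std::size_t> counts(num_bins, 0);
    for (std::size_t i = 0; i < bin_of_item.size(); ++i) {
        if (bin_of_item[i] >= num_bins) {
            throw std::out_of_range("bin index out of range");
        }
        ++counts[bin_of_item[i]];
    }

    // Prefix scan: writing the inclusive sums one slot to the right, with
    // offsets[0] = 0, yields the exclusive prefix sum of the counts.
    Reordering result;
    result.offsets.assign(num_bins + 1, 0);
    std::partial_sum(counts.begin(), counts.end(), result.offsets.begin() + 1);

    // Pass 2 (reindexing loop): scatter each item index into the next free
    // slot of its bin. Items keep their original relative order within a bin.
    std::vector<std::size_t> cursor(result.offsets.begin(),
                                    result.offsets.end() - 1);
    result.order.resize(bin_of_item.size());
    for (std::size_t i = 0; i < bin_of_item.size(); ++i) {
        result.order[cursor[bin_of_item[i]]++] = i;
    }
    return result;
}
```

Both passes are linear in the number of items and allocate only flat arrays, so the resulting `offsets` and `order` buffers could be copied to the device as-is. Whether the scan should instead run on the GPU is left open here.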