lacker closed this 2 years ago.
At least some sample ATA data is 286 x 50331648. I think we can avoid allocating extra GPU memory: just make the kernel accept a size larger than num_timesteps, and treat any out-of-bounds access as a zero.
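A minimal CPU-side sketch of the idea, assuming the buffer is stored row-major as num_timesteps x num_channels and that the algorithm wants a padded row count such as the next power of two (286 would become 512). This is not seticore's actual CUDA kernel; `load_or_zero`, `next_power_of_two`, and `drift_sum` are hypothetical names. The loop runs over the padded extent, but rows (and channels) outside the real buffer read as zero, so no padded copy is ever allocated.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical guarded load: any (t, c) outside the real
// num_timesteps x num_channels buffer reads as zero instead of
// touching memory, standing in for explicit zero-padding.
float load_or_zero(const std::vector<float>& data, int64_t num_timesteps,
                   int64_t num_channels, int64_t t, int64_t c) {
  if (t < 0 || t >= num_timesteps || c < 0 || c >= num_channels) {
    return 0.0f;
  }
  return data[t * num_channels + c];
}

// Smallest power of two >= n, for n >= 1 (e.g. 286 -> 512).
int64_t next_power_of_two(int64_t n) {
  int64_t p = 1;
  while (p < n) {
    p *= 2;
  }
  return p;
}

// Hypothetical brute-force sum along a straight path that starts at
// channel `start` and moves `drift` channels per timestep. It iterates
// over padded_timesteps rows; rows past num_timesteps contribute zero.
float drift_sum(const std::vector<float>& data, int64_t num_timesteps,
                int64_t num_channels, int64_t padded_timesteps,
                int64_t start, int64_t drift) {
  assert(padded_timesteps >= num_timesteps);
  float total = 0.0f;
  for (int64_t t = 0; t < padded_timesteps; ++t) {
    total += load_or_zero(data, num_timesteps, num_channels, t,
                          start + drift * t);
  }
  return total;
}
```

The trade-off is one bounds comparison per load in exchange for not allocating the padding rows. Under the power-of-two assumption, padding 286 rows up to 512 would mean 226 extra rows, roughly 79% more memory for the same buffer.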
Seems like it's working now, as of
https://github.com/lacker/seticore/commit/4694d894c5a45768762c327ed21d93c9adfb90b9