I'll make pull requests for sotodlib and pwg-scripts that depend on this when this one is accepted.
I think my biggest concern is interface breakage -- it seems like this changes the shape of the RangesMatrix "threads" object, even in the case of interpol="nn". So existing sotodlib.coords code, if it inspected threads for some reason, would probably choke. I am not sure whether any of the important code does so. Alas, there don't seem to be any sotodlib unit tests that probe pmat, so that doesn't help us.
Yes, it does make a breaking change, but I went through all sotodlib code I could find that uses this, and updated it accordingly. Did I miss anything? Edit: Oh, right. I haven't made that pull request yet because it depends on this one. Chicken-and-egg.
I think this breaking change is worth it because the new structure is general enough to handle basically any parallelization scheme while adding practically no overhead. I think it would be a bad idea to drag around two different shapes for the threads object depending on the interpolation or threading approach used. Maybe we can discuss this in a meeting if you disagree.
Edit: The old `threads` object was 3d with shape `[parallel_groups][ndet][nrange]`, and was used like this (pseudo-code):

```
omp_parallel_for each group in threads:
    process_group(group)
```
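For concreteness, here is a minimal Python sketch of that pattern, with a thread pool standing in for the OMP loop; `process_group` and the toy contents of `threads` are made up for illustration, since the real loop lives in compiled code:

```python
from concurrent.futures import ThreadPoolExecutor

def process_group(group):
    # Hypothetical per-group work, e.g. accumulating samples into a map.
    for det, ranges in enumerate(group):
        for start, stop in ranges:
            pass  # operate on samples[det, start:stop]

# Toy stand-in for the 3d threads object: [parallel_groups][ndet][nrange].
threads = [
    [[(0, 50)], [(0, 50)]],      # group 0: one range per detector
    [[(50, 100)], [(50, 100)]],  # group 1
]

# The omp_parallel_for: all groups run concurrently.
with ThreadPoolExecutor() as pool:
    list(pool.map(process_group, threads))
```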
The new one is 4d with shape `[serial_groups][parallel_groups][ndet][nrange]`, and is used like this:

```
for each bunch in threads:
    omp_parallel_for each group in bunch:
        process_group(group)
```
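Continuing the same toy example, the 4d layout just adds a serial outer loop; again a hedged sketch, with the same hypothetical `process_group`:

```python
from concurrent.futures import ThreadPoolExecutor

def process_group(group):
    pass  # hypothetical per-group work, as in the sketch above

# Toy 4d threads object: [serial_groups][parallel_groups][ndet][nrange].
threads = [
    [  # bunch 0: these groups are mutually safe to run in parallel
        [[(0, 50)], [(0, 50)]],
        [[(50, 100)], [(50, 100)]],
    ],
    [  # bunch 1: must wait until bunch 0 has finished
        [[(0, 100)], [(0, 100)]],
    ],
]

with ThreadPoolExecutor() as pool:
    for bunch in threads:                     # serial loop over bunches
        list(pool.map(process_group, bunch))  # parallel within a bunch
```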
This nested structure allows for cases where not everything can be done in parallel at once, but where there are still subsets of the data that are safe to process in parallel. This interface is very general, so it can support many different parallelization schemes, including the old one. However, as you say, it is a breaking change in the interface. It would be possible to shoehorn this functionality into the old interface like this:
```
nbunch = ceil(len(threads)/nthread)
for i in range(nbunch):
    omp_parallel_for each group in threads[i*nthread:(i+1)*nthread]:
        process_group(group)
```
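A sketch of that construction under the same toy setup: a 4d object is flattened into the old 3d layout, padding each bunch with empty groups so every bunch occupies exactly `nthread` consecutive slots. The helper and data are hypothetical, as before:

```python
from math import ceil
from concurrent.futures import ThreadPoolExecutor

def process_group(group):
    pass  # hypothetical per-group work; an empty group is a no-op

nthread = 2
# The 4d object from above, with bunches of unequal size.
threads_4d = [
    [[[(0, 50)]], [[(50, 100)]]],  # bunch 0: two groups
    [[[(0, 100)]]],                # bunch 1: one group
]

# Flatten to the old 3d layout, padding each bunch with empty groups.
threads = []
for bunch in threads_4d:
    threads.extend(bunch)
    threads.extend([] for _ in range(nthread - len(bunch)))

nbunch = ceil(len(threads) / nthread)
with ThreadPoolExecutor(max_workers=nthread) as pool:
    for i in range(nbunch):  # implicit serial loop over bunches
        list(pool.map(process_group, threads[i*nthread:(i+1)*nthread]))
```

With the padding in place, each slice of `nthread` groups corresponds to one bunch, at the cost of some idle threads.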
By using empty groups as necessary, this construction should support everything the 4d object can without needing any changes to the interface. However, it's much less clear what it does, so I think it could be error-prone in the future. What do you think?
Support for bilinear mapmaking. Standard nearest neighbor mapmaking is still the default, and its performance is unchanged in my tests.