Problem: A major current bottleneck is the RAM and runtime required to compute the LISA time-dependent response function for anisotropic cases, especially when performing simulations (which use a denser frequency grid than the analysis). The code as it stands uses np.einsum to perform tensor convolutions with extremely large arrays, holding the full response tensor in memory as it does so. This is very time-efficient, but can require >1 TB of RAM for cases of interest.
Possible solution: introduce a new option for response function calculations that handles these convolutions via for loops (reducing the memory requirement at the cost of increased runtime), and use jax.numpy to GPU-accelerate the now-smaller serial calculations. Do this in a way that can be parallelized across the available GPUs; a sketch is given below.
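
The sketch below illustrates the looped approach with placeholder names and shapes (n_freq, contract_chunk, etc. are hypothetical; the real response tensor's index structure is specific to the code). The monolithic np.einsum is replaced by a loop over frequency chunks, each contracted by a jit-compiled jnp.einsum and placed round-robin on the available GPUs:

```python
import jax
import jax.numpy as jnp
import numpy as np

# Hypothetical shapes: n_freq frequency bins, with per-frequency factors A and B
# whose full contraction would otherwise be materialised as one huge array.
n_freq, n_a, n_b, n_c = 4096, 32, 32, 32
rng = np.random.default_rng(0)
A = rng.standard_normal((n_freq, n_a, n_b)).astype(np.float32)
B = rng.standard_normal((n_freq, n_b, n_c)).astype(np.float32)

@jax.jit
def contract_chunk(a_chunk, b_chunk):
    # Same contraction as the monolithic einsum, on a small slice only,
    # so peak device memory scales with the chunk size, not with n_freq.
    # precision='highest' avoids TF32 rounding on Ampere-class GPUs.
    return jnp.einsum('fab,fbc->fac', a_chunk, b_chunk, precision='highest')

def contract_looped(A, B, chunk_size=256):
    devices = jax.devices()  # all visible GPUs (falls back to CPU)
    chunks = []
    for i, start in enumerate(range(0, n_freq, chunk_size)):
        sl = slice(start, start + chunk_size)
        # Round-robin the chunks over the available devices; JAX's
        # asynchronous dispatch lets chunks on different GPUs overlap,
        # and only the final concatenate blocks on the results.
        dev = devices[i % len(devices)]
        a = jax.device_put(A[sl], dev)
        b = jax.device_put(B[sl], dev)
        chunks.append(contract_chunk(a, b))
    return np.concatenate([np.asarray(c) for c in chunks], axis=0)

result = contract_looped(A, B)
# Small-case check against the all-in-memory reference.
assert np.allclose(result, np.einsum('fab,fbc->fac', A, B), atol=1e-3, rtol=1e-3)
```

Chunk size then becomes the memory/runtime knob: larger chunks recover einsum-like throughput, smaller ones cap the per-device footprint. For a more structured multi-GPU split, the same per-chunk function could be driven by jax.pmap or jax.sharding instead of the round-robin device_put.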