Brainstorm on how to improve speed for high density/density rate #15

Closed · apizzuto closed this issue 3 years ago

apizzuto commented 3 years ago

I believe there are a few ideas out there right now, including:

mjlarson commented 3 years ago

The main issue with high densities is the sampling of sources. The base branch uses SciPy's UnivariateSpline for the inverse-CDF sampling, but sets the parameters so that we're effectively just doing a linear interpolation between nearest neighbors. Even so, UnivariateSpline still evaluates over the full "spline", leading to unnecessary backend calculations. Switching to SciPy's interp1d instead allows for purely local evaluation, dropping the time required to sample a density of 1e-5 from 128 seconds (total firesong_simulation time of 174.3 seconds) to 17.3 seconds (total firesong_simulation time of 63.5 seconds).
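A minimal sketch of the interp1d-based inverse-CDF sampling, assuming a placeholder redshift PDF rather than FIRESONG's actual distribution:

```python
# Sketch of inverse-CDF sampling with interp1d. The grid and PDF below
# are placeholders, not FIRESONG's real redshift distribution.
import numpy as np
from scipy.interpolate import interp1d

def make_inverse_cdf(grid, pdf):
    """Tabulate the CDF on `grid` and return a linear inverse-CDF sampler."""
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]  # normalize so the CDF ends at 1
    # interp1d only evaluates the local linear segment for each query,
    # unlike UnivariateSpline, which pays the full spline machinery even
    # when configured to act as a linear interpolant.
    return interp1d(cdf, grid, bounds_error=False,
                    fill_value=(grid[0], grid[-1]))

z_grid = np.linspace(0.0005, 10.0, 10_000)
pdf = z_grid**2 * np.exp(-z_grid)  # placeholder shape for dN/dz
inv_cdf = make_inverse_cdf(z_grid, pdf)
redshifts = inv_cdf(np.random.default_rng(1).uniform(size=1_000_000))
```

Since the tabulated CDF is monotonically increasing, interp1d only has to locate and evaluate one segment per query, which is where the speed-up comes from.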

We can also shave a few more seconds off the InverseCDF sampling by reducing the binning. After fixing #19, we can drop the redshift binning from 10,000 points to 100 while introducing only a 0.06% change in redshift and a 0.4% change in flux. This brings the total firesong_simulation time down to around 60.4 seconds at a density of 1e-5.
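As a rough, self-contained illustration of that accuracy check, one can draw from the coarse and fine grids with a shared uniform stream and compare (again with a placeholder PDF, so the exact percentages will differ from the real cosmology):

```python
# Compare sampling on a 10,000-point vs a 100-point grid using the same
# uniform draws; the relative shift in redshift stays at the sub-percent
# level quoted above, though the exact numbers depend on the true PDF.
import numpy as np
from scipy.interpolate import interp1d

def sampler(n_bins):
    z = np.linspace(0.0005, 10.0, n_bins)
    pdf = z**2 * np.exp(-z)              # same placeholder dN/dz as above
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]
    return interp1d(cdf, z, bounds_error=False, fill_value=(z[0], z[-1]))

u = np.random.default_rng(2).uniform(size=100_000)  # shared uniform stream
z_fine, z_coarse = sampler(10_000)(u), sampler(100)(u)
rel = np.abs(z_coarse - z_fine) / z_fine
print(f"median |dz/z| going from 10,000 to 100 bins: {np.median(rel):.3%}")
```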

Further speed-ups will have to come from elsewhere. In particular, the Lumi2Flux function takes up just under half of the remaining processing time at high densities.

mjlarson commented 3 years ago

I've rearranged the EnergyIntegral used in Lumi2Flux in order to reduce the number of calculations by ~2x. That should save us another 10 seconds or so for a density of 1e-5. It's part of the pull request in #20 now.
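The actual rearrangement is in #20; purely as a hypothetical illustration of the kind of factorization that halves the work, a power-law energy integral can be evaluated analytically once and shared across all sources, instead of being recomputed per source:

```python
# Hypothetical sketch only -- not the actual FIRESONG Lumi2Flux code.
import numpy as np

def energy_integral(index, emin=1e4, emax=1e7):
    """Analytic integral of E * E**(-index) dE over [emin, emax] (GeV)."""
    if np.isclose(index, 2.0):
        return np.log(emax / emin)
    p = 2.0 - index
    return (emax**p - emin**p) / p

def lumi_to_flux(luminosity, lumi_distance_cm, index=2.0):
    # The integral depends only on the spectral index, so compute it
    # once and broadcast over the (possibly huge) array of sources.
    integral = energy_integral(index)
    return luminosity / (4.0 * np.pi * np.asarray(lumi_distance_cm)**2 * integral)
```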

renereimann commented 3 years ago

The FluxPDF.py calculates the PDF over a given range of luminosities. Once this is calculated, you can simply generate random numbers based on the PDF/CDF. This is equivalent to the histogram idea mentioned by @mjlarson above, so it's basically already implemented. ;)

tglauch commented 3 years ago

Yeah, I want to second what Rene said: that is the basic idea of the flux PDF. You integrate over the entire luminosity function and obtain a source-count distribution, i.e., the number of sources per source-flux interval. From this 1D spline you can then draw the sources. The results should be equivalent to sampling from the luminosity function directly, and the speed is basically independent of the density.
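A minimal sketch of that idea, with illustrative names rather than the actual FluxPDF.py interface: tabulate the source-count distribution dN/dS once, then draw any number of sources from its CDF at a cost independent of the density.

```python
import numpy as np
from scipy.interpolate import interp1d

def sample_fluxes(flux_grid, dn_ds, n_sources, seed=None):
    """Draw n_sources fluxes from a tabulated source-count distribution."""
    counts = dn_ds * np.gradient(flux_grid)  # counts per flux bin
    cdf = np.cumsum(counts)
    cdf /= cdf[-1]
    inv_cdf = interp1d(cdf, flux_grid, bounds_error=False,
                       fill_value=(flux_grid[0], flux_grid[-1]))
    # Building the CDF is a one-off cost; sampling scales only with
    # n_sources, not with the source density behind dN/dS.
    return inv_cdf(np.random.default_rng(seed).uniform(size=n_sources))

# Example with a toy Euclidean dN/dS ~ S**-2.5 on a log-spaced flux grid
fluxes = np.logspace(-12, -6, 500)
draws = sample_fluxes(fluxes, fluxes**-2.5, n_sources=100_000, seed=3)
```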

apizzuto commented 3 years ago

Closing this issue because Theo and Rene are completely right that FluxPDF is a perfect solution for this, and that script has now been sped up and fixed.