wtbarnes opened 6 years ago
Replace linalg.solve with SVD, as in the ionization equilibrium calculation. They are really the same equation.
An equivalent function is implemented in Dask, so this could potentially be parallelized as well.
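For reference, here is a minimal sketch of what the SVD-based null-space solve might look like, assuming c_matrix is the stacked rate matrix (shape (n_temperature, n_levels, n_levels)) assembled inside level_populations; the right singular vector belonging to the smallest singular value plays the same role as the near-zero eigenvector in the current approach:

```python
import numpy as np

def svd_level_populations(c_matrix):
    # np.linalg.svd handles the stacked (n_temperature, n_levels, n_levels)
    # matrices at once and returns singular values in descending order, so the
    # last row of vh is the (approximate) null-space vector for each temperature.
    _, _, vh = np.linalg.svd(c_matrix)
    pop = np.fabs(vh[:, -1, :])  # singular vectors are only defined up to a sign
    # Normalize so the populations at each temperature sum to 1
    return pop / pop.sum(axis=1)[:, np.newaxis]
```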
Speeding things up would be very helpful. Right now I am trying to get Version 9 of the database and ChiantiPy out; there are significant changes in the way autoionization states are handled.
After some profiling, it seems the main bottleneck here is computing the effective collision strengths from the scups data, specifically descaling all of the collision strengths.
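For anyone wanting to reproduce this, here is a quick, rough way to check where the time goes, assuming fiasco is installed with the CHIANTI database available and that Fe 11 is a representative ion; the method names are taken from the profile below:

```python
import time

import astropy.units as u
import numpy as np

import fiasco

temperature = np.logspace(5, 7, 100) * u.K
ion = fiasco.Ion('Fe 11', temperature)

# Time the two collision rate methods that dominate the level_populations profile
for name in ('electron_collision_excitation_rate',
             'electron_collision_deexcitation_rate'):
    start = time.perf_counter()
    getattr(ion, name)()
    print(f'{name}: {time.perf_counter() - start:.2f} s')
```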
Just to keep a record, here's the profile I've just run on level_populations:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
244 @needs_dataset('elvlc', 'wgfa', 'scups')
245 @u.quantity_input
246 @profile
247 def level_populations(self,
248 density: u.cm**(-3),
249 include_protons=True) -> u.dimensionless_unscaled:
250 """
251 Compute energy level populations as a function of temperature and density
252
253 Parameters
254 ----------
255 density : `~astropy.units.Quantity`
256 include_protons : `bool`, optional
257 If True (default), include proton excitation and de-excitation rates
258 """
259 1 2326.0 2326.0 0.0 level = self._elvlc['level']
260 1 4734.0 4734.0 0.0 lower_level = self._scups['lower_level']
261 1 3802.0 3802.0 0.0 upper_level = self._scups['upper_level']
262 1 407.0 407.0 0.0 coeff_matrix = np.zeros(self.temperature.shape + (level.max(), level.max(),))/u.s
263
264 # Radiative decay out of current level
265 3 10938.0 3646.0 0.0 coeff_matrix[:, level-1, level-1] -= vectorize_where_sum(
266 2 18325.0 9162.5 0.0 self.transitions.upper_level, level, self.transitions.A.value) * self.transitions.A.unit
267 # Radiative decay into current level from upper levels
268 2 8834.0 4417.0 0.0 coeff_matrix[:, self.transitions.lower_level-1, self.transitions.upper_level-1] += (
269 1 5608.0 5608.0 0.0 self.transitions.A)
270
271 # Collisional--electrons
272 1 13480735.0 13480735.0 17.2 ex_rate_e = self.electron_collision_excitation_rate()
273 1 11718735.0 11718735.0 15.0 dex_rate_e = self.electron_collision_deexcitation_rate()
274 3 17306.0 5768.7 0.0 ex_diagonal_e = vectorize_where_sum(
275 2 15.0 7.5 0.0 lower_level, level, ex_rate_e.value.T, 0).T * ex_rate_e.unit
276 3 30929.0 10309.7 0.0 dex_diagonal_e = vectorize_where_sum(
277 2 14.0 7.0 0.0 upper_level, level, dex_rate_e.value.T, 0).T * dex_rate_e.unit
278 # Collisional--protons
279 1 1401.0 1401.0 0.0 if include_protons and self._psplups is not None:
280 1 14309.0 14309.0 0.0 lower_level_p = self._psplups['lower_level']
281 1 6536.0 6536.0 0.0 upper_level_p = self._psplups['upper_level']
282 1 52839740.0 52839740.0 67.5 pe_ratio = proton_electron_ratio(self.temperature, **self._dset_names)
283 1 156.0 156.0 0.0 proton_density = np.outer(pe_ratio, density)[:, :, np.newaxis]
284 1 19077.0 19077.0 0.0 ex_rate_p = self.proton_collision_excitation_rate()
285 1 31654.0 31654.0 0.0 dex_rate_p = self.proton_collision_deexcitation_rate()
286 3 5279.0 1759.7 0.0 ex_diagonal_p = vectorize_where_sum(
287 2 7.0 3.5 0.0 lower_level_p, level, ex_rate_p.value.T, 0).T * ex_rate_p.unit
288 3 5977.0 1992.3 0.0 dex_diagonal_p = vectorize_where_sum(
289 2 9.0 4.5 0.0 upper_level_p, level, dex_rate_p.value.T, 0).T * dex_rate_p.unit
290
291 # Populate density dependent terms and solve matrix equation for each density value
292 1 17.0 17.0 0.0 populations = np.zeros(self.temperature.shape + density.shape + (level.max(),))
293 2 40.0 20.0 0.0 for i, d in enumerate(density):
294 1 542.0 542.0 0.0 c_matrix = coeff_matrix.copy()
295 # Collisional excitation and de-excitation out of current state
296 1 375.0 375.0 0.0 c_matrix[:, level-1, level-1] -= d*(ex_diagonal_e + dex_diagonal_e)
297 # De-excitation from upper states
298 1 1169.0 1169.0 0.0 c_matrix[:, lower_level-1, upper_level-1] += d*dex_rate_e
299 # Excitation from lower states
300 1 1392.0 1392.0 0.0 c_matrix[:, upper_level-1, lower_level-1] += d*ex_rate_e
301 # Same processes as above, but for protons
302 1 966.0 966.0 0.0 if include_protons and self._psplups is not None:
303 1 25.0 25.0 0.0 d_p = proton_density[:, i, :]
304 1 418.0 418.0 0.0 c_matrix[:, level-1, level-1] -= d_p*(ex_diagonal_p + dex_diagonal_p)
305 1 250.0 250.0 0.0 c_matrix[:, lower_level_p-1, upper_level_p-1] += d_p * dex_rate_p
306 1 524.0 524.0 0.0 c_matrix[:, upper_level_p-1, lower_level_p-1] += d_p * ex_rate_p
307 # Invert matrix
308 1 59286.0 59286.0 0.1 val, vec = np.linalg.eig(c_matrix.value)
309 # Eigenvectors with eigenvalues closest to zero are the solutions to the homogeneous
310 # system of linear equations
311 # NOTE: Sometimes eigenvalues may have complex component due to numerical stability.
312 # We will take only the real component as our rate matrix is purely real
313 1 88.0 88.0 0.0 i_min = np.argmin(np.fabs(np.real(val)), axis=1)
314 1 561.0 561.0 0.0 pop = np.take(np.real(vec), i_min, axis=2)[range(vec.shape[0]), :, range(vec.shape[0])]
315 # NOTE: The eigenvectors can only be determined up to a sign so we must enforce
316 # positivity
317 1 18.0 18.0 0.0 np.fabs(pop, out=pop)
318 1 40.0 40.0 0.0 np.divide(pop, pop.sum(axis=1)[:, np.newaxis], out=pop)
319 1 7.0 7.0 0.0 populations[:, i, :] = pop
320
321 1 59.0 59.0 0.0 return u.Quantity(populations)
@dstansby which ion did you use when generating this profile?
Currently, the matrix equation is solved using np.linalg.eig. We should look into using np.linalg.eigh, as the matrix of terms is symmetric and Hermitian and should only have real eigenvalues. This should help with the performance concerns and simplify the code a bit.
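A minimal sketch of what that swap could look like, with c_matrix again standing in for the stacked rate matrix built in level_populations; it is probably worth verifying the symmetry assumption before committing to eigh, since eigh only reads one triangle of the matrix:

```python
import numpy as np

def solve_level_populations(c_matrix):
    # c_matrix has shape (n_temperature, n_levels, n_levels)
    if np.allclose(c_matrix, np.swapaxes(c_matrix, -2, -1)):
        # eigh guarantees real eigenvalues (returned in ascending order) and is
        # typically faster than eig for symmetric/Hermitian matrices
        val, vec = np.linalg.eigh(c_matrix)
    else:
        val, vec = np.linalg.eig(c_matrix)
    # As in the current implementation, the populations are the eigenvector
    # whose eigenvalue is closest to zero
    i_min = np.argmin(np.fabs(np.real(val)), axis=1)
    pop = np.take(np.real(vec), i_min, axis=2)[range(vec.shape[0]), :, range(vec.shape[0])]
    np.fabs(pop, out=pop)
    return pop / pop.sum(axis=1)[:, np.newaxis]
```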
Another route worth investigating is seeing how sparse the matrices are and, if they're quite sparse, using sparse matrix functions.
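A rough sketch of how the sparsity could be checked, and what a sparse solve might look like for a single temperature slice; here the homogeneous system is recast as an inhomogeneous one by replacing one row with the normalization condition that the populations sum to 1 (just an illustration of the idea, not necessarily how it should be implemented):

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg

def sparse_level_populations(c_matrix_2d):
    # c_matrix_2d is one (n_levels, n_levels) temperature slice of the rate matrix
    sparsity = 1.0 - np.count_nonzero(c_matrix_2d) / c_matrix_2d.size
    print(f'Fraction of zero entries: {sparsity:.1%}')
    # Replace the last row with the normalization condition sum(populations) = 1
    # so the system has a unique solution and a standard sparse solver applies
    A = c_matrix_2d.copy()
    A[-1, :] = 1.0
    b = np.zeros(A.shape[0])
    b[-1] = 1.0
    return scipy.sparse.linalg.spsolve(scipy.sparse.csr_matrix(A), b)
```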
The level populations calculation is extremely computationally expensive and scales poorly with the number of available transitions. This means that level populations for ions with a very large number of transitions and available energy levels (e.g. Fe IX, Fe XI) take a very long time to compute.
The level populations calculation is needed for essentially everything, so poor performance here impacts many other calculations. Maybe something like Numba or Cython could be used to speed up this calculation, or it could be parallelized using something like Dask.
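As a rough illustration of the Dask route: the loop over densities is embarrassingly parallel, so each density-dependent solve could be wrapped in dask.delayed. In the sketch below, build_rate_matrix is a hypothetical stand-in for the density-dependent matrix assembly currently done inside the loop in level_populations, and density is the same array of densities passed to that method:

```python
import dask
import numpy as np

def solve_populations(c_matrix):
    # Same eigenvector-based null-space solve as in level_populations,
    # for one (n_temperature, n_levels, n_levels) stack of rate matrices
    val, vec = np.linalg.eig(c_matrix)
    i_min = np.argmin(np.fabs(np.real(val)), axis=1)
    pop = np.take(np.real(vec), i_min, axis=2)[range(vec.shape[0]), :, range(vec.shape[0])]
    np.fabs(pop, out=pop)
    return pop / pop.sum(axis=1)[:, np.newaxis]

# build_rate_matrix(d) is hypothetical: it stands in for assembling c_matrix for
# a single density value d, as done inside the existing loop over densities
tasks = [dask.delayed(solve_populations)(dask.delayed(build_rate_matrix)(d))
         for d in density]
# Each density is solved as an independent task; stack back into the
# (n_temperature, n_density, n_levels) populations array
populations = np.stack(dask.compute(*tasks), axis=1)
```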