Inconsistencies between Tardis versions

tardis-sn / tardis

TARDIS - Temperature And Radiative Diffusion In Supernovae

https://tardis-sn.github.io/tardis


Inconsistencies between Tardis versions #455

Closed unoebauer closed 8 years ago

unoebauer commented 8 years ago

Main description:

TARDIS produces different output between 45eca24d77eacc91d0a0b875e5521bc2d3dc151a and master with this YML file. Why, and which commit introduced the change?

The following YML files provide a minimal "working" example: issue_455.zip

The diagnosis of the problem at hand is still a work in progress, but it seems that the most recent Tardis version, c9e1b1761a6745d44bae973d962e1be180295827, is inconsistent with a previous state of Tardis.

All this was triggered by re-examining some detailed runs performed during Talytha's Abundance Tomography Project. For that, we mainly used a Tardis version similar to aa7da173d0e7339bc962bafb62d34d60596fd951 (a working OpenMP implementation had just been incorporated, PR #338). Comparing the virtual spectra obtained with the recent Tardis version for the same setup reveals huge discrepancies:

tardis_cmp

The new Tardis version finds a much higher inner temperature. This goes hand in hand with a much higher radiation temperature in the ejecta and a higher ionisation state (and in turn electron density).

ionisation_cmp

Setup:

tardis_00173_2.yml

atom_data: kurucz_cd23_chianti_H_He.h5
model:
  abundances:
    filename: abundances_00173_2.dat
    filetype: simple_ascii
    type: file
  structure:
    filename: densities_00173_2.dat
    filetype: simple_ascii
    type: file
    v_inner_boundary: 12000.000 km/s
    v_outer_boundary: 35000.000 km/s
montecarlo:
  black_body_sampling:
    num: 1000000
    start: 1 angstrom
    stop: 1000000 angstrom
  convergence_criteria:
    damping_constant: 1.0
    fraction: 0.8
    hold: 3
    t_inner:
      damping_constant: 1.0
    threshold: 0.05
    type: specific
  iterations: 20
  last_no_of_packets: 500000
  no_of_packets: 50000
  no_of_virtual_packets: 3
  nthreads: 16
  seed: 23111963
plasma:
  disable_electron_scattering: false
  excitation: dilute-lte
  ionization: nebular
  line_interaction_type: downbranch
  radiative_rates_type: dilute-blackbody
spectrum:
  num: 10000
  start: 500 angstrom
  stop: 20000 angstrom
supernova:
  luminosity_requested: 8.675 log_lsun
  luminosity_wavelength_end: 6500.000 angstrom
  luminosity_wavelength_start: 3500.000 angstrom
  time_explosion: 6.800 day
tardis_config_version: v1.0

densities_00173_2.dat

6.800000 day
#Index     Velocities [km/s] Densities [g/cm^3]
  0          1.2000e+04          9.652102e-13
  1          1.2408e+04          3.659182e-13
  2          1.2844e+04          3.932164e-13
  3          1.3312e+04          3.900119e-13
  4          1.3816e+04          3.090009e-13
  5          1.4359e+04          3.108902e-13
  6          1.4947e+04          2.463674e-13
  7          1.5584e+04          5.468102e-14
  8          1.6279e+04          3.228778e-14
  9          1.7039e+04          2.211246e-14
 10          1.7872e+04          1.542880e-14
 11          1.8792e+04          1.264987e-14
 12          1.9811e+04          8.297541e-15
 13          2.0948e+04          4.517051e-15
 14          2.2222e+04          5.480575e-16
 15          2.3662e+04          8.695515e-18
 16          2.5301e+04          4.354394e-18
 17          2.7184e+04          2.150320e-18
 18          2.9371e+04          1.040788e-18
 19          3.1939e+04          4.580299e-19
 20          3.5000e+04          1.893468e-19

abundances_00173_2.dat

#Index Z=1 - Z=32
 0 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 1 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 2 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 3 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 4 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 5 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 6 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 7 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 2.7585e-02 6.9280e-06 8.8141e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.3220e-02 5.5630e-07 2.3687e-02 5.8250e-08 5.1738e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 2.7997e-07 3.1720e-09 1.3517e-05 1.0820e-07 5.2687e-04 1.2043e-02 2.3331e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 8 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 9 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
10 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
11 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
12 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
13 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
14 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
15 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
16 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
17 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
18 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
19 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
20 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00

unoebauer commented 8 years ago

Tardis version aa7da173d0e7339bc962bafb62d34d60596fd951 produces compatible results: tardis_cmp

unoebauer commented 8 years ago

Unfortunately, this discrepancy also affects the simpler LTE treatment:

tardis_lte_cmp

unoebauer commented 8 years ago

Lacking better ideas, I'll look through the commit history and will try to pin-point it to a specific change. Any other insights into the problem, @aoifeboyle, @wkerzendorf, @ssim, @orbitfold?

unoebauer commented 8 years ago

Status of the commit bisection (the terms "works", "fails" and "zeta-crash" are explained below the list):

aa7da173d0e7339bc962bafb62d34d60596fd951: just after PR #338: works fine
4d377c8941ebf9e2e35b21f8d9113cbad47db463: just after PR #357: works fine
45eca24d77eacc91d0a0b875e5521bc2d3dc151a: just after PR #360: works fine
ab9433709444c34ddb3c234c19925ef329c389e4: just after PR #362: crashes before finishing one iteration - see Note 1
d864eb368efe57dbd6f4514b7aaba60bce57e9ee: just after PR #365: fails, zeta-crash
617a7f0455cacc1f2403370b03e5c5d1e14a47ab: just after PR #369: fails, zeta-crash
c9e1b1761a6745d44bae973d962e1be180295827: current version: fails

still bisecting...

Note the following when reading through the above list:

works: no difference between virtual spectrum calculated with that version and the original calculation above the Monte Carlo noise level
fails: clear differences between the spectra which cannot be attributed to Monte Carlo noise
zeta-crash: calculation crashes before reaching the final spectral run because the radiative temperature is higher than the values in the table for the zeta values - see Note 2
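
The bisection over the commits listed above can be sketched as a binary search. This is only an illustration: `is_bad` is a hypothetical stand-in for checking out a commit and running the test setup, the statuses are copied from the list above, and treating both "crashes" and "fails" as bad is an assumption.

```python
# Commit statuses as reported in the list above, oldest first (hashes abbreviated).
STATUS = [
    ("aa7da17", "works"),    # just after PR #338
    ("4d377c8", "works"),    # just after PR #357
    ("45eca24", "works"),    # just after PR #360
    ("ab94337", "crashes"),  # just after PR #362
    ("d864eb3", "fails"),    # just after PR #365
    ("617a7f0", "fails"),    # just after PR #369
    ("c9e1b17", "fails"),    # current version
]


def is_bad(commit):
    """Hypothetical check; a real one would run Tardis at `commit`."""
    return dict(STATUS)[commit] != "works"


def first_bad(commits, is_bad):
    """Return the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


commits = [commit for commit, _ in STATUS]
print(first_bad(commits, is_bad))
```

With the statuses above, the search stops at the commit just after PR #362. The list only contains one commit per merged PR, so the change itself could sit in any commit between PR #360 and PR #362.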

Note 1:

---------------------------------------------------------------------------
PlasmaMissingModule                       Traceback (most recent call last)
<ipython-input-2-6282305a6c09> in <module>()
----> 1 mdl = tardis.run_tardis(config)

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/base.pyc in run_tardis(config, atom_data)
     38     tardis_config = config_reader.Configuration.from_config_dict(
     39         config_dict, atom_data=atom_data)
---> 40     radial1d_mdl = model.Radial1DModel(tardis_config)
     41 
     42     simulation.run_radial1d(radial1d_mdl)

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/model.pyc in __init__(self, tardis_config)
    134                                                          excitation_mode=tardis_config.plasma.excitation,
    135                                                          line_interaction_type=tardis_config.plasma.line_interaction_type,
--> 136                                                          link_t_rad_t_electron=0.9)
    137 
    138         self.spectrum = TARDISSpectrum(tardis_config.spectrum.frequency, tardis_config.supernova.distance)

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/standard_plasmas.pyc in __init__(self, number_densities, atomic_data, time_explosion, t_rad, delta_treatment, nlte_config, ionization_mode, excitation_mode, line_interaction_type, link_t_rad_t_electron)
    103             delta_input=delta_treatment, nlte_species=nlte_config.species,
    104             previous_electron_densities=initial_electron_densities,
--> 105             previous_beta_sobolevs=initial_beta_sobolevs)

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/base.pyc in __init__(self, plasma_properties, **kwargs)
     18         self.plasma_properties = self._init_properties(plasma_properties,
     19                                                        **kwargs)
---> 20         self._build_graph()
     21 #        self.write_to_tex('Plasma_Graph', 'Plasma_Formulae')
     22         self.update(**kwargs)

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/base.pyc in _build_graph(self)
     79                                               '{1} which has not been added'
     80                                               ' to this plasma'.format(
---> 81                         plasma_property.name, input))
     82                 try:
     83                     position = self.outputs_dict[input].outputs.index(input)

PlasmaMissingModule: Module PhiSahaNebular requires input t_electron which has not been added to this plasma

Note 2:

ValueError                                Traceback (most recent call last)
<ipython-input-2-6282305a6c09> in <module>()
----> 1 mdl = tardis.run_tardis(config)

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/base.pyc in run_tardis(config, atom_data)
     40     radial1d_mdl = model.Radial1DModel(tardis_config)
     41 
---> 42     simulation.run_radial1d(radial1d_mdl)
     43 
     44     return radial1d_mdl

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/simulation.pyc in run_radial1d(radial1d_model, history_fname)
     25         logger.info('Remaining run %d', radial1d_model.iterations_remaining)
     26         radial1d_model.simulate(update_radiation_field=update_radiation_field, enable_virtual=False, initialize_nlte=initialize_nlte,
---> 27                                 initialize_j_blues=initialize_j_blues)
     28         initialize_j_blues=False
     29         initialize_nlte=False

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/model.pyc in simulate(self, update_radiation_field, enable_virtual, initialize_j_blues, initialize_nlte)
    333 
    334         self.calculate_j_blues(init_detailed_j_blues=initialize_j_blues)
--> 335         self.update_plasmas(initialize_nlte=initialize_nlte)
    336 
    337 

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/model.pyc in update_plasmas(self, initialize_nlte)
    237 
    238         self.plasma_array.update_radiationfield(self.t_rads.value, self.ws, self.j_blues,
--> 239             self.tardis_config.plasma.nlte, initialize_nlte=initialize_nlte, n_e_convergence_threshold=0.05)
    240 
    241         if self.tardis_config.plasma.line_interaction_type in ('downbranch', 'macroatom'):

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/standard_plasmas.pyc in update_radiationfield(self, t_rad, ws, j_blues, nlte_config, t_electrons, n_e_convergence_threshold, initialize_nlte)
     50         if nlte_config.species:
     51             self.store_previous_properties()
---> 52         self.update(t_rad=t_rad, w=ws, j_blues=j_blues)
     53 
     54     def __init__(self, number_densities, atomic_data, time_explosion,

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/base.pyc in update(self, **kwargs)
    141 
    142         for module_name in self._resolve_update_list(kwargs.keys()):
--> 143             self.plasma_properties_dict[module_name].update()
    144 
    145     def _update_module_type_str(self):

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/properties/base.pyc in update(self)
     84         if len(self.outputs) == 1:
     85             setattr(self, self.outputs[0], self.calculate(
---> 86                 *self._get_input_values()))
     87         else:
     88             new_values = self.calculate(*self._get_input_values())

/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/properties/ion_population.pyc in calculate(self, general_phi, t_rad, w, zeta_data, t_electrons, delta)
     65                              '- requested {2}'.format(
     66                 zeta_data.columns.values.min(), zeta_data.columns.values.max(),
---> 67                 t_rad))
     68         phis = general_phi * delta * w * (zeta + w * (1 - zeta)) * \
     69                (t_electrons/t_rad) ** .5

ValueError: t_rads outside of zeta factor interpolation zeta_min=2000.00 zeta_max=40000.00 - requested [ 24829.47853128  29255.41589326  33609.08295046  37345.62521055
  40019.28004965  40953.81576507  38383.82271919  34434.31244872
  32665.86581665  30041.40250129  27808.87307025  25354.81563128
  23264.0689412   21349.97604201  20834.53491125  20280.93368913
  20109.54322745  19707.24624127  19069.69093371  18789.89685196]
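
To see which cells trigger the error in Note 2, the requested values can be checked against the limits quoted in the message. A minimal sketch, with the numbers copied from the traceback; the helper name `cells_outside_range` is made up for this example and is not part of Tardis:

```python
# Limits of the zeta table, as quoted in the error message.
ZETA_T_MIN = 2000.0
ZETA_T_MAX = 40000.0

# Radiation temperatures requested in the failing iteration (from the traceback).
t_rads = [
    24829.47853128, 29255.41589326, 33609.08295046, 37345.62521055,
    40019.28004965, 40953.81576507, 38383.82271919, 34434.31244872,
    32665.86581665, 30041.40250129, 27808.87307025, 25354.81563128,
    23264.0689412, 21349.97604201, 20834.53491125, 20280.93368913,
    20109.54322745, 19707.24624127, 19069.69093371, 18789.89685196,
]


def cells_outside_range(values, lower, upper):
    """Return (cell index, value) pairs that fall outside [lower, upper]."""
    return [(i, v) for i, v in enumerate(values) if not lower <= v <= upper]


offending = cells_outside_range(t_rads, ZETA_T_MIN, ZETA_T_MAX)
for index, value in offending:
    print(f"cell {index}: t_rad = {value:.2f}")
```

Only cells 4 and 5 exceed the upper limit of the table; all other cells stay inside the tabulated range.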

aoifeboyle commented 8 years ago

@unoebauer I have no idea. Pinpointing the commit that first created the problem is probably the best thing to do for now. I'm surprised Travis didn't catch this?

unoebauer commented 8 years ago

@aoifeboyle: no, unfortunately, Travis never caught this. However, the problem considered here is much harder than our final spectral test...

aoifeboyle commented 8 years ago

@unoebauer Ah I see. We should try to create some kind of faster test to determine if the code is going wrong without having to run 20 iterations with 50k packets, I think? So we need to find the simplest possible case where the divergence is observed.

unoebauer commented 8 years ago

@aoifeboyle: I think that I've pinpointed the range in the commit history during which the problem was introduced - see list above.

It happens somewhere between PR #360 and PR #365:

PR #361 only added some LaTeX formulae - harmless in my opinion
PR #364 is a very small change in the statistics/Dalek part of Tardis - not relevant for the problem at hand, I think

This leaves PRs #362 and #365 - looks like the problem was introduced there...

Any thoughts, @aoifeboyle?

I'll try to devise a less expensive test which triggers the same problem.

unoebauer commented 8 years ago

I've created a "low-resolution" version of the test problem:

tardis_test.yml

atom_data: kurucz_cd23_chianti_H_He.h5
model:
  abundances:
    filename: abundances_test.dat
    filetype: simple_ascii
    type: file
  structure:
    filename: densities_test.dat
    filetype: simple_ascii
    type: file
    v_inner_boundary: 12000.000 km/s
    v_outer_boundary: 35000.000 km/s
montecarlo:
  black_body_sampling:
    num: 1000000
    start: 1 angstrom
    stop: 1000000 angstrom
  convergence_criteria:
    damping_constant: 1.0
    fraction: 0.8
    hold: 3
    t_inner:
      damping_constant: 1.0
    threshold: 0.05
    type: specific
  iterations: 15
  last_no_of_packets: 20000
  no_of_packets: 10000
  no_of_virtual_packets: 3
  nthreads: 16
  seed: 23111963
plasma:
  disable_electron_scattering: false
  excitation: dilute-lte
  ionization: nebular
  line_interaction_type: downbranch
  radiative_rates_type: dilute-blackbody
spectrum:
  num: 10000
  start: 500 angstrom
  stop: 20000 angstrom
supernova:
  luminosity_requested: 8.675 log_lsun
  luminosity_wavelength_end: 6500.000 angstrom
  luminosity_wavelength_start: 3500.000 angstrom
  time_explosion: 6.800 day
tardis_config_version: v1.0

densities_test.dat

6.800000 day
#Index     Velocities [km/s] Densities [g/cm^3]
  0      1.2000e+04      9.652102e-13
  1      1.2844e+04      3.804829e-13
  2      1.3816e+04      3.465327e-13
  3      1.4947e+04      2.760799e-13
  4      1.6279e+04      4.252074e-14
  5      1.7872e+04      1.846216e-14
  6      1.9811e+04      1.025111e-14
  7      2.2222e+04      2.306574e-15
  8      2.5301e+04      6.245017e-18
  9      2.9371e+04      1.513200e-18
 10      3.5000e+04      3.003242e-19
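
The coarse grid keeps every second velocity of densities_00173_2.dat, and the coarse densities appear consistent with a volume-weighted (mass-conserving) average of neighbouring cells. That is an inference from the numbers, not something stated in this thread. A minimal sketch, assuming entry i holds the outer velocity and the density of cell i, and that entry 0 only marks the inner boundary:

```python
def merge_cell_pairs(velocities, densities):
    """Merge neighbouring cells pairwise while conserving mass.

    velocities[i] is taken as the outer edge of cell i and densities[i]
    as its density; entry 0 is copied unchanged (inner boundary).
    """
    new_velocities = [velocities[0]]
    new_densities = [densities[0]]
    for i in range(1, len(velocities) - 1, 2):
        # With r = v * t, a shell's volume is proportional to
        # v_outer**3 - v_inner**3; the common factor cancels in the average.
        vol_a = velocities[i] ** 3 - velocities[i - 1] ** 3
        vol_b = velocities[i + 1] ** 3 - velocities[i] ** 3
        rho = (densities[i] * vol_a + densities[i + 1] * vol_b) / (vol_a + vol_b)
        new_velocities.append(velocities[i + 1])
        new_densities.append(rho)
    return new_velocities, new_densities


# First five rows of densities_00173_2.dat (km/s, g/cm^3).
velocities = [1.2000e4, 1.2408e4, 1.2844e4, 1.3312e4, 1.3816e4]
densities = [9.652102e-13, 3.659182e-13, 3.932164e-13, 3.900119e-13, 3.090009e-13]

coarse_v, coarse_rho = merge_cell_pairs(velocities, densities)
for v, rho in zip(coarse_v, coarse_rho):
    print(f"{v:.4e}  {rho:.6e}")
```

For these rows the merged densities agree with entries 1 and 2 of densities_test.dat (3.804829e-13 and 3.465327e-13) to the printed precision.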

abundances_test.dat

#Index Z=1 - Z=32
  0 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  1 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  2 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  3 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 6.9280e-06 6.2377e-01 5.0460e-09 1.2570e-05 2.9230e-07 6.0000e-02 5.5630e-07 1.8000e-01 5.8250e-08 3.0000e-02 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-06 3.1720e-09 1.6600e-04 1.0820e-07 5.0000e-03 5.2010e-02 4.6180e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  4 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 2.8896e-02 6.9280e-06 8.9366e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.1471e-02 5.5630e-07 1.6254e-02 5.8250e-08 3.9934e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 1.4488e-07 3.1720e-09 6.2670e-06 1.0820e-07 3.1418e-04 1.0143e-02 2.2244e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  5 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  6 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  7 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  8 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
  9 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00
 10 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 3.0000e-02 6.9280e-06 9.0397e-01 5.0460e-09 1.2570e-05 2.9230e-07 2.0000e-02 5.5630e-07 1.0000e-02 5.8250e-08 3.0000e-03 8.2020e-08 7.3410e-07 3.0650e-08 3.0000e-03 4.6460e-10 3.1210e-08 3.1720e-09 1.6600e-07 1.0820e-07 1.3520e-04 8.5440e-03 2.1330e-02 7.2000e-09 1.7370e-08 0.0000e+00 0.0000e+00

It triggers the same problem as described above, but completes 10 times faster than the original one.

tardis_cmp

@aoifeboyle does this help?

aoifeboyle commented 8 years ago

Yes it should do! Was at a conference today but I have time to investigate now. I'll keep you updated.

ssim commented 8 years ago

This all sounds rather worrying! Do we know if this is specific to using read-in files for density/composition, or does a problem manifest even for simple input models?

unoebauer commented 8 years ago

@ssim: currently trying to isolate the problem. The tardis_example problem doesn't seem to be affected. tardis_cmp

ssim commented 8 years ago

Ok - perhaps we can take a look in Wurzburg?

In the meantime, good luck Sherlock!

unoebauer commented 8 years ago

Issue is not related to OMP: I get the same results when running the tardis_test setup with 16 threads or serially, without activating OMP.

unoebauer commented 8 years ago

Maybe some progress towards narrowing down the exact cause of the problem:

Three more test setups, derived from tardis_test.yml (detailed setup files will be provided in next comment):

tardis_test_unifab:
- replace the stratified abundance structure by a uniform composition (reflecting the overall composition of the original test problem)
- Strong differences persist

tardis_cmp

tardis_test_unifab_branch:
- same as tardis_test_unifab but now also with branch85_w7 density profile (i.e. no structure files are used)
- Differences are weaker

tardis_cmp

tardis_test_unifab_branch_late:
- same as tardis_test_unifab_branch but with a longer time since explosion
- Differences are very weak - results almost identical

tardis_cmp

aoifeboyle commented 8 years ago

@unoebauer That's interesting. I was working through the yaml file and changing parameters to see what eliminated the problem, but looks like you beat me to it. Can we safely say the issue is related to the structure file then? Or do you think there's something else involved?

unoebauer commented 8 years ago

@aoifeboyle: not completely sure that it's the structure file alone. I have the suspicion that it is related to high densities...but I need to confirm that. I'll update you as soon as I have the results.

unoebauer commented 8 years ago

Here are the yaml files for the different runs:

tardis_test_unifab.yml

atom_data: kurucz_cd23_chianti_H_He.h5
model:
  abundances:
    type: uniform
    C: 0.01899
    N: 0.00001
    O: 0.80114
    Ne: 0.00001
    Mg: 0.03468
    Si: 0.07239
    S: 0.01291
    Ca: 0.00300
    Cr: 0.00006
    Fe: 0.00192
    Co: 0.02450
    Ni: 0.03045
  structure:
    filename: densities_test.dat
    filetype: simple_ascii
    type: file
    v_inner_boundary: 12000.000 km/s
    v_outer_boundary: 35000.000 km/s
montecarlo:
  black_body_sampling:
    num: 1000000
    start: 1 angstrom
    stop: 1000000 angstrom
  convergence_criteria:
    damping_constant: 1.0
    fraction: 0.8
    hold: 3
    t_inner:
      damping_constant: 1.0
    threshold: 0.05
    type: specific
  iterations: 15
  last_no_of_packets: 20000
  no_of_packets: 10000
  no_of_virtual_packets: 3
  nthreads: 16
  seed: 23111963
plasma:
  disable_electron_scattering: false
  excitation: dilute-lte
  ionization: nebular
  line_interaction_type: downbranch
  radiative_rates_type: dilute-blackbody
spectrum:
  num: 10000
  start: 500 angstrom
  stop: 20000 angstrom
supernova:
  luminosity_requested: 8.675 log_lsun
  luminosity_wavelength_end: 6500.000 angstrom
  luminosity_wavelength_start: 3500.000 angstrom
  time_explosion: 6.800 day
tardis_config_version: v1.0
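
A quick sanity check on the uniform composition above is to sum the mass fractions. A small sketch with the values copied from tardis_test_unifab.yml; whether and how Tardis renormalises the abundances is not covered here:

```python
# Mass fractions from the `abundances` block of tardis_test_unifab.yml.
abundances = {
    "C": 0.01899, "N": 0.00001, "O": 0.80114, "Ne": 0.00001,
    "Mg": 0.03468, "Si": 0.07239, "S": 0.01291, "Ca": 0.00300,
    "Cr": 0.00006, "Fe": 0.00192, "Co": 0.02450, "Ni": 0.03045,
}

total = sum(abundances.values())
print(f"sum of mass fractions = {total:.5f}")
```

The fractions add up to 1.00006, so the rounded uniform composition is normalised to within about 6e-5.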

tardis_test_unifab_branch.yml

atom_data: kurucz_cd23_chianti_H_He.h5
model:
  abundances:
    type: uniform
    C: 0.01899
    N: 0.00001
    O: 0.80114
    Ne: 0.00001
    Mg: 0.03468
    Si: 0.07239
    S: 0.01291
    Ca: 0.00300
    Cr: 0.00006
    Fe: 0.00192
    Co: 0.02450
    Ni: 0.03045
  structure:
    type: specific
    velocity:
      start: 12000.000 km/s
      stop: 35000.000 km/s
      num: 10
    density:
      type: branch85_w7
montecarlo:
  black_body_sampling:
    num: 1000000
    start: 1 angstrom
    stop: 1000000 angstrom
  convergence_criteria:
    damping_constant: 1.0
    fraction: 0.8
    hold: 3
    t_inner:
      damping_constant: 1.0
    threshold: 0.05
    type: specific
  iterations: 15
  last_no_of_packets: 20000
  no_of_packets: 10000
  no_of_virtual_packets: 3
  nthreads: 16
  seed: 23111963
plasma:
  disable_electron_scattering: false
  excitation: dilute-lte
  ionization: nebular
  line_interaction_type: downbranch
  radiative_rates_type: dilute-blackbody
spectrum:
  num: 10000
  start: 500 angstrom
  stop: 20000 angstrom
supernova:
  luminosity_requested: 8.675 log_lsun
  luminosity_wavelength_end: 6500.000 angstrom
  luminosity_wavelength_start: 3500.000 angstrom
  time_explosion: 6.800 day
tardis_config_version: v1.0

tardis_test_unifab_branch_late.yml

atom_data: kurucz_cd23_chianti_H_He.h5
model:
  abundances:
    type: uniform
    C: 0.01899
    N: 0.00001
    O: 0.80114
    Ne: 0.00001
    Mg: 0.03468
    Si: 0.07239
    S: 0.01291
    Ca: 0.00300
    Cr: 0.00006
    Fe: 0.00192
    Co: 0.02450
    Ni: 0.03045
  structure:
    type: specific
    velocity:
      start: 12000.000 km/s
      stop: 35000.000 km/s
      num: 10
    density:
      type: branch85_w7
montecarlo:
  black_body_sampling:
    num: 1000000
    start: 1 angstrom
    stop: 1000000 angstrom
  convergence_criteria:
    damping_constant: 1.0
    fraction: 0.8
    hold: 3
    t_inner:
      damping_constant: 1.0
    threshold: 0.05
    type: specific
  iterations: 15
  last_no_of_packets: 20000
  no_of_packets: 10000
  no_of_virtual_packets: 3
  nthreads: 16
  seed: 23111963
plasma:
  disable_electron_scattering: false
  excitation: dilute-lte
  ionization: nebular
  line_interaction_type: downbranch
  radiative_rates_type: dilute-blackbody
spectrum:
  num: 10000
  start: 500 angstrom
  stop: 20000 angstrom
supernova:
  luminosity_requested: 8.675 log_lsun
  luminosity_wavelength_end: 6500.000 angstrom
  luminosity_wavelength_start: 3500.000 angstrom
  time_explosion: 13.0 day
tardis_config_version: v1.0

unoebauer commented 8 years ago

@aoifeboyle, @ssim: From the comparison calculations I've performed, it looks like very high densities are triggering the problem. The branch85_w7 has lower densities in the innermost cells compared to what has been set up in densities_test.dat. When choosing a longer time since explosion (homologous expansion leads to lower densities), the differences are very weak, comparable to the MC noise level.

This could also explain why Travis didn't catch this problem and why it doesn't show up in the tardis_example setup: we typically use times since explosion of around 15 d. The resulting densities are comparable to those in the last test run.
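To put a number on the homologous-expansion argument: for a fixed ejecta mass and velocity grid, the density in each cell scales as t^-3. A minimal sketch, assuming ideal homologous expansion and using the two time_explosion values from the test configs above (6.8 d and 13 d); the function name is only illustrative:

```python
def homologous_density_factor(t_old_days, t_new_days):
    """Factor by which the density changes between two epochs (rho ~ t**-3)."""
    return (t_old_days / t_new_days) ** 3


# Ratio of the densities in the "late" setup to those in the early setup.
factor = homologous_density_factor(6.8, 13.0)
print("density at 13 d / density at 6.8 d = {0:.3f}".format(factor))
```

Under this assumption the late setup probes densities roughly seven times lower than the early one.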

aoifeboyle commented 8 years ago

@unoebauer I find that the differences are small when using the original yaml file, but with scatter instead of downbranch. This is a comparison for commit 45eca24 (blue) vs. the current version (green). scatter_comparison

unoebauer commented 8 years ago

@aoifeboyle: Very interesting! - maybe that's an indication that something is wrong with the downbranch scheme....How does it look if you use macroatom instead?

aoifeboyle commented 8 years ago

The issue also occurs when macroatom is used. That narrows the issue down a lot. So something to do with high densities and downbranch/macroatom.

unoebauer commented 8 years ago

Very interesting, @aoifeboyle. It may not be so simple though...I've tried to trigger the issue with the tardis_example setup as well. I've changed a few parameters to mimic the tardis_test_unifab_branch setup, but so far the problem didn't show up (in all of the runs macroatom has been used):

only changing the time since explosion to 6 days: no difference
time since explosion 6 days and lower luminosity, 8.6 log Lsun: no difference
time since explosion 6 days, 8.6 log Lsun, wavelength cut for requested luminosity 3500 - 6000 A: no difference

Update

No difference always refers to "above the noise level" when comparing the current version with commit 45eca24. The only test in which the differences might be slightly larger than the noise level is the second one...

aoifeboyle commented 8 years ago

I've narrowed the issue down to be something related to the j_blues values. Hoping to find the exact issue soon.

unoebauer commented 8 years ago

@aoifeboyle: Thanks - hopefully that's it. We will discuss this issue today during the workshop in Wuerzburg.

aoifeboyle commented 8 years ago

@unoebauer I've been trying to work this out all day today and I'm not getting very far. I think it might be something to do with the montecarlo stuff rather than the plasma. I fixed the zeta problem in d864eb3, since it's the first one that fails, and I've been comparing it with 45eca24, the last one that works. The two plasmas seem identical after they are first initialised and first updated, and then after the first montecarlo runs the returned packet nus average out to a different value. For some reason the montecarlo runner is included in 45eca24 but not in d864eb3. I'll try to make some more progress tomorrow. @wkerzendorf do you have any ideas?

unoebauer commented 8 years ago

@aoifeboyle Thanks for the detailed analysis. This seems to be consistent with some of my findings: as a test I looked at two different setups (always comparing current version with 45eca24):

fix initial black body temperature and radiation temperature to 10000 K and perform only one ionisation iteration and the final spectral run. Only having one plasma update seems to be enough to introduce significant differences in the final spectra
same as above, now with no ionisation iterations but only the spectral calculation: the spectra are almost identical - only in two line features the differences may be above the noise level

I think that these two tests point into the same direction, namely that something goes wrong when updating the plasma. In my opinion there are two possibilities:

something got screwed up in the C part of the Monte Carlo routine and the nu averages and/or related determinations go wrong
something was changed in the convergence scheme, i.e. how to determine new values for T_R, T_BB and W from their old values and the radiation field state reconstructed from the Monte Carlo step
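To make the second possibility concrete, here is a minimal sketch of a damped update step, assuming the common form new = old + damping_constant * (estimate - old). This is an assumption about the general shape of such a scheme, not a copy of either Tardis version, and the function name is hypothetical:

```python
def damped_update(old_value, estimated_value, damping_constant):
    """Move from the old value towards the Monte Carlo estimate.

    damping_constant = 1.0 adopts the estimate fully,
    damping_constant = 0.5 goes half-way.
    """
    return old_value + damping_constant * (estimated_value - old_value)


# Made-up temperatures, only to show the effect of the damping constant.
t_full = damped_update(10000.0, 12000.0, 1.0)
t_half = damped_update(10000.0, 12000.0, 0.5)
print(t_full, t_half)
```

With damping_constant: 1.0, as in the configs above, any bias in the Monte Carlo estimators is passed on undamped to the next iteration, which would fit a runaway over many iterations.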

@wkerzendorf performed a git-bisect bug search and identified a commit in which only the C part was changed. I was sceptical at the beginning that we performed the correct search, but thinking about it in light of @aoifeboyle's findings and these tests, git-bisect may have been right. @wkerzendorf - can you elaborate on the git-bisect results?

After X-mas of course ;)

unoebauer commented 8 years ago

FYI @aoifeboyle, @wkerzendorf, @ssim: I've had a closer look:

UPDATE

See my follow-up comment: the discrepancies in the tau_sobolevs are most likely not real. The virtual spectrum comparison plot and the printout of the nu_bar estimators are still relevant, though.

ORIGINAL POST

I reinvestigated the problem with the fixed plasma and boundary state, running only the spectral calculation (the yaml file is posted below). Comparing the results obtained with 45eca24 and the current branch revealed the following differences in the plasma state:

maximum relative difference in the electron densities: 0
maximum relative difference in the ion populations: ~1e-15
maximum relative difference in the level populations: ~1e-15
maximum relative difference in the tau sobolevs: ~1e+65

There are a number of line transitions (more accurately 13519), for which the relative difference in the tau Sobolevs is above 1e-10. Some of these lines are quite strong, as the following plot of the relative vs. the absolute difference in tau Sobolev shows:

00173_2_fix_tau_sob_diff_cmp

These differences could also explain the (small but visible) discrepancies in the virtual spectrum:

00173_2_fix_spectrum_cmp

It could be possible that these differences in the tau Sobolevs are the reason for the different nu_bar estimates after the Monte Carlo calculation. Having looked into the source code, I didn't spot any differences in the algorithm with which these estimates are calculated between the two Tardis versions. Moreover, the relative differences between the nubar_estimators are on the few per cent level:

45eca24

In [1]: config = yaml.safe_load(open("tardis_00173_2_fix.yml"))                                                                                                

In [2]: mdl_old = tardis.run_tardis(config)
tardis.io.config_reader - INFO - Reading Atomic Data from kurucz_cd23_chianti_H_He.h5
tardis.atomic - INFO - Read Atom Data with UUID=5ca3035ca8b311e3bb684437e69d75d7 and MD5=21095dd25faa1683f4c90c911a00c3f8
tardis.io.model_reader - WARNING - v_inner_boundary requested too small for readin file. Boundary shifted to match file.
tardis.io.model_reader - WARNING - v_outer_boundary requested too large for readin file. Boundary shifted to match file.
tardis.io.config_reader - WARNING - Abundances have not been normalized to 1. - normalizing
tardis.io.config_reader - WARNING - No "species" given - ignoring other NLTE options given:
{   'classical_nebular': False, 'coronal_approximation': False}
tardis.io.config_reader - WARNING - No convergence criteria selected - just damping by 0.5 for w, t_rad and t_inner
tardis.simulation - INFO - Doing last run
tardis.model - INFO - Calculating J_blues for radiative_rates_type=dilute-blackbody
tardis.packet_source - INFO - Calculating 500000 packets for t_inner=10000.00
Running with OpenMP - 16 threadstardis.simulation - INFO - Finished in 1 iterations and took 498.82 s

In [3]: mdl_old.runner.nu_bar_estimator
Out[3]: 
array([  5.86773400e+28,   5.02563144e+28,   4.34765138e+28,
         3.70558614e+28,   2.96558255e+28,   2.02856011e+28,
         1.37465171e+28,   1.25736437e+28,   1.19699029e+28,
         1.14731265e+28,   1.10559996e+28,   1.04484887e+28,
         9.92324586e+27,   9.82713255e+27,   1.04365404e+28,
         1.14142807e+28,   1.26513448e+28,   1.42094539e+28,
         1.61340125e+28,   1.85833208e+28])

current

In [1]: config = yaml.safe_load(open("tardis_00173_2_fix.yml"))

In [2]: mdl_new = tardis.run_tardis(config)
tardis.io.config_reader - INFO - Reading Atomic Data from kurucz_cd23_chianti_H_He.h5
tardis.atomic - INFO - Read Atom Data with UUID=5ca3035ca8b311e3bb684437e69d75d7 and MD5=21095dd25faa1683f4c90c911a00c3f8
tardis.io.model_reader - WARNING - v_inner_boundary requested too small for readin file. Boundary shifted to match file.
tardis.io.model_reader - WARNING - v_outer_boundary requested too large for readin file. Boundary shifted to match file.
tardis.io.config_reader - WARNING - Abundances have not been normalized to 1. - normalizing
tardis.io.config_reader - WARNING - No "species" given - ignoring other NLTE options given:
{   'classical_nebular': False, 'coronal_approximation': False}
tardis.io.config_reader - WARNING - No convergence criteria selected - just damping by 0.5 for w, t_rad and t_inner
tardis.plasma.properties.atomic - WARNING - Zeta_data missing - replaced with 1s. Missing ions: [(9, 10), (11, 12), (12, 13), (13, 14), (14, 15), (15, 16), (16, 17), (17, 18), (18, 19), (19, 20), (20, 21), (21, 22), (22, 23), (23, 24), (24, 25), (25, 26), (26, 27), (27, 28), (28, 29), (29, 1), (29, 2), (29, 3), (29, 4), (29, 5), (29, 6), (29, 7), (29, 8), (29, 9), (29, 10), (29, 11), (29, 12), (29, 13), (29, 14), (29, 15), (29, 16), (29, 17), (29, 18), (29, 19), (29, 20), (29, 21), (29, 22), (29, 23), (29, 24), (29, 25), (29, 26), (29, 27), (29, 28), (29, 29), (29, 30), (30, 1), (30, 2), (30, 3), (30, 4), (30, 5), (30, 6), (30, 7), (30, 8), (30, 9), (30, 10), (30, 11), (30, 12), (30, 13), (30, 14), (30, 15), (30, 16), (30, 17), (30, 18), (30, 19), (30, 20), (30, 21), (30, 22), (30, 23), (30, 24), (30, 25), (30, 26), (30, 27), (30, 28), (30, 29), (30, 30), (30, 31)]
tardis.model - INFO - Calculating J_blues for radiative_rates_type=dilute-blackbody
tardis.simulation.base - INFO - Doing last run
Running with OpenMP - 1 threads
tardis.simulation.base - INFO - Finished in 0 iterations and took 149.83 s

In [3]: mdl_new.runner.nu_bar_estimator
Out[3]: 
array([  6.01178210e+28,   5.13461157e+28,   4.43216973e+28,
         3.77940227e+28,   3.04304078e+28,   2.10994277e+28,
         1.43991981e+28,   1.31111096e+28,   1.22309477e+28,
         1.16231763e+28,   1.11305261e+28,   1.05664707e+28,
         1.00899515e+28,   1.00840642e+28,   1.07745764e+28,
         1.17769597e+28,   1.30562338e+28,   1.46607607e+28,
         1.66597070e+28,   1.91696842e+28])
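The two printouts are easier to compare as relative differences per shell. A short sketch using exactly the values printed above (plain Python lists, so it runs without Tardis):

```python
# nu_bar_estimator values copied from the 45eca24 run above.
nu_bar_old = [
    5.86773400e+28, 5.02563144e+28, 4.34765138e+28, 3.70558614e+28,
    2.96558255e+28, 2.02856011e+28, 1.37465171e+28, 1.25736437e+28,
    1.19699029e+28, 1.14731265e+28, 1.10559996e+28, 1.04484887e+28,
    9.92324586e+27, 9.82713255e+27, 1.04365404e+28, 1.14142807e+28,
    1.26513448e+28, 1.42094539e+28, 1.61340125e+28, 1.85833208e+28,
]
# nu_bar_estimator values copied from the current-version run above.
nu_bar_new = [
    6.01178210e+28, 5.13461157e+28, 4.43216973e+28, 3.77940227e+28,
    3.04304078e+28, 2.10994277e+28, 1.43991981e+28, 1.31111096e+28,
    1.22309477e+28, 1.16231763e+28, 1.11305261e+28, 1.05664707e+28,
    1.00899515e+28, 1.00840642e+28, 1.07745764e+28, 1.17769597e+28,
    1.30562338e+28, 1.46607607e+28, 1.66597070e+28, 1.91696842e+28,
]

# Relative difference (new - old) / old for every shell.
rel_diff = [(new - old) / old for old, new in zip(nu_bar_old, nu_bar_new)]
for shell, diff in enumerate(rel_diff):
    print("shell {0:2d}: {1:+.2%}".format(shell, diff))
```

The current version is higher in all 20 shells, by about 0.7 to 4.7 per cent, so the offset is systematic rather than scatter around zero.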

What we see in the original setup (posted at the beginning of this issue) is a runaway process during many (20) ionization iterations.

Config file: tardis_00173_2_fix.yml

atom_data: kurucz_cd23_chianti_H_He.h5
model:
  abundances:
    filename: abundances_00173_2.dat
    filetype: simple_ascii
    type: file
  structure:
    filename: densities_00173_2.dat
    filetype: simple_ascii
    type: file
    v_inner_boundary: 12000.000 km/s
    v_outer_boundary: 35000.000 km/s
montecarlo:
  black_body_sampling:
    num: 1000000
    start: 1 angstrom
    stop: 1000000 angstrom
  convergence_criteria:
    damping_constant: 1.0
    fraction: 0.8
    hold: 3
    t_inner:
      damping_constant: 1.0
    threshold: 0.05
    type: specific
  iterations: 1
  last_no_of_packets: 500000
  no_of_packets: 50000
  no_of_virtual_packets: 3
  nthreads: 16
  seed: 23111963
plasma:
  initial_t_rad: 10000 K
  initial_t_inner: 10000 K
  disable_electron_scattering: false
  excitation: dilute-lte
  ionization: nebular
  line_interaction_type: downbranch
  radiative_rates_type: dilute-blackbody
spectrum:
  num: 10000
  start: 500 angstrom
  stop: 20000 angstrom
supernova:
  luminosity_requested: 8.675 log_lsun
  luminosity_wavelength_end: 6500.000 angstrom
  luminosity_wavelength_start: 3500.000 angstrom
  time_explosion: 6.800 day
tardis_config_version: v1.0

ssim commented 8 years ago

hi @unoebauer

...well that's certainly very interesting. I wonder how it can be that the level populations are consistent but the line optical depths are not. Is it possible to identify one example line where the optical depth changed a lot and figure out which of the sub-quantities in the tau formula were responsible for the change?

Given that the lines appear to change in different ways from each other (i.e. that they don't all change by a factor) then it seems unlikely that it's anything to do with the velocity gradient part of the expression. So contenders would seem to be (i) the level populations (but your test here suggests not?) or else (ii) the source atomic data reading (i.e. the A-values)...but that seems unlikely. Can you check what's happening on a zone by zone basis? I.e. is there a pattern if you look at a particular line in multiple zones (if the atomic data was off by some factor we might expect that the offset would be the same for a given line in every zone). If it's populations...hmmm. Can there be an issue with variable types or something?
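For reference, the sub-quantities mentioned here enter the Sobolev optical depth for homologous expansion as tau = (pi e^2 / (m_e c)) f_lu lambda t_exp n_l (1 - g_l n_u / (g_u n_l)). The velocity-gradient part reduces to t_exp, which is the same factor for every line. A minimal sketch in CGS units; the names are illustrative and not taken from the Tardis source:

```python
import math

E_ESU = 4.80320425e-10   # electron charge [esu]
M_E = 9.1093837e-28      # electron mass [g]
C = 2.99792458e10        # speed of light [cm/s]

# pi e^2 / (m_e c), roughly 0.02654 cm^2 / s
SOBOLEV_COEFFICIENT = math.pi * E_ESU ** 2 / (M_E * C)


def tau_sobolev(f_lu, wavelength_cm, t_exp_s, n_lower, n_upper, g_lower, g_upper):
    """Sobolev optical depth of one line in one shell (homologous flow)."""
    # Correction for stimulated emission from the upper level.
    stimulated_emission = 1.0 - (g_lower * n_upper) / (g_upper * n_lower)
    return (SOBOLEV_COEFFICIENT * f_lu * wavelength_cm * t_exp_s
            * n_lower * stimulated_emission)


# Made-up line and level populations, only to show the call.
tau = tau_sobolev(f_lu=0.5, wavelength_cm=5.0e-5, t_exp_s=6.8 * 86400.0,
                  n_lower=1.0e3, n_upper=0.0, g_lower=2, g_upper=4)
print(tau)
```

Evaluating the factors separately for one suspicious line in both versions would show directly whether the oscillator strength, the wavelength or the populations differ.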

unoebauer commented 8 years ago

@ssim, @aoifeboyle, @wkerzendorf:

Sorry for the confusion, but I have been fooled by the pandas DataFrames.

In the old Tardis version, the tau_sobolevs were not necessarily ordered according to the line_ids. Indexing patterns as shown below are frequently encountered in the tau_sobolev DataFrames in the old Tardis version:

In [1]: config = yaml.safe_load(open("tardis_00173_2_fix.yml"))

In [2]: mdl_old = tardis.run_tardis(config)
tardis.io.config_reader - INFO - Reading Atomic Data from kurucz_cd23_chianti_H_He.h5
tardis.atomic - INFO - Read Atom Data with UUID=5ca3035ca8b311e3bb684437e69d75d7 and MD5=21095dd25faa1683f4c90c911a00c3f8
tardis.io.model_reader - WARNING - v_inner_boundary requested too small for readin file. Boundary shifted to match file.
tardis.io.model_reader - WARNING - v_outer_boundary requested too large for readin file. Boundary shifted to match file.
tardis.io.config_reader - WARNING - Abundances have not been normalized to 1. - normalizing
tardis.io.config_reader - WARNING - No "species" given - ignoring other NLTE options given:
{   'classical_nebular': False, 'coronal_approximation': False}
tardis.io.config_reader - WARNING - No convergence criteria selected - just damping by 0.5 for w, t_rad and t_inner
tardis.simulation - INFO - Doing last run
tardis.model - INFO - Calculating J_blues for radiative_rates_type=dilute-blackbody
tardis.packet_source - INFO - Calculating 500000 packets for t_inner=10000.00
Running with OpenMP - 16 threadstardis.simulation - INFO - Finished in 1 iterations and took 500.82 s

In [3]: mdl_old.plasma_array.tau_sobolevs.index[110:115]
Out[3]: Int64Index([163, 165, 167, 166, 168], dtype='int64')

In the current Tardis version, the tau_sobolevs are strictly ordered according to the line_ids:

In [1]: config = yaml.safe_load(open("tardis_00173_2_fix.yml"))

In [2]: mdl_new = tardis.run_tardis(config)
tardis.io.config_reader - INFO - Reading Atomic Data from kurucz_cd23_chianti_H_He.h5
tardis.atomic - INFO - Read Atom Data with UUID=5ca3035ca8b311e3bb684437e69d75d7 and MD5=21095dd25faa1683f4c90c911a00c3f8
tardis.io.model_reader - WARNING - v_inner_boundary requested too small for readin file. Boundary shifted to match file.
tardis.io.model_reader - WARNING - v_outer_boundary requested too large for readin file. Boundary shifted to match file.
tardis.io.config_reader - WARNING - Abundances have not been normalized to 1. - normalizing
tardis.io.config_reader - WARNING - No "species" given - ignoring other NLTE options given:
{   'classical_nebular': False, 'coronal_approximation': False}
tardis.io.config_reader - WARNING - No convergence criteria selected - just damping by 0.5 for w, t_rad and t_inner
tardis.plasma.properties.atomic - WARNING - Zeta_data missing - replaced with 1s. Missing ions: [(9, 10), (11, 12), (12, 13), (13, 14), (14, 15), (15, 16), (16, 17), (17, 18), (18, 19), (19, 20), (20, 21), (21, 22), (22, 23), (23, 24), (24, 25), (25, 26), (26, 27), (27, 28), (28, 29), (29, 1), (29, 2), (29, 3), (29, 4), (29, 5), (29, 6), (29, 7), (29, 8), (29, 9), (29, 10), (29, 11), (29, 12), (29, 13), (29, 14), (29, 15), (29, 16), (29, 17), (29, 18), (29, 19), (29, 20), (29, 21), (29, 22), (29, 23), (29, 24), (29, 25), (29, 26), (29, 27), (29, 28), (29, 29), (29, 30), (30, 1), (30, 2), (30, 3), (30, 4), (30, 5), (30, 6), (30, 7), (30, 8), (30, 9), (30, 10), (30, 11), (30, 12), (30, 13), (30, 14), (30, 15), (30, 16), (30, 17), (30, 18), (30, 19), (30, 20), (30, 21), (30, 22), (30, 23), (30, 24), (30, 25), (30, 26), (30, 27), (30, 28), (30, 29), (30, 30), (30, 31)]
tardis.model - INFO - Calculating J_blues for radiative_rates_type=dilute-blackbody
tardis.simulation.base - INFO - Doing last run
Running with OpenMP - 1 threads
tardis.simulation.base - INFO - Finished in 0 iterations and took 152.69 s

In [15]: mdl_new.plasma_array.outputs_dict['tau_sobolevs'].tau_sobolevs.index[110:115]
Out[15]: Int64Index([163, 165, 166, 167, 168], dtype='int64')

When comparing the plasma states of the two Tardis versions, I had not taken this difference in ordering into account. After doing so, the discrepancies go away:

In [1]: data_old = pd.HDFStore("model_00173_2_fix_45eca24d77eacc91d0a0b875e5521bc2d3dc151a.h5")
In [2]: data_new = pd.HDFStore("model_00173_2_fix_c9e1b1761a6745d44bae973d962e1be180295827.h5")
In [147]: tau_old = data_old["plasma_array/tau_sobolevs"].ix[data_new["plasma_array/tau_sobolevs"].index]
In [148]: tau_new = data_new["plasma_array/tau_sobolevs"]
In [150]: zero_mask = tau_old.values == 0
In [151]: tau_reldiff = ((tau_old - tau_new)  / tau_old).values
In [153]: tau_reldiff[zero_mask] = np.ones(tau_reldiff[zero_mask].shape[0]) * 1e-50
In [156]: np.fabs(tau_reldiff).max()
Out[156]: 4.2683148926623051e-14
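The pitfall can be reproduced with a toy example. The sketch below uses the index pattern shown above with made-up tau values, and `reindex` in place of the `.ix` call from the session (`.ix` has since been removed from pandas):

```python
import numpy as np
import pandas as pd

# Same values per line_id, but the "old" frame is not sorted by line_id.
old = pd.DataFrame({"tau": [1.0, 2.0, 4.0, 3.0, 5.0]},
                   index=[163, 165, 167, 166, 168])
new = pd.DataFrame({"tau": [1.0, 2.0, 3.0, 4.0, 5.0]},
                   index=[163, 165, 166, 167, 168])

# Comparing the raw .values goes row by row and ignores the index.
naive = np.abs((old.values - new.values) / old.values).max()

# Reordering the old frame by the new index first compares like with like.
old_aligned = old.reindex(new.index)
aligned = np.abs((old_aligned.values - new.values) / old_aligned.values).max()

print("positional comparison:", naive)
print("index-aligned comparison:", aligned)
```

The positional comparison reports a large spurious difference, while the aligned one reports none.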

Bottom line:

there seem to be no discrepancies in the tau_sobolevs, contrary to the statements in my previous comment
this means that the tau_sobolevs are (probably) not causing the overall problem

...the search continues...

unoebauer commented 8 years ago

The differences in the nu_bar estimators are, in my opinion, still the most promising lead. We should investigate this further...

aoifeboyle commented 8 years ago

@unoebauer @wkerzendorf I think I might know what's happening. It's actually something that was fixed before but then someone unfixed it apparently (could have been me but don't remember).

In the current master version, in tardis/plasma/properties/radiative_properties.py, lines 57-60 should be:

    n_lower = level_number_density.values.take(lines_lower_level_index,
        axis=0, mode='raise').copy('F')
    n_upper = level_number_density.values.take(lines_upper_level_index,
        axis=0, mode='raise').copy('F')

I tried modifying the current version to see if that fixes the problem, or even helps, but I get an error when I try to run the master version just as it is. Specifically ndarray is not C-contiguous. Do either of you know anything about this?

What I think is the same issue as our current problem was addressed in PR #319 (see my comment beginning 'Hi @kaushik94 and @wkerzendorf.'). Sounds similar, right? The change above corrected the issue the first time.
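As background for the contiguity error, the sketch below shows what `.copy('F')` does to the memory layout of a two-dimensional array. It uses a small made-up array in place of level_number_density and does not touch the Tardis C interface:

```python
import numpy as np

# Stand-in for level_number_density.values: 4 levels x 3 shells.
level_number_density = np.arange(12, dtype=np.float64).reshape(4, 3)
lines_lower_level_index = np.array([0, 2, 2, 3])

# take() returns a new array in C (row-major) order.
n_lower_c = level_number_density.take(lines_lower_level_index,
                                      axis=0, mode='raise')
# copy('F') holds the same values in Fortran (column-major) order.
n_lower_f = n_lower_c.copy('F')

print(n_lower_c.flags['C_CONTIGUOUS'], n_lower_c.flags['F_CONTIGUOUS'])
print(n_lower_f.flags['C_CONTIGUOUS'], n_lower_f.flags['F_CONTIGUOUS'])

# ascontiguousarray() converts back to C order if an interface insists on it.
n_lower_back = np.ascontiguousarray(n_lower_f)
```

Both arrays compare equal element by element; only the order in memory differs. Code that reads the raw buffer has to agree with the layout it is given, and an interface that declares C-contiguous input will reject the Fortran-ordered copy.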

wkerzendorf commented 8 years ago

I don't know if this is the problem - but as you said @aoifeboyle we had trouble with the interface before. The current interface uses cython array memoryviews where you can specify the contiguity. It will also complain if the array is wrong, I think. But maybe it's a good road to go down on.

aoifeboyle commented 8 years ago

@wkerzendorf @unoebauer Could one of you please check if you are able to run the current Tardis master version? We need to fix this error before we can make any further progress. If we're all getting this error, could you help @orbitfold? Since it's a C interface issue and I don't really understand it well.

orbitfold commented 8 years ago

@aoifeboyle Just so we're on the same page. Should I checkout master, make the changes you posted in the previous comment and see if I can fix the error?

wkerzendorf commented 8 years ago

@aoifeboyle @orbitfold I think we should sit down together this afternoon and discuss this. This will be quicker than writing. How does 4pm CET sound?

orbitfold commented 8 years ago

Works for me.

aoifeboyle commented 8 years ago

issue_455.zip

wkerzendorf commented 8 years ago

Here are my git bisect scripts:

plasma_test_bisect.sh

#!/bin/sh
git clean -dfx -e.idea
ASTROPY_USE_SYSTEM_PYTEST=1 /Users/wkerzend/anaconda3/envs/tardis-devel/bin/python setup.py develop
cd ~/projects/tardis/plasma_fail/
/Users/wkerzend/anaconda3/envs/tardis-devel/bin/python plasma_test.py

plasma_test.py:

from tardis import run_tardis
import numpy as np
import sys
import subprocess
from matplotlib import pylab as plt

label = subprocess.check_output(["git", "--git-dir", "/Users/wkerzend/python/tardis/.git", "describe"]).strip()

try:
    mdl = run_tardis('plasma_fail_test.yml')
except KeyboardInterrupt:
    raise
except:
    # Exit code 125 tells `git bisect run` to skip a commit that cannot be tested.
    sys.exit(125)

rw, rf = np.loadtxt('plasma_fail_refspec.dat', unpack=1)
diff = np.sum((rf - mdl.spectrum_virtual.luminosity_density_lambda.value)**2)

plt.plot(rw, rf, label='reference')
plt.plot(mdl.spectrum_virtual.wavelength, mdl.spectrum_virtual.luminosity_density_lambda, label='git_ver label={0} difference={1}'.format(label.strip(), diff))
plt.legend()
plt.savefig('plot_{0}.pdf'.format(label.strip()))
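if diff < 1e80:
    # Exit code 0 marks the commit as good for `git bisect run`.
    print "Difference is negligible", diff
    sys.exit(0)
else:
    # Exit code 1 marks the commit as bad.
    print "Difference is big", diff
    sys.exit(1)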

yeganer commented 8 years ago

I just tried to reproduce the setup and bug but i'm running into a problem.

45eca24 doesn't run for me.

It builds "fine" (except for a bunch of warnings about inlines declared but not defined) with setup.py develop but when i try to run it i get the following:

Traceback (most recent call last):
  File "/home/stefan/anaconda2/envs/tardis/bin/tardis", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/home/stefan/projects/tardis/data/plasma_fail/tardis/scripts/tardis", line 4, in <module>
    from tardis import simulation, model
  File "/home/stefan/projects/tardis/data/plasma_fail/tardis/tardis/model.py", line 14, in <module>
    from tardis.montecarlo import montecarlo
ImportError: /home/stefan/projects/tardis/data/plasma_fail/tardis/tardis/montecarlo/montecarlo.so: undefined symbol: rpacket_set_id

Am i perhaps using the wrong commit or am i missing something?

orbitfold commented 8 years ago

Off the top of my head - did you do git clean -dfx?

yeganer commented 8 years ago

Yes i did git clean -dfv. I also did a complete new clone with

git clone git://github.com/tardis-sn/tardis.git
cd tardis
git checkout 45eca24d77eacc91d0a0b875e5521bc2d3dc151a
./setup.py develop

Could that be a compiler issue?

orbitfold commented 8 years ago

C compiler/OS?

yeganer commented 8 years ago

I'm using gcc 5.2.1 with Ubuntu 15.10

I looked into the problem and it seems rpacket.o doesn't contain any symbols. So every function defined as inline is skipped somehow.

UPDATE: i just tried gcc-4.9 and that worked. So it's indeed a compiler issue

yeganer commented 8 years ago

I ran an updated version of the bisect script that checked the nu_bar_estimators and the result was this:

The first bad commit could be any of:
45184e077085763d8f0113cb57684cc4d704c26b
2f2747dd946ac0dc9f7a0020d5c3e8a187189d68
3f88a317d92285601268fa52c157a0aa6288d97b
36c37afcd7e02325021c887911376cf8feb6f78e
dfd084ff361d6559848f3e37e95a121ad2d850c2
385203df334dca2b0640cd37301beed321becfb6
d2c22b0cdb0918d3d60e575214a46401223ccaf7
b094131931c887ad21e605fd090c44c10c59d22b
ccdff54dc32833467912596e06b34309d48c0e97
627fcf40c4430044c8b4bc41da606fa47db9463c
e9994fd821afc3c35dc40aef1b334390664bee62
e2eac6dddb91dd1b05ae4a9733e270abfa094404
f6b8c41048a0540f4aea82a15bef0e53741ccd64
ec8fcda6271bcf7c864d7ab04c31dfb2d21aed98
We cannot bisect more!
bisect run cannot continue any more

If i'm not mistaken these are the same as @wkerzendorf found in his bisect.
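The updated bisect script itself is not posted in this thread. A minimal sketch of what such a check could look like, assuming a saved reference array and a hypothetical tolerance that has to sit above the Monte Carlo noise; the function name and the rtol value are illustrative only:

```python
def bisect_exit_code(reference, current, rtol=1e-3):
    """Map an estimator comparison onto `git bisect run` exit codes.

    0 = good commit, 1 = bad commit, 125 = skip (cannot be compared).
    Assumes that all reference values are non-zero.
    """
    if len(reference) != len(current):
        return 125
    worst = max(abs(c - r) / abs(r) for r, c in zip(reference, current))
    return 0 if worst <= rtol else 1


# First three nu_bar_estimator values from the two runs shown earlier.
reference = [5.86773400e+28, 5.02563144e+28, 4.34765138e+28]
current = [6.01178210e+28, 5.13461157e+28, 4.43216973e+28]
code = bisect_exit_code(reference, current, rtol=0.005)
print(code)
```

In a real bisect script the returned value would be passed to sys.exit(), as in plasma_test.py above.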

wkerzendorf commented 8 years ago

thanks @yeganer - well this at least helps! I wonder if we can convert this to a diff that we can look through.

yeganer commented 8 years ago

I'm working on a binary diff of the structure that goes into the MC simulation but that doesn't seem that easy because the storagemodel is mainly pointers.

yeganer commented 8 years ago

Update:

I mistakenly interpreted transition_probabilities_nd as an array, so the values for 1 and greater are just random data.

Original Post:

I started doing some binary diffs of the storage model. The origin of the affected branch on master is 0dd77db. On that branch 30d1cb9 is the last working commit. Therefore i based my patches on that commit. I compared my results with 3fc862f although e2eac6d seemed to introduce the bug. My test setup used dilute-lte which wasn't implemented yet at that point.

As of now i tested r_inner, r_outer, v_inner, time_explosion and electron_densities. They were identical except for the electron densities where 2 values were off by ~1e-5.

When comparing the transition_probabilities_nd i noticed a huge difference between the two versions. transition_prob

Here is my patch together with the files i used to do my tests: patch.zip

git checkout 30d1cb9
git apply 0001-added-int.patch
./setup.py develop
cd $DATADIR
python plasma_test.py
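The field-by-field comparison described above can also be done once the storage model fields have been dumped to plain arrays. A rough sketch with made-up numbers; the helper names and the tolerance are hypothetical and the patch itself may work differently:

```python
def max_relative_difference(old, new):
    """Largest relative difference between two equally long sequences."""
    worst = 0.0
    for a, b in zip(old, new):
        scale = max(abs(a), abs(b))
        if scale > 0.0:
            worst = max(worst, abs(a - b) / scale)
    return worst


def compare_fields(fields_old, fields_new, rtol=1e-12):
    """Return {field name: (max relative difference, within tolerance?)}."""
    report = {}
    for name in sorted(fields_old):
        diff = max_relative_difference(fields_old[name], fields_new[name])
        report[name] = (diff, diff <= rtol)
    return report


# Made-up dumps: r_inner identical, one electron density off by ~1e-5.
dump_old = {"r_inner": [7.0e14, 8.0e14], "electron_densities": [1.0e9, 2.0e9]}
dump_new = {"r_inner": [7.0e14, 8.0e14],
            "electron_densities": [1.0e9, 2.0e9 * (1.0 + 1.0e-5)]}

report = compare_fields(dump_old, dump_new)
for name, (diff, ok) in report.items():
    print(name, diff, "ok" if ok else "DIFFERS")
```

Only fields that are real arrays of the stated length should be compared this way; a scalar read as an array yields meaningless values beyond the first element.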

wkerzendorf commented 8 years ago

good work!

chvogl commented 8 years ago

The transition_probabilities_nd are not used any more. Before changing the memory layout for the transition probabilities, this value was used for selecting transition probabilities for different shells. At the moment the value isn't assigned anymore, which explains the differences in the posted data. However, having a look at the storage structure is probably a good idea, since problems with the data passing have occurred before.