Closed unoebauer closed 8 years ago
@chvogl is right - I missed the _nd
part and thought it was transition_probabilities
. So can you make a branch that gets rid of the _nd
part, just to make sure that this doesn't interfere?
@wkerzendorf started a spreadsheet to keep track of differences between the storage_model before and after the bug.
https://docs.google.com/spreadsheets/d/1NuekA6XUBmAzHR4FksqQKoqAOzrh-RvH-pHsttVYXf8/edit#gid=0
I added my findings so far. From what i can tell the differences are very small. The next thing i'll test will be all the non-pointer variables.
I finished comparing the whole structure. For pointer variables i did a binary dump and binary diff, for the rest i dumped the value to a txt file.
Here are my findings:
The difference seems to be very small but it will add up with enough packets.
@yeganer: please use a slightly different version of your Tardis setup - it is important that initial_t_rad
and initial_t_inner
are specified! Otherwise, the spectral run will be performed with different radiation field temperatures (leading to a different ionisation and excitation state).
plasma_fail.yml
atom_data: kurucz_cd23_chianti_H_He.h5
model:
abundances:
filename: abundances_test.dat
filetype: simple_ascii
type: file
structure:
filename: densities_test.dat
filetype: simple_ascii
type: file
v_inner_boundary: 12000.000 km/s
v_outer_boundary: 35000.000 km/s
montecarlo:
black_body_sampling:
num: 1000000
start: 1 angstrom
stop: 1000000 angstrom
iterations: 1
last_no_of_packets: 1000
no_of_packets: 1000
no_of_virtual_packets: 3
nthreads: 16
seed: 23111963
plasma:
initial_t_rad: 10000 K
initial_t_inner: 10000 K
disable_electron_scattering: false
excitation: dilute-lte
ionization: nebular
line_interaction_type: downbranch
radiative_rates_type: dilute-blackbody
spectrum:
num: 10000
start: 500 angstrom
stop: 20000 angstrom
supernova:
luminosity_requested: 8.675 log_lsun
luminosity_wavelength_end: 6500.000 angstrom
luminosity_wavelength_start: 3500.000 angstrom
time_explosion: 6.800 day
tardis_config_version: v1.0
With the above mentioned very simple and fast setup, the discrepancy in the nubar_estimator
is still triggered:
45eca24d77eacc91d0a0b875e5521bc2d3dc151a
In [1]: cmid = "45eca24d77eacc91d0a0b875e5521bc2d3dc151a"
In [2]: config = yaml.safe_load(open("plasma_fail_test.yml"))
In [3]: mdl = tardis.run_tardis(config)
In [4]: mdl.t_inner
Out[4]: <Quantity 10000.0 K>
In [5]: mdl.t_rads
Out[5]:
<Quantity [ 10000., 10000., 10000., 10000., 10000., 10000., 10000.,
10000., 10000., 10000.] K>
In [6]: mdl.ws
Out[6]:
array([ 0.37078131, 0.28228755, 0.22442434, 0.18012933, 0.14428712,
0.11452347, 0.089518 , 0.06844578, 0.05075197, 0.03605191])
In [7]: mdl.nubar_estimators
Out[7]:
array([ 1.02599646e+29, 7.86534414e+28, 4.30500473e+28,
2.67837214e+28, 2.62695254e+28, 2.35099899e+28,
2.10005595e+28, 2.24428343e+28, 2.77821946e+28,
3.61284314e+28])
c9e1b1761a6745d44bae973d962e1be180295827
:In [1]: cmid = "c9e1b1761a6745d44bae973d962e1be180295827"
In [2]: config = yaml.safe_load(open("plasma_fail.yml"))
In [3]: mdl = tardis.run_tardis(config)
In [4]: mdl.t_rads
Out[4]:
<Quantity [ 10000., 10000., 10000., 10000., 10000., 10000., 10000.,
10000., 10000., 10000.] K>
In [6]: mdl.t_inner
Out[6]: <Quantity 10000.0 K>
In [7]: mdl.nubar_estimators
Out[7]:
array([ 1.04092620e+29, 9.18533300e+28, 5.19606217e+28,
2.72662469e+28, 2.20794346e+28, 1.95275733e+28,
2.16164868e+28, 2.41354256e+28, 2.99510107e+28,
3.89735116e+28])
In [8]: mdl.ws
Out[8]:
array([ 0.37078131, 0.28228755, 0.22442434, 0.18012933, 0.14428712,
0.11452347, 0.089518 , 0.06844578, 0.05075197, 0.03605191])
My conclusion is the bug appeared between 30d1cb9 (still gives the correct result) and ec8fcda (gives a different result than 45eca24 or aa7da17).
Quick progress update. I've been comparing 45eca24 (working version) and 41d8fcd (the commit where Wolfgang merged the plasma restructure branch, which was giving a wrong answer). I found a small error in the plasma, and after correcting it and comparing with Uli's test, the tau values and transition probabilities values between the two cases after the first iteration definitely agree, but the nubar estimators still come out wrong. I think there might be more than one issue at play here. One other thing I've found is that the j_blues in the old plasma were a multi-dimensional array, and in the new plasma structure they take the form of a pandas dataframe. This change of format might be causing some issues when being pa ssed into the montecarlo code. I will investigate further tomorrow.
@aoifeboyle that's great that you found something. Can you quickly make a PR for the bug, that might help to know which way to investigate in.
I did another bisect run and there seem to be several bugs indeed. when using
plasma:
ionization: lte
bisect reports e9fb7780d504f2e013081abd11710d817e44d079 as first bad commit. But using
plasma:
ionization: nebular
gives commits e2eac6d - ec8fcda as possible first bad commits.
Maybe one of the issues was fixed by @aoifeboyle
I didn't look into the case with ionization: lte
further but for ionization: nebular
i did several binary dumps of the data passed to the storage model and electron_density
, inverse_electron_density
, line_lists_tau_sobolevs
and transition_probabilities
have relative errors > 1e-5 with max errors >1
@wkerzendorf I accidentally deleted my comment from last night, could you please paste it into this conversation somewhere? I think you should have it as an email?
The bug that I found in the plasma/restructure commit is something that I had already corrected at a later time, but it was not corrected at the time of the merge of the plasma/restructure. So I think that explains what you found @yeganer. My aim for now is to get consistent results for our test between the last working version (45eca24) and the plasma restructure commit (41d8fcd). There are too many differences between 45eca24 and the current version to easily debug it. But once I get this right, it should either solve all the problems or at least considerably reduce the number of problems.
@aoifeboyle in my opinion it is better to compare 30d1cb9 with ec8fcda because that's the smalles difference in code where the bug appears.
I added your comment at the bottom of my comment, if that's okay
I am re-posting @aoifeboyle's originial comment:
The bug that I found in the plasma/restructure commit is something that I had already corrected at a later time, but it was not corrected at the time of the merge of the plasma/restructure. So I think that explains what you found. My aim for now is to get consistent results for our test between the last working version (45eca24) and the plasma restructure commit (41d8fcd). There are too many differences between 45eca24 and the current version to easily debug it. But once I get this right, it should either solve all the problems or at least considerably reduce the number of problems.
Also @yeganer, what I'm comparing is basically the same thing that you're suggesting. 30d1cb9 is effectively 45eca24 because the model was still calling the old plasma structure. And ec8fcda is when the nebular ionisation was added to the new plasma, but this was on the old plasma/restructure branch rather than on the main Tardis branch, so it's less confusing for me to work with 41d8fcd.
I think I'm making good progress anyway. The only discrepancy I can find in the plasma now is j_blues being an array in one version and a dataframe in the other, so I'm going to correct that.
@unoebauer @yeganer PR #462 is an updated version of 41d8fcd (the point of new plasma structure merging). It now produces the exact same plasma, transition probabilities, taus, etc. as 45eca24. However, the montecarlo code still returns the wrong nubar_estimators, even though the montecarlo code is the same and the inputs from the plasma (appear) the same. I think this is possibly a view/copy issue. @yeganer you mentioned before that you know how to examine the binary data being passed from the python to C. Could you checkout and perform the same analysis to compare 2e7a85f (the fix I uploaded) and 45eca24?
@aoifeboyle: I just looked at your PR #462. Unfortunately, it crashes when running the plasma_fail_test.yml
problem:
In [2]: config = yaml.safe_load(open("plasma_fail_test.yml"))
In [3]: mdl = tardis.run_tardis(config)
tardis.io.config_reader - INFO - Reading Atomic Data from kurucz_cd23_chianti_H_He.h5
tardis.atomic - INFO - Read Atom Data with UUID=5ca3035ca8b311e3bb684437e69d75d7 and MD5=21095dd25faa1683f4c90c911a00c3f8
tardis.io.model_reader - WARNING - v_inner_boundary requested too small for readin file. Boundary shifted to match file.
tardis.io.model_reader - WARNING - v_outer_boundary requested too large for readin file. Boundary shifted to match file.
tardis.io.config_reader - WARNING - Abundances have not been normalized to 1. - normalizing
tardis.io.config_reader - WARNING - No "species" given - ignoring other NLTE options given:
{ 'classical_nebular': False, 'coronal_approximation': False}
tardis.io.config_reader - WARNING - No convergence criteria selected - just damping by 0.5 for w, t_rad and t_inner
tardis.plasma.properties.atomic - WARNING - Zeta_data missing - replaced with 1s
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-6282305a6c09> in <module>()
----> 1 mdl = tardis.run_tardis(config)
/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/base.pyc in run_tardis(config, atom_data)
38 tardis_config = config_reader.Configuration.from_config_dict(
39 config_dict, atom_data=atom_data)
---> 40 radial1d_mdl = model.Radial1DModel(tardis_config)
41
42 simulation.run_radial1d(radial1d_mdl)
/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/model.pyc in __init__(self, tardis_config)
130 excitation_mode=tardis_config.plasma.excitation,
131 line_interaction_type=tardis_config.plasma.line_interaction_type,
--> 132 link_t_rad_t_electron=0.9)
133
134 self.spectrum = TARDISSpectrum(tardis_config.spectrum.frequency, tardis_config.supernova.distance)
/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/standard_plasmas.pyc in __init__(self, number_densities, atomic_data, time_explosion, t_rad, delta_treatment, nlte_config, ionization_mode, excitation_mode, line_interaction_type, link_t_rad_t_electron)
103 delta_input=delta_treatment, nlte_species=nlte_config.species,
104 previous_electron_densities=initial_electron_densities,
--> 105 previous_beta_sobolevs=initial_beta_sobolevs)
/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/base.pyc in __init__(self, plasma_properties, **kwargs)
20 self._build_graph()
21 # self.write_to_tex('Plasma_Graph', 'Plasma_Formulae')
---> 22 self.update(**kwargs)
23
24 def __getattr__(self, item):
/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/base.pyc in update(self, **kwargs)
141
142 for module_name in self._resolve_update_list(kwargs.keys()):
--> 143 self.plasma_properties_dict[module_name].update()
144
145 def _update_module_type_str(self):
/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/properties/base.pyc in update(self)
84 if len(self.outputs) == 1:
85 setattr(self, self.outputs[0], self.calculate(
---> 86 *self._get_input_values()))
87 else:
88 new_values = self.calculate(*self._get_input_values())
/afs/mpa/home/ulnoe/local/lib/python2.7/site-packages/tardis_sn-1.0.1-py2.7-linux-x86_64.egg/tardis/plasma/properties/radiative_properties.pyc in calculate(lines, j_blues_array, beta_rad)
159 @staticmethod
160 def calculate(lines, j_blues_array, beta_rad):
--> 161 return pd.DataFrame(j_blues_array, index=lines.index)
162
163 class LTEJBlues(ProcessingPlasmaProperty):
/opt/python-2.7.9/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
237 else:
238 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 239 copy=copy)
240 elif isinstance(data, (list, types.GeneratorType)):
241 if isinstance(data, types.GeneratorType):
/opt/python-2.7.9/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_ndarray(self, values, index, columns, dtype, copy)
405 values = _possibly_infer_to_datetimelike(values)
406
--> 407 return create_block_manager_from_blocks([values], [columns, index])
408
409 @property
/opt/python-2.7.9/lib/python2.7/site-packages/pandas/core/internals.pyc in create_block_manager_from_blocks(blocks, axes)
3526 blocks = [getattr(b, 'values', b) for b in blocks]
3527 tot_items = sum(b.shape[0] for b in blocks)
-> 3528 construction_error(tot_items, blocks[0].shape[1:], axes, e)
3529
3530
/opt/python-2.7.9/lib/python2.7/site-packages/pandas/core/internals.pyc in construction_error(tot_items, block_shape, axes, e)
3507 raise e
3508 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3509 passed,implied))
3510
3511
ValueError: Shape of passed values is (0, 0), indices imply (0, 270394)
In [4]:
@unoebauer Strange - it worked fine for me. Let me look and see where I went wrong.
@unoebauer I tried checking it out directly from the PR rather than using my own branch and it still works for me. Is this the test you're talking about?
atom_data: kurucz_cd23_chianti_H_He.h5 model: abundances: filename: abundances_test.dat filetype: simple_ascii type: file structure: filename: densities_test.dat filetype: simple_ascii type: file v_inner_boundary: 12000.000 km/s v_outer_boundary: 35000.000 km/s montecarlo: black_body_sampling: num: 1000000 start: 1 angstrom stop: 1000000 angstrom iterations: 1 last_no_of_packets: 1000 no_of_packets: 1000 no_of_virtual_packets: 3 nthreads: 16 seed: 23111963 plasma: initial_t_rad: 10000 K initial_t_inner: 10000 K disable_electron_scattering: false excitation: dilute-lte ionization: nebular line_interaction_type: downbranch radiative_rates_type: dilute-blackbody spectrum: num: 10000 start: 500 angstrom stop: 20000 angstrom supernova: luminosity_requested: 8.675 log_lsun luminosity_wavelength_end: 6500.000 angstrom luminosity_wavelength_start: 3500.000 angstrom time_explosion: 6.800 day tardis_config_version: v1.0
In case anyone's following this, the problem is apparently just our different pandas versions. I'm going to tweak it a bit so it works with both versions now.
@unoebauer I've fixed it for pandas-0.16.0. Could you please check that it's working for you now?
@aoifeboyle: Looks good now - thanks. The plasma_fail_test.yml
setup runs without problems and the Travis tests pass.
The discussion is getting out of hand. Too many loose and dead ends have accumulated here (and I am as much to blame for this as anybody) and are muddying the picture. I think it's time to take a step back, review the current status and take a fresh, re-focused start. For this purpose, I am closing this issue and will open a new one in the hope of simplifying the problem solving process. Please don't post anything here any more but use the new issue, #464.
Main description:
TARDIS produces different output between 45eca24d77eacc91d0a0b875e5521bc2d3dc151a and master with this YML file. Why? and what commit introduced this change.
The following YML files provide a minimum "working" example": issue_455.zip
The diagnosis of the problem at hand is still work in progress, but it seems the most recent Tardis version, c9e1b1761a6745d44bae973d962e1be180295827, is inconsistent with a previous state of Tardis.
All this has been triggered by re-examining some detailed runs performed during the Abundance Tomography Project of Talytha. For that, we mainly used a Tardis version similar to aa7da173d0e7339bc962bafb62d34d60596fd951 (a working Open-MP implementation has just been incorporated, PR #338). Comparing the virtual spectra obtained with the recent Tardis version for the same setup, reveals huge discrepancies:
The new Tardis version finds a much higher inner temperature. This goes hand in hand with a much higher radiation temperature in the ejecta and a higher ionisation state (and in turn electron density).
Setup:
tardis_00173_2.yml
densities_00173_2.dat
abundances_00173_2.dat