BEAGLE v0.11.x jobs stalling

alex-delavega commented 7 years ago

Hi Jacopo and Emma,

I pulled the latest version of BEAGLE (v0.11.3) and launched a number of jobs on my laptop to do integrated fits of some 3D-HST galaxies. It seems that every time I launch a job with BEAGLE v0.11.3, it stalls and doesn't progress to the next galaxy in the FITS file. Occasionally a fit will finish, but most of the time the job hangs and I have to terminate it after about 90 minutes of little progress.

I was able to run fit_photometry_example.param with no problem using the latest version, but for some reason all my own jobs with the latest version of BEAGLE seem to hang and never get past the first galaxy in the photometry catalogue.

I'm not sure why the stalling occurs with my galaxies and not with the example photometry file. Attached are my parameter files, data and terminal output. Any input would be immensely appreciated!

gds_integshort1.zip

alex-delavega commented 7 years ago

@camipacifici and I played around with the parameter file above and found some interesting results.

First, we reran the job using the same parameter file as for the example case but with my own data - no stalling.

Then we systematically changed one prior at a time in the example param file using the priors from my .param file above to find out which components might be responsible for the stalling.

We found that high masses (8 < log(M) < 12) in the prior and the gaussian SFR prior used above can lead to stalled jobs, but widening the mass range to (5 < log(M) < 12) could overcome this.

Likewise, an exponential dust prior with the mass range above doesn't produce any issues, but adding a uniform SFR prior like that found in the example to an exponential dust prior and mass prior leads to stalled jobs.

Do you have some suggestions on how I should set up my .param file for future jobs with version 0.11.x?

jacopo-chevallard commented 7 years ago

Hi Alex, I'll take a look ASAP! Which BEAGLE version were you using when you experienced no stalling with the parameter file that produces stalling in BEAGLE 0.11.3? (Knowing this version would allow me to narrow down the possible origin of the issue.)

alex-delavega commented 7 years ago

Hi Jacopo,

Thank you so much! I believe I was using v0.7.10 when I started using these priors in March.

alex-delavega commented 7 years ago

Hi Jacopo and Emma,

I retried the experiment above but on the photometry example provided. Interestingly, I was able to fit the photometry example using an exponential dust prior and either a uniform or gaussian SFR prior.

Subsequently, I decided to fit my data using the same fitting scheme as in fit_photometry_example.param, only changing the data file. Apparently, reading in the redshift caused a SIGSEGV error, while letting BEAGLE fit for the redshift allowed the fit to continue, though it began to stall and I stopped the fit after 20 minutes.

Enclosed are the fits - those beginning with fitting_example include fit_photometry_example.param with a different SFR prior (uniform or gaussian) and those beginning with gds_integ are the fits for my data (which include terminal output on maximum verbosity). The fit wherein BEAGLE fits the redshift ends with fit_z.

I should mention I pulled BEAGLE v0.14.5 before conducting these fits - I hadn't noticed this particular SIGSEGV error before when fitting my data.

Any input would be greatly appreciated! issue_50_stuff.zip

jacopo-chevallard commented 7 years ago

Hi Alex, sorry for taking so long to get back to you! The problem with setting the redshift from_file is that you didn't pass any MOCK INPUT PARAMETERS keyword, which is why you get the SIGSEGV error. I've now added an error check, so the SIGSEGV should be avoided and a more meaningful error message thrown instead. I'll now move on to the second problem.

jacopo-chevallard commented 7 years ago

About the stalling, I think the problem is the S/N of the data, which is extremely large in some bands: 30 to 80 in the B band, 50 to 460 in the V band, 200 to 1000 in the z band, 90 to 800 in the i1 band, 100 to 400 in the j2 band. Unfortunately, the models (and the numerical algorithms) are far from accurate at the sub-percent level, which is why it is (in general) a good idea to use the min_rel_err token in the filter file.
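
To illustrate why a relative error floor tames such high S/N values, here is a minimal Python sketch. It assumes the floor is combined in quadrature with the catalogue error; the thread does not say how BEAGLE combines them, so the function `effective_snr` is purely illustrative and not part of BEAGLE. Under this assumption, a floor of 0.02 keeps the effective S/N at or below 1/0.02 = 50, however small the catalogue error is.

```python
import math


def effective_snr(flux, flux_err, min_rel_err=0.02):
    """Return the S/N after applying a relative error floor.

    Assumption (not confirmed by the thread): the floor,
    min_rel_err * |flux|, is added in quadrature to the catalogue error.
    """
    floor = min_rel_err * abs(flux)
    total_err = math.hypot(flux_err, floor)
    return abs(flux) / total_err


# Upper ends of the catalogue S/N ranges quoted above, for a unit flux.
for band, snr in [("B", 80), ("V", 460), ("z", 1000)]:
    flux = 1.0
    print(band, round(effective_snr(flux, flux / snr), 1))
```

Low S/N measurements are almost unaffected by the floor, while the very high S/N bands are all brought to just under 50.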

If I modify the filter file as below, the fit runs without stalling:

# It contains the filters to be used when calculating only photometric models
redshift:colName:z
#uncomment the above if you want to fit with a fixed redshift, also change param file to make redshift parameter 'fixed', rather than 'fitted'
units:Jy
object_ID:colName:ID

index:208       min_rel_err:0.02  flux:colName:flux_B   fluxerr:colName:fluxerr_B      label:WFC_F435W
index:209       min_rel_err:0.02  flux:colName:flux_V   fluxerr:colName:fluxerr_V      label:WFC_F606W
index:210       min_rel_err:0.02  flux:colName:flux_i1   fluxerr:colName:fluxerr_i1      label:WFC_F775W
index:211       min_rel_err:0.02  flux:colName:flux_i2   fluxerr:colName:fluxerr_i2      label:WFC_F814W
index:212       min_rel_err:0.02  flux:colName:flux_z   fluxerr:colName:fluxerr_z      label:WFC_F850W
index:215       min_rel_err:0.02  flux:colName:flux_j1  fluxerr:colName:fluxerr_j1     label:WFC3_F125W
index:216       min_rel_err:0.02  flux:colName:flux_j2  fluxerr:colName:fluxerr_j2     label:WFC3_F140W
index:217       min_rel_err:0.02  flux:colName:flux_H   fluxerr:colName:fluxerr_H      label:WFC3_F160W
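
For catalogues with many bands, lines like those above can be generated rather than typed by hand. The sketch below is a hypothetical helper (`filter_line` is not part of BEAGLE) that reproduces the token layout shown above. It assumes that the flux and error columns follow the `flux_<band>` / `fluxerr_<band>` naming used in this catalogue and that the tokens can be separated by any amount of whitespace.

```python
def filter_line(index, band, label, min_rel_err=0.02):
    """Build one filter-file line in the token layout used in this thread.

    Hypothetical helper: the column names flux_<band> and fluxerr_<band>
    are taken from the example catalogue and may differ in other files.
    """
    return (
        f"index:{index}  min_rel_err:{min_rel_err}  "
        f"flux:colName:flux_{band}  fluxerr:colName:fluxerr_{band}  "
        f"label:{label}"
    )


# (filter index, catalogue band suffix, label) copied from the example above.
BANDS = [
    (208, "B", "WFC_F435W"),
    (209, "V", "WFC_F606W"),
    (217, "H", "WFC3_F160W"),
]

for index, band, label in BANDS:
    print(filter_line(index, band, label))
```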

alex-delavega commented 7 years ago

Hi Jacopo,

Thank you so much! I'll use min_rel_err from now on.

What should I do to address the redshift issue? I was just able to run the fits without stalling by removing the redshift parameter from the PARAMETERS HANDLING section of the .param file.

jacopo-chevallard commented 7 years ago

If your filter file already contains the line (uncommented)

redshift:colName:z

(as in the example above), then you don't need to add the redshift among the parameters, since it will be set from the input photometric catalogue.

Alternatively, you can use the MOCK INPUT PARAMETERS keyword, e.g.

MOCK INPUT PARAMETERS = fileName:my_file_name.fits

along with the from_file parameter, e.g.

PARAMETER  = name:redshift     type:from_file

where the my_file_name.fits file (which can be the same file containing the photometric catalogue) must contain a column named redshift. I think the first option is more practical in this case. The second option is useful if you want to read parameters other than redshift from an input file, or if you want to adopt a Gaussian prior whose mean and sigma differ for every object (and must therefore be read from a file).

alex-delavega commented 7 years ago

Great! Thank you so much, Jacopo!
