mboquien / cigale


abnormal SED #107

Open drawnacross opened 1 week ago

drawnacross commented 1 week ago

Hi,

We encountered an issue during the CIGALE SED fitting process.

On September 29, we updated the model parameters (analysis_params: core=54, block=100, mock_flag=True); prior to this, SED fitting with CIGALE had been working normally, but this adjustment appears to have triggered a bug. The fitted SEDs now show significant discrepancies from the photometric data (see Fig. 1 and Fig. 2), even though the reduced chi^2 values appear to be within normal ranges. We suspect that some error caused CIGALE to match the SEDs with incorrect reduced chi^2 values. (Attached: Fig1, Fig2)

Throughout the process, we noticed warnings (e.g., Fig. 3 and Fig. 4) of the form "RuntimeWarning: divide by zero encountered in log", raised at a line like chi2 -= 2. * np.log(.5 ... . However, after attempting to debug utils.py, we are still unsure of the cause of the issue. (Attached: Fig3_20240929_cigale_block1_worning, Fig4_20240929_cigale_mockblock1_worning)
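For context, this class of warning is generic numpy behaviour rather than anything CIGALE-specific: np.log emits "RuntimeWarning: divide by zero encountered in log" whenever it receives an exact zero, and returns -inf instead of raising. A minimal reproduction with made-up values (not CIGALE's actual data or code):

```python
# Minimal reproduction of the warning class seen above (not CIGALE's code):
# numpy warns about a divide by zero in log when the argument is exactly 0,
# and the corresponding result is -inf rather than an exception.
import numpy as np

values = np.array([0.1, 0.0, 0.2])   # hypothetical array containing a zero
term = 2. * np.log(.5 * values)      # warns on the zero entry
print(term)                          # the second element is -inf
```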

On October 3, we re-ran a few sources (5 in total) using the same configuration file (core=54, block=100, mock_flag=False). The results were normal, with all 5 sources fitting correctly.

Subsequently, on October 5, we decided to run the process again (core=96, block=50, mock_flag=True). However, due to insufficient server memory, the program terminated with a MemoryError (Fig. 5), and no FITS files were generated. (Attached: Fig5_20241005_memoryerror)

Therefore, on October 7, suspecting that the error was due to insufficient memory, we increased the number of blocks (core=96, block=200, mock_flag=True); memory usage peaked at roughly 300 GB of the 1000 GB available. Despite this adjustment, the results remained anomalous: most SEDs were abnormal in both the September 29 and October 7 runs. However, a few sources showed anomalous SEDs in the September 29 results but normal ones on October 7, and vice versa.

We do not know why we are suddenly facing this problem. Are we missing something? It is also puzzling that the runs on a few individual sources (the 20241003 results) were normal, while the run over the entire source catalogue was abnormal. Have you encountered similar situations before? How would you recommend resolving this issue?

I have attached the CIGALE SED plots for the same 5 sources from the three test runs (20240929, 20241003, 20241007), along with the configuration file for each run, in the archive. The *best_model.pdf results from 20240929 were not plotted in time, so I added my own plots, which cover a portion of our sample (the 20241007 run includes similar plots). issue_cigale.tar.gz

Our server is running Python version 3.8.8 with the following dependencies:

CIGALE 2022.1
astropy 4.2.1
configobj 5.0.8
matplotlib 3.3.4
numpy 1.22.0
rich 12.6.0
scipy 1.6.2

Thank you for your attention to this matter. Your insights and guidance would be greatly appreciated.

mboquien commented 1 week ago

I have encountered such an issue in the past. I believe it is due to a mix-up between different models when using a custom grid of redshifts together with blocks. It may already be fixed in a more recent version (though the bug was a little elusive, so I cannot guarantee it). If at all possible, could you try running the latest development version of CIGALE? You can get it from the git develop branch or simply download a snapshot of it. At the same time, I would strongly recommend updating your Python version. CIGALE might still work with 3.8, but I cannot guarantee it; at the moment I am running CIGALE with Python 3.11 and 3.12 without any issue.

Let me know if the current development version helps. Note that you may have to regenerate the configuration file from scratch, as the format may have changed a little.

drawnacross commented 1 week ago

Thank you for your prompt response.

We tried the CIGALE develop version with Python 3.11, but still obtained erroneous results. Since we used the same configuration files, the abnormal SEDs match the anomalous results from 20241007 (compare the two plots above). SSA22.0003_best_model.pdf SSA22.0024_best_model.pdf

Some sources in our data have redshift information, while others do not (for the latter we use photometric-redshift mode, i.e. redshift=-1). Do we need to run them separately?

We set the flux and error values of sources without photometric information to -9999. Should we set them to NaN instead?

mboquien commented 1 week ago

I tend to prefer setting them to NaN, but -9999 should work fine. You can check what CIGALE actually fits after it preprocesses the input file by looking at the fluxes in the observations.txt file in the out folder.
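For reference, here is a small sketch of how the -9999 sentinels could be converted to NaN in the input catalogue before the run; the file names and the id/redshift column names are assumptions about your catalogue layout:

```python
# Sketch: replace -9999 sentinels with NaN in an input catalogue before running
# CIGALE. The file names and the id/redshift column names are assumptions.
import numpy as np
from astropy.table import Table

cat = Table.read("observations_input.txt", format="ascii")
for col in cat.colnames:
    if col in ("id", "redshift"):          # keep identifiers and redshifts untouched
        continue
    data = np.asarray(cat[col], dtype=float)
    data[data == -9999] = np.nan           # mark missing fluxes/errors as NaN
    cat[col] = data

cat.write("observations_nan.txt", format="ascii", overwrite=True)
```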

Regarding the redshift, I would indeed do two separate runs; it should give better results for the galaxies for which you do have a redshift.

Regarding the initial bug, would you mind sending me the fluxes of a single problematic object, along with the latest version of the pcigale.ini and pcigale.ini.spec files and any filter that is not shipped by default with CIGALE? I will try to look into it. Thanks!

drawnacross commented 1 week ago

Thank you very much for your help.

The archive contains the latest configuration files. The source catalogue includes four problematic sources, two with redshifts and two without. issue_invest_example.tar.gz

In our testing on 20241003, the runs on just a few sources did not seem to exhibit anomalies.

mboquien commented 1 week ago

Thank you for the archive. I have a few tight deadlines in the next few days but I will do my best to investigate the issue shortly.

mboquien commented 6 days ago

This is a little baffling. I have fitted the two objects with known z, removing their redshifts from the list of redshifts in pcigale.ini, which means the models are created at the input redshift of the objects. I got reasonably good results (see the attached image). Can you tell me whether you get correct results in this case? If so, it may be that computing by blocks with a redshift grid sometimes creates issues; however, I cannot reproduce the issue in that configuration either. I have not run the full redshift grid (keeping the explicit list of redshifts in pcigale.ini) as it would take too long, but on a reduced grid I got very acceptable fits.
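For clarity, this is roughly what the relevant part of pcigale.ini looks like (a sketch from memory; the exact comments and surrounding entries may differ in your version). Leaving the redshift parameter of the redshifting module empty makes CIGALE take the redshift of each object from the input file instead of using a grid:

```
[sed_modules_params]
  [[redshifting]]
    # Leave empty to use the redshifts from the input catalogue; otherwise
    # list the grid of redshifts at which the models are computed.
    redshift =
```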

You initially mentioned that your best-fit plots show a low χ² despite visually bad fits. This makes me think that the fit itself is fine, as are the corresponding physical parameters, and that it may just be the final generated spectrum that is erroneous for some reason. Naturally, this should not happen. Would you mind sending me the full content of the out folder? I would like to do some sanity checks to exclude some problems.

One last thing: by any chance, have you reproduced this problem with a smaller grid? The smaller the better for debugging; it would be super helpful.

drawnacross commented 5 days ago

Sorry for the late reply.

  1. Yes. Our test on 20241003 included five sources, all with known redshifts, so we did not use a redshift grid, and the results were perfectly normal.

  2. Previously, for certain sources, we tried fixing the parameters in the configuration file to the best-fit values reported in out/results.txt. However, when we re-ran the fit, we received a message saying that no valid SED was created because the chi2 is too large, so the SED could not be plotted.

Do you need the entire folder? It contains the source catalogue for all sources, so perhaps I could send it to you privately. Is your email address mederic.boquien@uantof.cl still valid?

  3. Thank you for your suggestion; we can try that. We did not encounter a similar issue in our runs before 20240929. Attached is the configuration file we used on 20240921; there were no issues when running it at that time. However, because those parameters did not fit the sources well, we later updated them, and I am not sure which of the new parameter settings caused the issue to appear. pcigale.ini.tar.gz

mboquien commented 5 days ago

Just a quick idea: can you change ctypes.c_uint32 to ctypes.c_uint64 on those two lines and let me know whether that fixes the issue?
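To illustrate why the type matters (a hypothetical sketch, not CIGALE's actual code): if the best-fit model indices live in a shared buffer declared with a 32-bit unsigned ctypes type, the largest index it can hold is 4294967295, whereas a 64-bit type removes that limit for any realistic grid.

```python
# Hypothetical sketch (not CIGALE's code): the ctypes type chosen for a shared
# index buffer sets the largest model index it can represent.
import ctypes
from multiprocessing.sharedctypes import RawArray

import numpy as np

n_objects = 4  # hypothetical number of fitted objects

best_index_32 = np.ctypeslib.as_array(RawArray(ctypes.c_uint32, n_objects))
best_index_64 = np.ctypeslib.as_array(RawArray(ctypes.c_uint64, n_objects))

print(np.iinfo(best_index_32.dtype).max)  # 4294967295
print(np.iinfo(best_index_64.dtype).max)  # 18446744073709551615
```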

If that still does not work, you can always send me the files at mederic.boquien@oca.eu. Thanks!

drawnacross commented 5 days ago

Thanks, we will try.

drawnacross commented 3 days ago

After changing the code to use ctypes.c_uint64, the results seem to be normal. Could you explain why this code caused the previous issue?

Additionally, I have two small questions:

  1. For the physical parameters in out/results.txt (such as best.sfh.sfr), how should we obtain their errors?
  2. Our configuration file yields a total of 9.5 billion models (about 0.22 billion parameter combinations × 44 redshift grid points), so every source is fitted against these 9.5 billion models. However, our source catalogue contains some sources with spectroscopic redshifts. How are these 9.5 billion models (presumably spanning the 44 fixed redshift grid points) used to fit sources with known redshifts, given that a spectroscopic redshift will not necessarily fall exactly on one of the 44 grid values?

mboquien commented 3 days ago

The root cause of the bug is that these variables store the index of the best-fit model. Unsigned 32-bit integers can store numbers from 0 to 4294967295, and back then I did not expect CIGALE to be used with more than this number of models (I myself never went over ~1.5 billion). When, as in your run, the index of the best-fit model is larger than 4294967295, it wraps around, starting again from 0. For instance, model index 7653127981 is mistakenly stored as 3358160685 (you can check with np.array(somevalue).astype(np.uint32)). This means that when the best-fit spectrum is computed at the end of the run (the spectra are not kept after computing the grid of models, as they would consume a huge amount of memory), the wrong model is computed, hence the bad fits.
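Concretely, the check suggested above looks like this:

```python
# Reproducing the wrap-around described above: casting a model index larger
# than 2**32 - 1 to an unsigned 32-bit integer loops back towards 0.
import numpy as np

index = 7653127981                          # example best-fit model index
print(np.array(index).astype(np.uint32))    # 3358160685
print(index % 2**32)                        # same value: 3358160685
```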

Regarding your two small questions:

  1. Formally there is no error associated with the best. values. However, you can use the Bayesian estimates (columns starting with bayes.), which provide estimates of the parameters along with the corresponding uncertainties (columns ending in _err). These are based on the likelihood-weighted means and standard deviations. You can list the properties you would like estimated in this way in the variables entry of pdf_analysis; if you leave it empty, all of them are estimated (though that will slow down the analysis).
  2. I would suggest splitting the analysis into two runs. First, a run with only the objects for which you have a spectroscopic redshift. CIGALE will build a grid using the unique redshifts rounded to two decimals, which means it only computes the models at the appropriate redshifts; at the end of the run, it corrects the results to the exact redshift in the input file (see the sketch after this list). Two decimals is more than enough for the model grid unless you have narrow bands. For the galaxies without a redshift, you can leave your current configuration as-is. However, beware that the redshift sampling weights the prior, so I would only eliminate redshifts for which there is a strong expectation that the galaxies are not at those redshifts.
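As an illustration of the grid construction described in the previous point (a sketch of the behaviour, not CIGALE's internal code, with made-up redshifts):

```python
# Sketch of the behaviour described above (not CIGALE's internal code): for a
# spectroscopic-redshift run, only the unique redshifts rounded to two decimals
# need to be modelled; results are then corrected to the exact input redshifts.
import numpy as np

z_spec = np.array([2.091, 2.094, 3.337, 3.341])  # made-up catalogue redshifts
grid = np.unique(np.round(z_spec, 2))            # redshifts actually modelled
print(grid)                                      # [2.09 3.34]
```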

Thank you for your report. I will fix the issue shortly in the develop branch.

drawnacross commented 2 days ago

@mboquien Thank you so much for your quick responses and enthusiastic assistance every time. I truly appreciate it!