phoebe-project / phoebe2

PHOEBE - Eclipsing Binary Star Modeling Software
http://phoebe-project.org
GNU General Public License v3.0

Nelder-Mead optimiser fails #763

Open amiszuda opened 1 year ago

amiszuda commented 1 year ago

With MPI turned on, the Nelder-Mead (NM) optimiser fails with the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-18-0acdc279fa53> in <module>
      1 phoebe.mpi_on(nprocs=12)
----> 2 b.run_solver('opt_nm_full', solution='LC_nm_full', overwrite=True)

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _send_if_client(self, *args, **kwargs)
    422 
    423         else:
--> 424             return fctn(self, *args, **kwargs)
    425 
    426     return _send_if_client

~/.local/lib/python3.6/site-packages/phoebe/frontend/bundle.py in run_solver(self, solver, solution, detach, return_changes, **kwargs)
  13635 
  13636             if not detach:
> 13637                 return job_param.attach(sleep=job_sleep)
  13638             else:
  13639                 logger.info("detaching from run_solver.  Call get_parameter(solution='{}').attach() to re-attach".format(solution))

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in attach(self, wait, sleep, cleanup, return_changes)
  12582         else:
  12583             logger.info("current status: {}, pulling job results".format(status))
> 12584             return self._retrieve_and_attach_results(cleanup=cleanup, return_changes=return_changes)
  12585 
  12586     def load_progress(self, cleanup=True, return_changes=False):

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _retrieve_and_attach_results(self, cleanup, return_changes)
  12444             _ = self.get_status()
  12445 
> 12446         ret_ps = self._retrieve_results()
  12447 
  12448         if not len(ret_ps.to_list()) and 'progress' in self._value:

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _retrieve_results(self)
  12414                         raise ValueError("job has not yet produced any output, with the following log:\n\n{}".format("\n".join(e.readlines())))
  12415                     else:
> 12416                         raise ValueError("job failed with the following log:\n\n{}".format("\n".join(e.readlines())))
  12417 
  12418             else:

ValueError: job failed with the following log:

No further message appears after ValueError: job failed with the following log: (the log itself is empty).

With MPI turned off, the solver runs. The error does not occur on every run.

A working example is provided below:

import phoebe
import numpy as np
import matplotlib.pyplot as plt

b = phoebe.default_binary(semidetached='secondary')
lc = np.loadtxt('lc_corrected_for_occonell.txt',unpack=True)

b.add_dataset('lc', times = lc[0], fluxes=lc[1], sigmas=lc[2], 
              compute_phases=phoebe.linspace(0,1,101), passband='TESS:T')

b.add_solver('estimator.lc_periodogram', solver='lcperiod_bls', 
             algorithm='bls', minimum_n_cycles=2, sample_mode='manual',
             sample_periods = np.linspace(3.,4.,1000),
             overwrite=True)
b.run_solver('lcperiod_bls', solution='lcperiod_bls_sol', overwrite=True)
print(b['fitted_values@lcperiod_bls_sol'])
b.adopt_solution('lcperiod_bls_sol')

b.set_value('teff@primary', value=25000)
b.set_value('atm@primary', 'blackbody')
b.set_value('atm@secondary', 'blackbody')
b.set_value('ld_mode_bol@primary', 'manual')
b.set_value('ld_mode_bol@secondary', 'manual')

b.add_solver('estimator.lc_geometry', solver='lc_est_lcgeom', phase_bin = False)
b.run_solver('lc_est_lcgeom', solution='lc_soln_lcgeom')
b.flip_constraint('ecc', solve_for='esinw')
b.flip_constraint('per0', solve_for='ecosw')
b.run_compute(model='lc_geom', sample_from='lc_soln_lcgeom', overwrite=True)
b.adopt_solution('lc_soln_lcgeom', overwrite=True)

b.add_compute('ellc', compute='fastcompute')
b.flip_constraint('esinw', solve_for='ecc')
b.flip_constraint('ecosw', solve_for='per0')

b.add_solver('optimizer.nelder_mead', 
             solver='opt_nm_full',
             fit_parameters = ['t0_supconj@binary', 'period@binary','incl@binary', 
                               'teffratio', 'requivsumfrac', 'esinw', 'ecosw', 'q', 
                               'sma@binary', 'vgamma@system'],
             compute='fastcompute', overwrite=True)
b.set_value('maxiter@opt_nm_full', solver='opt_nm_full', value=10000)
b.set_value('expose_lnprobabilities@opt_nm_full', True)
b.set_value('progress_every_niters@opt_nm_full', 1)

phoebe.mpi_on(nprocs=12)
b.run_solver('opt_nm_full', solution='NM_sol', overwrite=True)

The situation is no different when MPI is turned off, or when the solver runs on compute='phoebe01'.

lc_corrected_for_occonell.txt

amiszuda commented 1 year ago

@kecnry, sorry to mention you, but somehow my previous post went missing, so I am not sure whether posting another one under the same issue number will make the issue show up in the notifications.

kecnry commented 1 year ago

I had seen it come through, noticed it was blank, and just assumed you cleared it instead of closing it. Let me see if anyone can reproduce this on their own machines so we can track down the error (since the error log is otherwise empty).

kecnry commented 1 year ago

@amiszuda - what version of phoebe are you running?

amiszuda commented 1 year ago

2.4.10 on Ubuntu
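
For a failure that depends on the setup (MPI on or off, one machine but not another), it can help to report the whole environment rather than only the PHOEBE version. The following is a minimal sketch using only the standard library. The function name environment_report and the default package list are illustrative assumptions, not part of PHOEBE.

```python
import importlib
import platform


def environment_report(packages=("phoebe", "numpy", "scipy", "mpi4py")):
    """Collect interpreter, OS and package versions for a bug report.

    The default package list is an assumption about what is relevant
    here; pass a different tuple to report on other packages.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Covers both a missing package and one that fails on import.
            report[name] = "not importable"
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Printing the returned dictionary, for example with `for key, value in environment_report().items(): print(key, value)`, gives a block that can be pasted into the issue as is.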

bpablo commented 1 year ago

Hello,

I have tried to reproduce this error, but unfortunately the script is erroring out before I ever get to your issue with the following:

ValueError: 0 results found for twig: 'ecc@binary', {'context': 'constraint', 'check_visible': True, 'check_default': True, 'check_advanced': False, 'check_single': False}

Can you confirm that this script does in fact work for you and you aren't getting this error?

amiszuda commented 1 year ago

Hi Bert!

I am not getting that error, though I am getting others regarding constraints, or about ld_mode='interp' not being supported by blackbody. This is also strange, since I did not get those when using the same commands in the notebook; however, I have noticed such differences a couple of times before.

This issue was reported some time ago already, and I did not really provide a full report with the exact code that crashed on my end (apologies here!). I will dig a bit and attach a new script, both as an executable Python script and as a notebook, hoping it will provide a better log of what is happening here. I'll do my best to get back to you asap.

amiszuda commented 1 year ago

Hi again,

Below, I provide a fully reproducible script and a notebook sheet. This is the exact version of the code that causes the following crash:

# crimpl: ls -d /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-*
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: mkdir -p /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs
# crimpl: cp exportpath.sh /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/
# crimpl: source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; conda -V
# crimpl: source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; mkdir -p /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12
# crimpl: cp crimpl_submit_script.sh /media/data/Work/BCep/TIC0247315421/phoebe/_cPywECbIMpNKMWIELAdcSpclMhYJei.py /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/
# crimpl: source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; echo '_cPywECbIMpNKMWIELAdcSpclMhYJei.py' >> /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-input-files.list
# crimpl: source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; echo 'False' > /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-conda-environment
# crimpl (detached): source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; cd /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12; chmod +x ./crimpl_submit_script.sh; nohup bash ./crimpl_submit_script.sh 2> ./crimpl_submit_script.sh.err & echo $! > crimpl-nohup.pid
# crimpl: ls -d /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-*
# crimpl: cat /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-job.status
# crimpl: cat /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-job.status
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: cat /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-input-files.list
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: cp /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/nohup.out ./
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-3a686994c1ae> in <module>
      1 phoebe.mpi_on(nprocs=12)
      2 # phoebe.mpi_off()
----> 3 b.run_solver('opt_nm_full', solution='NM', overwrite=True)

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _send_if_client(self, *args, **kwargs)
    422 
    423         else:
--> 424             return fctn(self, *args, **kwargs)
    425 
    426     return _send_if_client

~/.local/lib/python3.6/site-packages/phoebe/frontend/bundle.py in run_solver(self, solver, solution, detach, return_changes, **kwargs)
  13635 
  13636             if not detach:
> 13637                 return job_param.attach(sleep=job_sleep)
  13638             else:
  13639                 logger.info("detaching from run_solver.  Call get_parameter(solution='{}').attach() to re-attach".format(solution))

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in attach(self, wait, sleep, cleanup, return_changes)
  12582         else:
  12583             logger.info("current status: {}, pulling job results".format(status))
> 12584             return self._retrieve_and_attach_results(cleanup=cleanup, return_changes=return_changes)
  12585 
  12586     def load_progress(self, cleanup=True, return_changes=False):

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _retrieve_and_attach_results(self, cleanup, return_changes)
  12444             _ = self.get_status()
  12445 
> 12446         ret_ps = self._retrieve_results()
  12447 
  12448         if not len(ret_ps.to_list()) and 'progress' in self._value:

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _retrieve_results(self)
  12414                         raise ValueError("job has not yet produced any output, with the following log:\n\n{}".format("\n".join(e.readlines())))
  12415                     else:
> 12416                         raise ValueError("job failed with the following log:\n\n{}".format("\n".join(e.readlines())))
  12417 
  12418             else:

ValueError: job failed with the following log:

phoebe_failing_job.tar.gz
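
Since the ValueError arrives with an empty log, the files that crimpl leaves in the job directory may say more than the traceback does. The sketch below assumes the layout and file names visible in the crimpl commands above: crimpl-job-* directories containing crimpl-job.status, crimpl_submit_script.sh.err and nohup.out. The helper names are hypothetical, and the code only reads files; it does not call crimpl or PHOEBE.

```python
from pathlib import Path

# File names taken from the crimpl commands shown in this thread.
LOG_NAMES = ("crimpl-job.status", "crimpl_submit_script.sh.err", "nohup.out")


def latest_job_dir(jobs_root):
    """Return the newest crimpl-job-* directory under jobs_root, or None.

    The directory names end in a YYYY.MM.DD-HH.MM.SS timestamp, so a
    plain lexicographic sort also orders them chronologically.
    """
    job_dirs = sorted(
        p for p in Path(jobs_root).glob("crimpl-job-*") if p.is_dir()
    )
    return job_dirs[-1] if job_dirs else None


def read_job_logs(job_dir, names=LOG_NAMES):
    """Map each expected log file name to its text, or None if missing."""
    logs = {}
    for name in names:
        path = Path(job_dir) / name
        logs[name] = path.read_text(errors="replace") if path.is_file() else None
    return logs
```

Calling `read_job_logs(latest_job_dir("phoebe_crimpl_jobs"))` from the working directory would then show whether the status file, the stderr capture, or nohup.out holds any message for the failed job.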

bpablo commented 12 months ago

Hey Amadeusz,

I still can't reproduce what you see. With MPI on it fails, but it does give me an error, as I don't think this computer is set up for it. However, if I don't use MPI it appears to be working:

--------------------------------
  0%|                                                     | 2/10000 [00:11<15:33:51,  5.60s/it]/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/astropy/units/quantity.py:666: RuntimeWarning: invalid value encountered in subtract
  result = super().__array_ufunc__(function, method, *arrays, **kwargs)
  0%|                                                    | 22/10000 [02:30<18:54:41,  6.82s/it]/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/scipy/optimize/_optimize.py:917: RuntimeWarning: invalid value encountered in subtract
  np.max(np.abs(fsim[0] - fsim[1:])) <= fatol):
  1%|▎                                                    | 64/10000 [05:38<8:34:04,  3.10s/it]
---------------------------------

It isn't finished yet so maybe it will fail eventually, but at this point it appears to be fine. Can you confirm whether you see any of this or not?

amiszuda commented 12 months ago

Hey Bert,

No, nothing like it. It's weird, as MPI sometimes works and sometimes does not. One more thing, which I noticed only recently: the empty ValueError: job failed with the following log: appears only in Jupyter.
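
Because the failure is intermittent, one way to pin it down is to repeat the same call several times in each setting (MPI on or off, Jupyter or a plain script) and record how often it raises. The following is a minimal sketch; count_failures is a hypothetical helper that takes any zero-argument callable and knows nothing about PHOEBE.

```python
import traceback


def count_failures(run_once, attempts=5):
    """Call run_once() `attempts` times and collect every exception.

    Returns a list of (attempt_index, repr(exception), traceback_text)
    tuples, one per failed attempt; an empty list means every run passed.
    """
    failures = []
    for attempt in range(attempts):
        try:
            run_once()
        except Exception as exc:
            failures.append((attempt, repr(exc), traceback.format_exc()))
    return failures


# Possible usage with the call from this thread (requires the bundle `b`):
# failures = count_failures(
#     lambda: b.run_solver('opt_nm_full', solution='NM', overwrite=True),
#     attempts=5,
# )
# print(len(failures), "of 5 runs failed")
```

With maxiter set to 10000 each run can take hours, so a much smaller maxiter is probably sensible for this kind of check.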

bpablo commented 11 months ago

It ran to completion for me in Jupyter. I am using JupyterLab, though, not the classic notebook. Is it the same for you?

amiszuda commented 11 months ago

No, I am using jupyter-notebook.