kevin218 / Eureka

Eureka! is a data reduction and analysis pipeline intended for time-series observations with JWST.
https://eurekadocs.readthedocs.io/
MIT License

[Bug]: Stage 5 - emcee claims no space on device #670

Closed Witchblade101 closed 3 months ago

Witchblade101 commented 3 months ago

Instrument

Light curve fitting (Stages 4-6)

What happened?

During Stage 5 light curve fitting I suddenly start getting warnings about "No space left on device". Output files keep being written, but eventually Eureka! crashes. Running df when the warnings started appearing, and again after the crash, showed 5 GB still available. Rebooting and trying again with nothing running but iTerm and Eureka! produced the same results.

(base) dlong@hikaru ~ % df -h
Filesystem        Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/disk3s8s1   926Gi   9.6Gi   499Gi     2%    404k  4.3G    0%   /
devfs            204Ki   204Ki     0Bi   100%     704     0  100%   /dev
/dev/disk3s6     926Gi    20Ki   499Gi     1%       0  5.2G    0%   /System/Volumes/VM
/dev/disk3s2     926Gi   5.7Gi   499Gi     2%    1.1k  5.2G    0%   /System/Volumes/Preboot
/dev/disk3s4     926Gi    33Mi   499Gi     1%      52  5.2G    0%   /System/Volumes/Update
/dev/disk1s2     500Mi   6.0Mi   480Mi     2%       1  4.9M    0%   /System/Volumes/xarts
/dev/disk1s1     500Mi   6.2Mi   480Mi     2%      38  4.9M    0%   /System/Volumes/iSCPreboot
/dev/disk1s3     500Mi   2.6Mi   480Mi     1%      65  4.9M    0%   /System/Volumes/Hardware
/dev/disk3s7     926Gi   409Gi   499Gi    46%    2.8M  5.2G    0%   /System/Volumes/Data
map auto_user      0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/user
map auto_grp       0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/grp
map auto_eng       0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/eng
map auto_astro     0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/astro
map auto_smov      0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/smov
map auto_itar      0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/itar
map auto_opo       0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/opo
map auto_ifs       0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/ifs
map -hosts         0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/net
map -fstab         0Bi     0Bi     0Bi   100%       0     0     -   /System/Volumes/Data/Network/Servers
/dev/disk5s1     8.3Gi   7.5Gi   785Mi    91%    498k  8.0M    6%   /Library/Developer/CoreSimulator/Volumes/iOS_21A328

Error traceback output

Starting Channel 350 of 404

Using the following limb-darkening values:
u1, 0.01491
u2, 0.33573
=========================
Starting lsq fit.
Starting lnprob: 26136.456589553723

Verbose lsq results:  message: Optimization terminated successfully.
 success: True
  status: 0
     fun: -26155.560047693227
       x: [ 7.094e-02  1.001e+00 -7.527e-03  1.673e+00]
     nit: 3
   direc: [[-1.771e-06 -3.173e-08 -5.010e-06  1.002e-04]
           [ 0.000e+00  1.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  1.000e+00  0.000e+00]
           [-1.150e-04 -3.369e-06  3.307e-06  7.158e-06]]
    nfev: 242

Ending lnprob: 26155.560047693227
Reduced Chi-squared: 1.0004234099706157

LSQ RESULTS:
rp: 0.07094120771021362
c0: 1.0007051216925538
c1: -0.007527434806878129
scatter_mult: 1.6731104061258302; 18254.437606737927 ppm

Completed lsq fit.
-------------------------
Starting emcee fit.

Calling lsqfitter first...
Starting lnprob: 26155.560047693227

Verbose lsq results:  message: Optimization terminated successfully.
 success: True
  status: 0
     fun: -26155.560048462972
       x: [ 7.094e-02  1.001e+00 -7.528e-03  1.673e+00]
     nit: 1
   direc: [[ 1.000e+00  0.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  1.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  1.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  0.000e+00  1.000e+00]]
    nfev: 86

Ending lnprob: 26155.560048462972
Reduced Chi-squared: 1.00044068559413

LSQ RESULTS:
rp: 0.07094113435609314
c0: 1.0007051203847181
c1: -0.007528052510373406
scatter_mult: 1.6730959527324523; 18254.27991327869 ppm

No covariance matrix from LSQ - falling back on a step size based on the prior range
Starting lnprob: 26147.46478885448
Running emcee burn-in...
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
CRDS - WARNING -  Failed creating CRDS cache lock: [Errno 28] No space left on device
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [01:25<00:00, 11.64it/s]
Finished writing to /Users/dlong/DataAnalysis/JWST/eureka/t1e/Obs1/Stage5/S5_2024-07-05_t1e_1_run1/ap6_bg7/S5_emcee_samples_ch350.h5
Ending lnprob: 26155.557328363113
Mean acceptance fraction: 0.589
WARNING: Unable to estimate the autocorrelation time!
Reduced Chi-squared: 0.9994609874144257

EMCEE RESULTS:
rp: 0.07085879760611365 (+0.003143424333129863, -0.003207106231522261)
c0: 1.000701829058928 (+0.00019900400159378329, -0.0001991177648885678)
c1: -0.007571432401672989 (+0.003521721988803187, -0.003461048022101245)
scatter_mult: 1.673546576173572 (+0.011520066567790366, -0.01181628463152995); 18263.21927057678 (+125.7171474846215, -128.94974078498308) ppm

Completed emcee fit.
-------------------------
=========================
Saving results

Starting Channel 351 of 404

Using the following limb-darkening values:
u1, 0.01317
u2, 0.33619
=========================
Starting lsq fit.
Starting lnprob: 26177.26291360858

Verbose lsq results:  message: Optimization terminated successfully.
 success: True
  status: 0
     fun: -26206.535065420358
       x: [ 7.242e-02  1.001e+00 -5.765e-03  1.585e+00]
     nit: 3
   direc: [[ 0.000e+00  0.000e+00  0.000e+00  1.000e+00]
           [ 0.000e+00  1.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  1.000e+00  0.000e+00]
           [ 9.276e-05  2.752e-06 -6.356e-06 -4.232e-07]]
    nfev: 212

Ending lnprob: 26206.535065420358
Reduced Chi-squared: 1.0004246838602204

LSQ RESULTS:
rp: 0.07242152397961654
c0: 1.000747613527901
c1: -0.005765288173828141
scatter_mult: 1.5846895855066212; 18164.215055027624 ppm

Completed lsq fit.
-------------------------
Starting emcee fit.

Calling lsqfitter first...
Starting lnprob: 26206.535065420358

Verbose lsq results:  message: Optimization terminated successfully.
 success: True
  status: 0
     fun: -26206.535065439504
       x: [ 7.242e-02  1.001e+00 -5.765e-03  1.585e+00]
     nit: 1
   direc: [[ 1.000e+00  0.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  1.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  1.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  0.000e+00  1.000e+00]]
    nfev: 110

Ending lnprob: 26206.535065439504
Reduced Chi-squared: 1.0004244419179735

LSQ RESULTS:
rp: 0.0724215507495267
c0: 1.0007476148752652
c1: -0.005764611080186319
scatter_mult: 1.5846897832309546; 18164.217321406693 ppm

No covariance matrix from LSQ - falling back on a step size based on the prior range
Starting lnprob: 26203.652051605644
Traceback (most recent call last):
  File "/Users/dlong/DataAnalysis/JWST/eureka/t1e/Obs1/run_eureka_1.py", line 28, in <module>
    s5_meta = s5.fitlc(eventlabel, ecf_path=ecf_path)
  File "/Users/dlong/Eureka/src/eureka/S5_lightcurve_fitting/s5_fit.py", line 482, in fitlc
    meta, params = fit_channel(meta, time_temp, flux, channel,
  File "/Users/dlong/Eureka/src/eureka/S5_lightcurve_fitting/s5_fit.py", line 990, in fit_channel
    lc_model.fit(model, meta, log, fitter='emcee')
  File "/Users/dlong/Eureka/src/eureka/S5_lightcurve_fitting/lightcurve.py", line 173, in fit
    fit_model = self.fitter_func(self, model, meta, log, **kwargs)
  File "/Users/dlong/Eureka/src/eureka/S5_lightcurve_fitting/fitters.py", line 330, in emceefitter
    pool = Pool(meta.ncpu)
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/pool.py", line 196, in __init__
    self._change_notifier = self._ctx.SimpleQueue()
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/queues.py", line 341, in __init__
    self._rlock = ctx.Lock()
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
OSError: [Errno 28] No space left on device
(eureka) dlong@hikaru Obs1 % /Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 9980 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

What operating system are you using?

MacOS Sonoma 14.5

What version of Python are you running?

Python 3.10.14

What Python packages do you have installed?

# packages in environment at /Users/dlong/miniconda3/envs/eureka:
#
# Name                    Version                   Build  Channel
alabaster                 0.7.16                   pypi_0    pypi
asciitree                 0.3.3                    pypi_0    pypi
asdf                      3.2.0                    pypi_0    pypi
asdf-astropy              0.6.1                    pypi_0    pypi
asdf-coordinates-schemas  0.3.0                    pypi_0    pypi
asdf-standard             1.1.1                    pypi_0    pypi
asdf-transform-schemas    0.5.0                    pypi_0    pypi
asdf-wcs-schemas          0.4.0                    pypi_0    pypi
asteval                   1.0.0                    pypi_0    pypi
astraeus                  0.3                      pypi_0    pypi
astropy                   6.1.1                    pypi_0    pypi
astropy-healpix           1.0.3                    pypi_0    pypi
astropy-iers-data         0.2024.7.1.0.34.3          pypi_0    pypi
astroquery                0.4.7                    pypi_0    pypi
astroscrappy              1.2.0                    pypi_0    pypi
asttokens                 2.4.1                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
babel                     2.15.0                   pypi_0    pypi
backports-tarfile         1.2.0                    pypi_0    pypi
batman-package            2.4.9                    pypi_0    pypi
bayesicfitting            3.2.1                    pypi_0    pypi
beautifulsoup4            4.12.3                   pypi_0    pypi
bokeh                     2.4.3                    pypi_0    pypi
bottleneck                1.4.0                    pypi_0    pypi
bzip2                     1.0.8                h93a5062_5    conda-forge
ca-certificates           2024.6.2             hf0a4a13_0    conda-forge
ccdproc                   2.4.2                    pypi_0    pypi
celerite2                 0.3.2                    pypi_0    pypi
certifi                   2024.6.2                 pypi_0    pypi
cftime                    1.6.4                    pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
contourpy                 1.2.1                    pypi_0    pypi
corner                    2.2.2                    pypi_0    pypi
crds                      11.17.25                 pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
dask                      2024.6.2                 pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
dill                      0.3.8                    pypi_0    pypi
docutils                  0.21.2                   pypi_0    pypi
drizzle                   1.15.2                   pypi_0    pypi
dynesty                   2.1.4                    pypi_0    pypi
emcee                     3.1.6                    pypi_0    pypi
eureka                    0.11.dev327+ga9392361          pypi_0    pypi
exceptiongroup            1.2.1                    pypi_0    pypi
executing                 2.0.1                    pypi_0    pypi
exotic-ld                 3.0.0                    pypi_0    pypi
fasteners                 0.19                     pypi_0    pypi
filelock                  3.15.4                   pypi_0    pypi
fonttools                 4.53.0                   pypi_0    pypi
fsspec                    2024.6.1                 pypi_0    pypi
future                    1.0.0                    pypi_0    pypi
george                    0.4.2                    pypi_0    pypi
gwcs                      0.21.0                   pypi_0    pypi
h5netcdf                  1.3.0                    pypi_0    pypi
h5py                      3.11.0                   pypi_0    pypi
html5lib                  1.1                      pypi_0    pypi
idna                      3.7                      pypi_0    pypi
imageio                   2.34.2                   pypi_0    pypi
imagesize                 1.4.1                    pypi_0    pypi
importlib-metadata        8.0.0                    pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
ipython                   8.26.0                   pypi_0    pypi
jaraco-classes            3.4.0                    pypi_0    pypi
jaraco-context            5.3.0                    pypi_0    pypi
jaraco-functools          4.0.1                    pypi_0    pypi
jedi                      0.19.1                   pypi_0    pypi
jinja2                    3.1.4                    pypi_0    pypi
jmespath                  1.0.1                    pypi_0    pypi
jsonschema                4.22.0                   pypi_0    pypi
jsonschema-specifications 2023.12.1                pypi_0    pypi
jwst                      1.14.0                   pypi_0    pypi
keyring                   25.2.1                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lazy-loader               0.4                      pypi_0    pypi
libffi                    3.4.2                h3422bc3_5    conda-forge
libsqlite                 3.46.0               hfb93653_0    conda-forge
libzlib                   1.3.1                hfb2fe0b_1    conda-forge
lmfit                     1.3.1                    pypi_0    pypi
locket                    1.0.0                    pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
matplotlib                3.9.0                    pypi_0    pypi
matplotlib-inline         0.1.7                    pypi_0    pypi
mc3                       3.2.0                    pypi_0    pypi
more-itertools            10.3.0                   pypi_0    pypi
ncurses                   6.5                  hb89a1cb_0    conda-forge
netcdf4                   1.7.1.post1              pypi_0    pypi
networkx                  3.3                      pypi_0    pypi
numcodecs                 0.12.1                   pypi_0    pypi
numpy                     1.24.4                   pypi_0    pypi
numpydoc                  1.7.0                    pypi_0    pypi
opencv-python-headless    4.10.0.84                pypi_0    pypi
openssl                   3.3.1                hfb2fe0b_1    conda-forge
packaging                 24.1                     pypi_0    pypi
pandas                    2.2.2                    pypi_0    pypi
parsley                   1.3                      pypi_0    pypi
parso                     0.8.4                    pypi_0    pypi
partd                     1.4.2                    pypi_0    pypi
pexpect                   4.9.0                    pypi_0    pypi
photutils                 1.13.0                   pypi_0    pypi
pillow                    10.4.0                   pypi_0    pypi
pip                       24.0               pyhd8ed1ab_0    conda-forge
pluggy                    1.5.0                    pypi_0    pypi
poppy                     1.1.1                    pypi_0    pypi
prompt-toolkit            3.0.47                   pypi_0    pypi
psutil                    6.0.0                    pypi_0    pypi
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.2                    pypi_0    pypi
pyerfa                    2.0.1.4                  pypi_0    pypi
pygments                  2.18.0                   pypi_0    pypi
pyparsing                 3.1.2                    pypi_0    pypi
pysynphot                 2.0.0                    pypi_0    pypi
pytest                    8.2.2                    pypi_0    pypi
python                    3.10.14         h2469fbe_0_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyvo                      1.5.2                    pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h92ec313_1    conda-forge
referencing               0.35.1                   pypi_0    pypi
reproject                 0.13.1                   pypi_0    pypi
requests                  2.32.3                   pypi_0    pypi
rpds-py                   0.18.1                   pypi_0    pypi
scikit-image              0.24.0                   pypi_0    pypi
scipy                     1.14.0                   pypi_0    pypi
semantic-version          2.10.0                   pypi_0    pypi
setuptools                70.1.1             pyhd8ed1ab_0    conda-forge
setuptools-scm            8.1.0                    pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
snowballstemmer           2.2.0                    pypi_0    pypi
soupsieve                 2.5                      pypi_0    pypi
spherical-geometry        1.3.2                    pypi_0    pypi
sphinx                    7.3.7                    pypi_0    pypi
sphinxcontrib-applehelp   1.0.8                    pypi_0    pypi
sphinxcontrib-devhelp     1.0.6                    pypi_0    pypi
sphinxcontrib-htmlhelp    2.0.5                    pypi_0    pypi
sphinxcontrib-jsmath      1.0.1                    pypi_0    pypi
sphinxcontrib-qthelp      1.0.7                    pypi_0    pypi
sphinxcontrib-serializinghtml 1.1.10                   pypi_0    pypi
stack-data                0.6.3                    pypi_0    pypi
stcal                     1.7.2                    pypi_0    pypi
stdatamodels              1.10.1                   pypi_0    pypi
stpipe                    0.5.2                    pypi_0    pypi
stsci-image               2.3.9                    pypi_0    pypi
stsci-imagestats          1.6.3                    pypi_0    pypi
stsci-stimage             0.2.9                    pypi_0    pypi
svo-filters               0.4.4                    pypi_0    pypi
synphot                   1.4.0                    pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tifffile                  2024.7.2                 pypi_0    pypi
tk                        8.6.13               h5083fa2_1    conda-forge
tomli                     2.0.1                    pypi_0    pypi
toolz                     0.12.1                   pypi_0    pypi
tornado                   6.4.1                    pypi_0    pypi
tqdm                      4.66.4                   pypi_0    pypi
traitlets                 5.14.3                   pypi_0    pypi
tweakwcs                  0.8.7                    pypi_0    pypi
typing-extensions         4.12.2                   pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
uncertainties             3.2.1                    pypi_0    pypi
urllib3                   2.2.2                    pypi_0    pypi
wcwidth                   0.2.13                   pypi_0    pypi
webencodings              0.5.1                    pypi_0    pypi
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
wiimatch                  0.3.2                    pypi_0    pypi
xarray                    2024.6.0                 pypi_0    pypi
xz                        5.2.6                h57fd34a_0    conda-forge
zarr                      2.18.2                   pypi_0    pypi
zipp                      3.19.2                   pypi_0    pypi

Witchblade101 commented 3 months ago

As a follow-up, I tried running a different dataset and saving the output to a network disk that has terabytes of free space. I got essentially the same error:

Starting Channel 268 of 393

Using the following limb-darkening values:
u1, -0.02927
u2, 0.44406
=========================
Starting lsq fit.
Starting lnprob: -1008930.39282718

Verbose lsq results:  message: Optimization terminated successfully.
 success: True
  status: 0
     fun: 1008688.5593286802
       x: [ 7.406e-02  1.001e+00 -3.425e-03  1.033e+03]
     nit: 3
   direc: [[ 0.000e+00  0.000e+00  0.000e+00  1.000e+00]
           [ 0.000e+00  1.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  1.000e+00  0.000e+00]
           [ 1.142e-04  3.668e-06  1.323e-05 -6.143e-04]]
    nfev: 180

Ending lnprob: -1008688.5593286802
Reduced Chi-squared: 103.51799706001802

LSQ RESULTS:
rp: 0.07405848004589306
c0: 1.0011990524216912
c1: -0.0034245462251475587
scatter_ppm: 1033.2616961306856

Completed lsq fit.
-------------------------
Starting emcee fit.

Calling lsqfitter first...
Starting lnprob: -1008688.5593286802

Verbose lsq results:  message: Optimization terminated successfully.
 success: True
  status: 0
     fun: 1008688.5593286802
       x: [ 7.406e-02  1.001e+00 -3.425e-03  1.033e+03]
     nit: 1
   direc: [[ 1.000e+00  0.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  1.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  1.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  0.000e+00  1.000e+00]]
    nfev: 84

Ending lnprob: -1008688.5593286802
Reduced Chi-squared: 103.51799706001869

LSQ RESULTS:
rp: 0.07405848004589306
c0: 1.0011990524316912
c1: -0.0034245461265187587
scatter_ppm: 1033.2616961306856

No covariance matrix from LSQ - falling back on a step size based on the prior range
Starting lnprob: -1016523.8337508459
Traceback (most recent call last):
  File "/System/Volumes/Data/astro/jtste/doug/JWST/eureka/t1e/Obs2/run_eureka_2.py", line 31, in <module>
    s5_meta = s5.fitlc(eventlabel, ecf_path=ecf_path)
  File "/Users/dlong/Eureka/src/eureka/S5_lightcurve_fitting/s5_fit.py", line 482, in fitlc
    meta, params = fit_channel(meta, time_temp, flux, channel,
  File "/Users/dlong/Eureka/src/eureka/S5_lightcurve_fitting/s5_fit.py", line 990, in fit_channel
    lc_model.fit(model, meta, log, fitter='emcee')
  File "/Users/dlong/Eureka/src/eureka/S5_lightcurve_fitting/lightcurve.py", line 173, in fit
    fit_model = self.fitter_func(self, model, meta, log, **kwargs)
  File "/Users/dlong/Eureka/src/eureka/S5_lightcurve_fitting/fitters.py", line 330, in emceefitter
    pool = Pool(meta.ncpu)
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/pool.py", line 196, in __init__
    self._change_notifier = self._ctx.SimpleQueue()
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/queues.py", line 341, in __init__
    self._rlock = ctx.Lock()
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
OSError: [Errno 28] No space left on device
(eureka) dlong@hikaru Obs2 % /Users/dlong/miniconda3/envs/eureka/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 9980 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

kevin218 commented 3 months ago

Try turning off multiprocessing and rerunning the code. It may be crashing within that part of the code and we wouldn't know because of how multiprocessing works.

Assuming it does crash, try solving that issue and turning multiprocessing back on.

Witchblade101 commented 3 months ago

Makes sense. I'll give that a try.

taylorbell57 commented 3 months ago

@Witchblade101, my first recommendation was going to be that you consider deleting any ...FluxData.h5 save files output during your Stage 3 analyses. These files can sometimes be helpful when trying to debug your analyses or to compare your analyses against other pipelines, but they are quite large and can rapidly eat up your storage space. In the next (upcoming) Eureka! version release, we've set those FluxData files to not be created by default. There are many command-line options that you can use to search for and delete any existing ...FluxData.h5 save files, and in your Stage 3 ECF you can also set save_fluxdata to False to prevent such files from being created in the future.
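As an alternative to command-line tools, the search can be sketched in Python. This is only an illustration: the glob pattern `*FluxData.h5` is assumed from the file naming described above, and `find_fluxdata` and `report` are hypothetical helper names, not part of Eureka!. It defaults to a dry run so nothing is deleted until `delete=True` is passed.

```python
from pathlib import Path


def find_fluxdata(root):
    """Return sorted (path, size_in_bytes) pairs for *FluxData.h5 files under root."""
    return sorted(
        (p, p.stat().st_size)
        for p in Path(root).rglob("*FluxData.h5")  # pattern is an assumption
        if p.is_file()
    )


def report(root, delete=False):
    """Print each matching file with its size; delete only if asked.

    Returns the total number of bytes found (and freed, when delete=True).
    """
    total = 0
    for path, size in find_fluxdata(root):
        total += size
        print(f"{size / 1e9:8.2f} GB  {path}")
        if delete:
            path.unlink()
    return total


if __name__ == "__main__":
    # Dry run over the current directory; point this at your analysis folder.
    total_bytes = report(".", delete=False)
    print(f"Total: {total_bytes / 1e9:.2f} GB")
```

Running it once with `delete=False` and checking the listing before rerunning with `delete=True` is the safer order.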

Second, there is nothing that we can do on our end to resolve this No space left on device error raised by your operating system. While there may be ~5 GB available on your system, this is generally insufficient for many of the important tasks of your operating system and applications. In addition, when a task requires more RAM than your computer physically has, the operating system will sometimes spill the excess onto your drive, which is called "swap" memory. I strongly recommend you delete unneeded files or move them to an external hard drive or cloud storage. However, looking at your df -h output, it looks to me like you actually have ~500 GB free and not just ~5 GB; if that is the case, then this is a very strange error indeed. As for your more recent attempt with the network disk, this error could still arise if you really did have just ~5 GB available locally, regardless of where you were saving the outputs; 5 GB of storage just isn't enough breathing room for most OSes.
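One way to settle the ~5 GB versus ~500 GB question is to read the free space directly from Python with the standard-library `shutil.disk_usage`. The sketch below is illustrative only: `free_gib` and `check_space` are hypothetical helpers, and the 20 GiB threshold is an arbitrary assumption, not a documented requirement of Eureka! or macOS.

```python
import shutil
import tempfile
from pathlib import Path


def free_gib(path):
    """Free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 2**30


def check_space(paths, min_free_gib=20.0):
    """Map each path to (free GiB, True if it meets the threshold)."""
    result = {}
    for p in paths:
        free = free_gib(p)
        result[str(p)] = (free, free >= min_free_gib)
    return result


if __name__ == "__main__":
    # Check the home directory and the temporary directory, since
    # scratch files may land on a different volume than the outputs.
    for path, (free, ok) in check_space([Path.home(), tempfile.gettempdir()]).items():
        print(f"{path}: {free:.1f} GiB free ({'ok' if ok else 'low'})")
```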

It is very unusual to me that in Stage 5 you're getting all those CRDS - WARNING - Failed creating CRDS cache lock: [Errno 28] No space left on device messages. CRDS is only relevant for Stages 1-3, but it's possible that with multiprocessing turned on each spawned process has to re-import Eureka! and as a result tries to import the CRDS package and gets that cache lock error. I agree with Kevin that setting ncpu to 1 is the best way to troubleshoot this - troubleshooting with multiprocessing turned on is basically impossible because of how that package works. Once you have something that works, then you can try turning multiprocessing back on.
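For anyone scripting their own fits, a serial fallback of this kind can be sketched as below. This is not Eureka!'s implementation: `optional_pool`, `log_prob`, and `evaluate` are hypothetical names, and the toy log-probability stands in for a real light curve model. With `ncpu=1` no worker processes or multiprocessing locks are created at all; with more CPUs the pool is closed and joined when the block exits. Whether unclosed pools are related to the "leaked semaphore objects" warning in the traceback above is only a guess.

```python
from contextlib import contextmanager
from multiprocessing import Pool


@contextmanager
def optional_pool(ncpu):
    """Yield None for a serial run, otherwise a Pool that is always cleaned up."""
    if ncpu <= 1:
        yield None
        return
    pool = Pool(ncpu)
    try:
        yield pool
    finally:
        pool.close()  # stop accepting work
        pool.join()   # wait for the workers to exit


def log_prob(theta):
    """Toy log-probability (hypothetical): standard normal in each parameter."""
    return -0.5 * sum(x * x for x in theta)


def evaluate(walkers, ncpu=1):
    """Evaluate log_prob for every walker, serially or with a pool."""
    with optional_pool(ncpu) as pool:
        mapper = map if pool is None else pool.map
        return list(mapper(log_prob, walkers))


if __name__ == "__main__":
    walkers = [(0.0, 0.0), (1.0, 2.0)]
    print(evaluate(walkers, ncpu=1))
```

emcee's `EnsembleSampler` accepts a `pool` argument, so the same context manager could wrap a sampler run, with `pool=None` giving the serial behaviour.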

Finally, my last note would be that the lsq fits in your most recent message appear to be exceptionally poor - you're getting deeply negative log-likelihoods and reduced chi-squared values that are nowhere near 1.0. This is highly unlikely to be the cause of the No space left on device error, but it is something that requires thorough investigation before you continue working on those fits. I strongly recommend investigating the quality of all fits using the lsq fitter to make sure you can get reasonable fits to your lightcurves before you move on to running the far more time-intensive (and storage-consuming) emcee or dynesty samplers.
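For reference, the textbook reduced chi-squared is the sum of squared, uncertainty-normalised residuals divided by the number of degrees of freedom. The sketch below uses that standard definition with made-up numbers; Eureka!'s own calculation may differ in detail, and `reduced_chi_squared` is a hypothetical helper. It shows how underestimated uncertainties alone can push the value far above 1.

```python
def reduced_chi_squared(flux, model, sigma, n_free):
    """Textbook reduced chi-squared: chi^2 / (N - number of free parameters)."""
    dof = len(flux) - n_free
    if dof <= 0:
        raise ValueError("need more data points than free parameters")
    chi2 = sum(((f - m) / s) ** 2 for f, m, s in zip(flux, model, sigma))
    return chi2 / dof


if __name__ == "__main__":
    # Made-up illustrative data, not taken from the logs above.
    flux = [1.0, 1.1, 0.9, 1.0, 1.2]
    model = [1.0] * 5
    print(reduced_chi_squared(flux, model, [0.1] * 5, n_free=1))
    # Halving the assumed uncertainties quadruples the statistic.
    print(reduced_chi_squared(flux, model, [0.05] * 5, n_free=1))
```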

Witchblade101 commented 3 months ago

It looks like it is something related to multiprocessing. After switching ncpu to 1 I got no more CRDS warnings or crashes due to lack of available disk space.

Starting Channel 392 of 393

Using the following limb-darkening values:
u1, 0.03088
u2, 0.29295

Starting lsq fit.
Starting lnprob: 25306.134211625274

Verbose lsq results:  message: Optimization terminated successfully.
 success: True
  status: 0
     fun: -25311.01601500675
       x: [ 6.993e-02  1.001e+00  1.067e-03  1.567e+00]
     nit: 2
   direc: [[ 1.000e+00  0.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  1.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  1.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  0.000e+00  1.000e+00]]
    nfev: 94

Ending lnprob: 25311.01601500675
Reduced Chi-squared: 1.000409751911406

LSQ RESULTS:
rp: 0.06992821433380171
c0: 1.0006942978011697
c1: 0.0010671513801134036
scatter_mult: 1.5666936233970443; 21259.428896367146 ppm

Completed lsq fit.

Starting emcee fit.

Calling lsqfitter first...
Starting lnprob: 25311.01601500675

Verbose lsq results:  message: Optimization terminated successfully.
 success: True
  status: 0
     fun: -25311.016952630034
       x: [ 6.980e-02  1.001e+00  1.056e-03  1.567e+00]
     nit: 1
   direc: [[ 1.000e+00  0.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  1.000e+00  0.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  1.000e+00  0.000e+00]
           [ 0.000e+00  0.000e+00  0.000e+00  1.000e+00]]
    nfev: 62

Ending lnprob: 25311.016952630034
Reduced Chi-squared: 1.0004097205368443

LSQ RESULTS:
rp: 0.06980393030064233
c0: 1.0006905172597063
c1: 0.0010556368532090882
scatter_mult: 1.5666935237694661; 21259.427544459293 ppm

No covariance matrix from LSQ - falling back on a step size based on the prior range
Starting lnprob: 25289.035883316206
Running emcee burn-in...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:43<00:00, 23.09it/s]
Finished writing to /Users/dlong/DataAnalysis/JWST/eureka/t1e/Obs2/Stage5/S5_2024-07-08_t1e_2_run2/ap6_bg7/S5_emcee_samples_ch392.h5
Ending lnprob: 25311.001227579614
Mean acceptance fraction: 0.591
WARNING: Unable to estimate the autocorrelation time!
Reduced Chi-squared: 0.9983547591005765

EMCEE RESULTS:
rp: 0.0694942999224945 (+0.0037493809183321986, -0.003776110785145406)
c0: 1.000690491270263 (+0.00023669803976011927, -0.00024450649265395086)
c1: 0.0008652933562115306 (+0.003972529621944243, -0.0039819843810715745)
scatter_mult: 1.5683060739344559 (+0.010369781942364265, -0.010505495954588273); 21281.30922895871 (+140.71394597018502, -142.55553283185938) ppm

Completed emcee fit.

=========================
Saving results

Total time (min): 315.01

taylorbell57 commented 3 months ago

Well, I'm glad you were able to get that analysis to complete! After doing some searching online (StackExchange, StackOverflow, Reddit, etc.), it seems clear to me that there is no issue on our end but rather an issue with your system (e.g., a full temporary directory, filenames that are too long, etc.). Since there's nothing we can do, I'm going to close this issue. I recommend you check some of the online forums I mentioned above for solutions that might work in your situation, and/or post a question on such a forum if you cannot find an already published solution.