moustakas closed this issue 1 year ago.
My initial thought was that the problem was related to the Ceres solver, so I added an option in #700 to not use Ceres. However, re-running the command above (with 32 cores; maybe that's the issue?) runs for hours and then just hangs without making any progress.
@dstndstn
Here's an MWE (for Perlmutter) in case you have time to look into this. Note that the version of legacypipe in the shifter container is pretty old, so you'll need to point to a more recent checkout of the repo: just change the $LEGACYPIPE_CODE_DIR variable to one of your checkouts before running the script:
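SHIFTER=docker:legacysurvey/legacyhalos:v1.2
# pull the container image and start an interactive shell inside it
shifterimg pull $SHIFTER
shifter --image $SHIFTER bash
# inside the container: put your own (newer) legacypipe checkout ahead of the container's copy
export LEGACYPIPE_CODE_DIR=/global/homes/i/ioannis/code/git/legacypipe
export PYTHONPATH=$LEGACYPIPE_CODE_DIR/py:$PYTHONPATH
source /pscratch/sd/i/ioannis/NGC4631_GROUP/virgofilaments-env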
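Because the container ships an older legacypipe, it may be worth confirming that the PYTHONPATH override actually takes effect before starting a long run. A minimal sketch using only the standard library (importlib.util.find_spec locates a top-level module on sys.path without importing it); the helper names module_origin and uses_checkout are hypothetical and not part of legacypipe:

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    # find_spec searches sys.path (which includes PYTHONPATH) without importing the module
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def uses_checkout(name, code_dir):
    """True if `name` resolves to a file under <code_dir>/py, the directory put on PYTHONPATH."""
    origin = module_origin(name)
    if origin is None:
        return False
    root = os.path.realpath(os.path.join(code_dir, "py"))
    return os.path.realpath(origin).startswith(root + os.sep)


if __name__ == "__main__":
    code_dir = os.environ.get("LEGACYPIPE_CODE_DIR", "")
    print("legacypipe resolves to:", module_origin("legacypipe"))
    print("using LEGACYPIPE_CODE_DIR checkout:",
          bool(code_dir) and uses_checkout("legacypipe", code_dir))
```

Run inside the container after the exports above; if the second line prints False, the container's older copy is still being picked up.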
Then:
python $LEGACYPIPE_CODE_DIR/py/legacypipe/runbrick.py --radec 190.5282741091857 32.54475159391664 \
--width 6353 --height 6353 --pixscale 0.262 --threads 1 --outdir /pscratch/sd/i/ioannis/NGC4631_GROUP \
--bands g,r,z --survey-dir /global/cfs/cdirs/cosmo/work/legacysurvey/dr9 --run north --skip-calibs \
--checkpoint /pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-checkpoint.p \
--pickle="/pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-%%(stage)s.p" --galex \
--fit-on-coadds --no-ivar-reweighting
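The doubled percent sign in the --pickle value is deliberate. A sketch, assuming runbrick fills the pattern in two passes of Python %-formatting (first with the brick name, then once per stage name); the resolved filename in the log below (NGC4631_GROUP-custom-wise_forced.p) is consistent with this. The function name expand_pickle_pattern and the brick value are hypothetical:

```python
def expand_pickle_pattern(pattern, brick, stage):
    """Expand a --pickle pattern, assuming two %-formatting passes (brick, then stage)."""
    # Pass 1 (once per run): fills %(brick)s if present and turns "%%" into a literal "%"
    per_brick = pattern % dict(brick=brick)
    # Pass 2 (once per stage): the surviving "%(stage)s" is filled with the stage name
    return per_brick % dict(stage=stage)


pattern = "/pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-%%(stage)s.p"
print(expand_pickle_pattern(pattern, brick="placeholder", stage="wise_forced"))
# /pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-wise_forced.p

# With a single "%", the first pass fails because no "stage" key is supplied yet:
try:
    expand_pickle_pattern(pattern.replace("%%", "%"), brick="placeholder", stage="wise_forced")
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'stage'
```

The shell passes "%%" through unchanged inside double quotes, so the escaping only matters to Python.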
And here's the output I get (on either an interactive/compute node or a login node):
runbrick.py starting at 2023-04-13T13:42:17.071301
legacypipe git version: DR10.1.0-3-g75b6c288
Command-line args: ['/global/homes/i/ioannis/code/git/legacypipe/py/legacypipe/runbrick.py', '--radec', '190.5282741091857', '32.54475159391664', '--width', '6353', '--height', '6353', '--pixscale', '0.262', '--threads', '1', '--outdir', '/pscratch/sd/i/ioannis/NGC4631_GROUP', '--bands', 'g,r,z', '--survey-dir', '/global/cfs/cdirs/cosmo/work/legacysurvey/dr9', '--run', 'north', '--skip-calibs', '--checkpoint', '/pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-checkpoint.p', '--pickle=/pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-%%(stage)s.p', '--galex', '--fit-on-coadds', '--no-ivar-reweighting']
python /global/homes/i/ioannis/code/git/legacypipe/py/legacypipe/runbrick.py --radec 190.5282741091857 32.54475159391664 --width 6353 --height 6353 --pixscale 0.262 --threads 1 --outdir /pscratch/sd/i/ioannis/NGC4631_GROUP --bands g,r,z --survey-dir /global/cfs/cdirs/cosmo/work/legacysurvey/dr9 --run north --skip-calibs --checkpoint /pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-checkpoint.p --pickle=/pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-%%(stage)s.p --galex --fit-on-coadds --no-ivar-reweighting
Parsed RA,Dec 190.5282741091857 32.54475159391664
Starting process 253034 Wall: -0.00 s, CPU: -0.00 s, VmPeak: 556 MB, VmSize: 556 MB, VmRSS: 121 MB, VmData: 141 MB, maxrss: 0.124464 MB
Runstage writecat
Runstage galex_forced
Runstage wise_forced
Reading pickle /pscratch/sd/i/ioannis/NGC4631_GROUP/NGC4631_GROUP-custom-wise_forced.p
Running stage galex_forced
Running stage galex_forced at 2023-04-13T13:42:19.621945
Cut to 8 GALEX tiles
Reading GALEX tile NGA_NGC4631 band n
Reading GALEX tile GI1_047080_NGC4656 band n
Reading GALEX tile GI4_016001_CVNIDWA band n
Reading GALEX tile AIS_113_sg36 band n
Reading GALEX tile AIS_113_sg37 band n
Reading GALEX tile AIS_113_sg46 band n
Reading GALEX tile AIS_113_sg47 band n
Reading GALEX tile AIS_113_sg48 band n
F0413 13:56:59.081884 253034 block_sparse_matrix.cc:79] Check failed: num_nonzeros_ >= 0 (-1756951292 vs. 0)
*** Check failure stack trace: ***
@ 0x7f1f912e813d google::LogMessage::Fail()
@ 0x7f1f912e9fa3 google::LogMessage::SendToLog()
@ 0x7f1f912e7ccb google::LogMessage::Flush()
@ 0x7f1f912ea98e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f1f91589d47 ceres::internal::BlockSparseMatrix::BlockSparseMatrix()
@ 0x7f1f9157f307 ceres::internal::BlockJacobianWriter::CreateJacobian()
@ 0x7f1f9164e0bc ceres::internal::TrustRegionPreprocessor::Preprocess()
@ 0x7f1f91645f0f ceres::Solver::Solve()
@ 0x7f1f91647489 ceres::Solve()
@ 0x7f1f919e0407 _ZN17_INTERNALdb2d2e0222real_ceres_forced_photIfEEP7_objectS2_S2_iiii.A
@ 0x7f1f919dbe85 _wrap_ceres_forced_phot.A
@ 0x55b4292a49a8 ast_for_expr
@ 0x7f1f940687ff __Pyx_PyObject_Call
@ 0x7f1f9406202a __pyx_pf_7tractor_15ceres_optimizer_14CeresOptimizer_14_ceres_forced_photom
@ 0x7f1f94053a91 __pyx_pw_7tractor_15ceres_optimizer_14CeresOptimizer_15_ceres_forced_photom
@ 0x7f20185e01ee __Pyx_CyFunction_CallAsMethod
@ 0x55b429288361 os_DirEntry_is_symlink
@ 0x55b4292e7763 _PyObject_GenericGetAttrWithDict
@ 0x55b4292a4897 ast_for_expr
@ 0x7f1f94045892 __pyx_pw_7tractor_15ceres_optimizer_14CeresOptimizer_5_optimize_forcedphot_core
@ 0x7f20185e01ee __Pyx_CyFunction_CallAsMethod
@ 0x55b429288361 os_DirEntry_is_symlink
@ 0x55b4292e7763 _PyObject_GenericGetAttrWithDict
@ 0x55b4292a4897 ast_for_expr
@ 0x7f1f942ba31f __Pyx_PyObject_Call
@ 0x7f1f9428c1de __pyx_pf_7tractor_8optimize_9Optimizer_4forced_photometry
@ 0x7f1f94289987 __pyx_pw_7tractor_8optimize_9Optimizer_5forced_photometry
@ 0x7f20185e01ee __Pyx_CyFunction_CallAsMethod
@ 0x55b429288361 os_DirEntry_is_symlink
@ 0x55b4292e7763 _PyObject_GenericGetAttrWithDict
@ 0x55b4292a4897 ast_for_expr
@ 0x7f20185c6d86 __pyx_pw_7tractor_6engine_7Tractor_35optimize_forced_photometry
Aborted
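The failed check reports num_nonzeros_ = -1756951292, i.e. a negative count of nonzero entries in the block-sparse Jacobian that Ceres builds. One plausible reading (an inference, not confirmed in this thread) is 32-bit signed-integer overflow: the true count exceeded 2^31 - 1 and wrapped around. A minimal sketch of that arithmetic; the footprint-based nonzero estimate and all function names are hypothetical illustrations, not Tractor's or Ceres's code:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_int32(n):
    """Value the exact integer n becomes when stored in a 32-bit two's-complement int.
    (Signed overflow is undefined behavior in C++, but in practice it wraps like this.)"""
    return (n + 2**31) % 2**32 - 2**31


def candidate_true_counts(reported, k_max=3):
    """Exact counts above INT32_MAX that would wrap around to the (negative) reported value."""
    return [reported + k * 2**32 for k in range(1, k_max + 1)]


def jacobian_nonzeros(footprints, params_per_source=1):
    """Rough nonzero count of a flux-only forced-photometry Jacobian (hypothetical model):
    every pixel in a source's model footprint contributes one entry per fitted parameter."""
    return sum(npix * params_per_source for npix in footprints)


def fits_in_int32(count):
    """True if a count can be stored in a 32-bit signed int without overflowing."""
    return 0 <= count <= INT32_MAX


reported = -1756951292  # num_nonzeros_ from the Ceres check failure above
print(candidate_true_counts(reported))
# [2538016004, 6832983300, 11127950596]
```

If this reading is right, the GALEX fit produced a Jacobian with at least about 2.5 billion nonzero entries, and a check like fits_in_int32(jacobian_nonzeros(...)) could flag oversized problems before the solver aborts.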
I'm building GALEX-through-WISE mosaics and running Tractor on a sample of very large (NGC-scale) galaxies and galaxy groups, and I'm having issues with the last stage, forced GALEX photometry. Here's one example traceback.
@dstndstn I'd be grateful for any thoughts or suggestions you may have.