thomasorb / orcs

ORCS is an analysis engine for SITELLE spectral cubes.
GNU General Public License v3.0
9 stars 6 forks source link

fit_lines_in_region does not complete #4

Closed Lyalpha closed 6 years ago

Lyalpha commented 7 years ago

When running fit_lines_in_region, the process does not seem to complete and the threaded processes that are spawned become zombie quickly and I am left with 1 process running indefinitely until I interrupt (left for at least an hour while trying to fit only a few spaxels in a small region with only a single line).

Tried with different combinations of lines to fit, and values for snr_guess, nofilter, pos_def etc.

>>> from orcs.process import SpectralCube`
>>> from orcs.version import __version__ as orcsver
>>> from orb.version import __version__ as orbver
>>> print orcsver
>>> print orbver
2.2.3
3.1.2
cube = SpectralCube('data/SN3/NGC6946_SN3.merged.cm1.1.0.hdf5')
INFO| Data shape : (2048, 2064, 320)
INFO| Cube is in WAVENUMBER (cm-1)
INFO| Cube is CALIBRATED in wavenumber
>>> cube.fit_lines_in_region('region.reg',
                     ('Halpha',),
                     fmodel='sincgauss',
                     pos_def=['1'],
                     pos_cov=100,
                     sigma_def=['1'],
                     sigma_cov=1,
                     binning=1,
                     snr_guess="auto",
                     nofilter=True)
loading region: Shape : box ( Number(1369.6762),Number(1593.6944),Number(6),Number(6),Number(0) )

WARNING| /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:654: RuntimeWarning: invalid value encountered in greater

WARNING| /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:658: RuntimeWarning: invalid value encountered in divide

WARNING| /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:663: RuntimeWarning: invalid value encountered in greater

WARNING| /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:667: RuntimeWarning: invalid value encountered in divide

INFO| Number of spectra to fit: 36
WARNING| /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:1546: RuntimeWarning: invalid value encountered in sqrt

WARNING| /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orb_kernel-3.1.2-py2.7-linux-x86_64.egg/orb/utils/image.py:938: RuntimeWarning: Mean of empty slice
  return np.squeeze(np.nanmean(np.nanmean(im_view, axis=3), axis=1))

INFO| Init of the parallel processing server with 8 threads
INFO| 6 lines to fit

 [==========] [100%] [] [remains: 0.000 s]                                    

It remains like this, with 1 process using cpu, until I interrupt.

thomasorb commented 6 years ago

Dear @Lyalpha,

Strange :confused:

Is it working well on a single spectrum in the given region ?

Can you please try to run it again with the option debug set to True at the init of the spectral cube and paste the output here ?

cube = SpectralCube('data/SN3/NGC6946_SN3.merged.cm1.1.0.hdf5', debug=True)

It will give more information on the launched parallel processes.

Thank you !

thomasorb commented 6 years ago

Dear @Lyalpha, Are you using Jupyter ? If it is the case there is a bug in Jupyter which makes impossible to run two times a roi fit (with SpectralCube.fit_lines_in_region()) without restarting the kernel.

Tell me if it helps restarting the kernel before doing the region fit ?

Lyalpha commented 6 years ago

The same problem occurs on the first run in jupyter. I confirm this also happens if I run the same from a python interactive session.

When using the debug option on Spectral_Cube I get only the same output as I do without (as above).

thomasorb commented 6 years ago

Dear @Lyalpha,

Can you please try the version of the code on the next branch https://github.com/thomasorb/orcs/tree/next with a classic python session (non interactive) and a very basic script.Please post the input script and the the output (with debug set to True).

Lyalpha commented 6 years ago
#NGC6946_mwe.py
from orcs.process import SpectralCube

cube = SpectralCube('data/SN3/NGC6946_SN3.merged.cm1.1.0.hdf5', debug=True)
cube.fit_lines_in_region('region.reg',
                         ('Halpha',),
                         fmodel='sincgauss',
                         pos_def=['1'],
                         pos_cov=100,
                         sigma_def=['1'],
                         sigma_cov=1,
                         binning=1,
                         snr_guess="auto",
                         nofilter=True)
$ python NGC6946_mwe.py 
171221-11:58:34|core:4198:__init__|INFO> Data shape : (2048, 2064, 320)
171221-11:58:34|core:1559:reset_params|INFO> Cube is in WAVENUMBER (cm-1)
171221-11:58:34|core:1566:reset_params|INFO> Cube is CALIBRATED in wavenumber
 loading region: Shape : box ( Number(1369.6762),Number(1593.6944),Number(6),Number(6),Number(0) )

171221-11:58:34|fit:2624:_prepare_input_params|DEBUG> theta_orig 15.4218567014, theta_proj: 15.4218567014
171221-11:58:34|fit:2246:add_lines_model|DEBUG> key removed: theta_proj
171221-11:58:34|fit:2246:add_lines_model|DEBUG> key removed: filter_file_path
171221-11:58:34|fit:2246:add_lines_model|DEBUG> key removed: axis_corr_orig
171221-11:58:34|fit:2246:add_lines_model|DEBUG> key removed: zpd_index
171221-11:58:34|fit:2246:add_lines_model|DEBUG> key removed: signal_range
171221-11:58:34|fit:2246:add_lines_model|DEBUG> key removed: theta_orig
171221-11:58:34|fit:2246:add_lines_model|DEBUG> key removed: axis_corr_proj
171221-11:58:34|fit:2246:add_lines_model|DEBUG> key removed: apodization
171221-11:58:35|core:555:_fit_lines_in_region|INFO> Number of spectra to fit: 36
171221-11:58:35|core:1669:get_flux_uncertainty|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:1669: RuntimeWarning: invalid value encountered in sqrt

171221-11:58:35|nanfunctions:828:nanmean|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orb_kernel-3.1.2-py2.7-linux-x86_64.egg/orb/utils/image.py:938: RuntimeWarning: Mean of empty slice
  return np.squeeze(np.nanmean(np.nanmean(im_view, axis=3), axis=1))

171221-11:58:35|parallel:73:init_pp_server|INFO> Init of the parallel processing server with 8 threads
171221-11:58:35|core:1912:process_by_pixel|INFO> passed mapped kwargs : []
171221-11:58:36|core:1943:process_by_pixel|INFO> 6 rows to fit
171221-11:58:36|pp:468:submit|DEBUG> Task 0 submited, function='process_in_row'
 [=         ] [16%] [] [remains: unknown]171221-11:58:36|pp:468:submit|DEBUG> Task 1 submited, function='process_in_row'
 [===       ] [33%] [] [remains: unknown]171221-11:58:36|pp:468:submit|DEBUG> Task 2 submited, function='process_in_row'
 [=====     ] [50%] [] [remains: unknown]171221-11:58:36|pp:468:submit|DEBUG> Task 3 submited, function='process_in_row'
 [======    ] [66%] [] [remains: 0.013 s]171221-11:58:36|pp:468:submit|DEBUG> Task 4 submited, function='process_in_row'
 [========  ] [83%] [] [remains: 0.022 s]171221-11:58:36|pp:468:submit|DEBUG> Task 5 submited, function='process_in_row'
 [==========] [100%] [] [remains: 0.000 s]171221-11:58:38|pp:742:_run_local|DEBUG> Task 0 ended
171221-11:58:42|pp:742:_run_local|DEBUG> Task 2 ended
171221-11:58:45|pp:742:_run_local|DEBUG> Task 1 ended
171221-11:58:46|pp:742:_run_local|DEBUG> Task 3 ended
171221-11:58:48|pp:742:_run_local|DEBUG> Task 4 ended
171221-11:58:48|pp:742:_run_local|DEBUG> Task 5 ended

Then it hangs indefinitely with 1 process at 100% as before. Thanks.

thomasorb commented 6 years ago

Thank you @Lyalpha and Happy New Year !!

Are you sure you are using the last version of the 'next' branch. Because, I get much more output when I test this function at home, e.g.

180109-13:03:29|core:4198:__init__|INFO> Data shape : (2048, 2064, 840)
180109-13:03:29|core:1559:reset_params|INFO> Cube is in WAVENUMBER (cm-1)
180109-13:03:29|core:1566:reset_params|INFO> Cube is CALIBRATED in wavenumber
180109-13:03:29|wcs:2561:to_header|WARNING> /home/thomas/.virtualenvs/orb/local/lib/python2.7/site-packages/astropy/wcs/wcs.py:2561: AstropyWarning: Some non-standard WCS keywords were excluded: BP_0_0, BP_0_1, BP_0_2, A_3_0, B_3_0, BP_3_0, B_1_2, B_1_0, B_1_1, B_2_1, B_2_0, A_ORDER, B_0_3, B_0_2, B_0_1, B_0_0, BP_0_3, B_ORDER, BP_ORDER, BP_1_2, AP_ORDER, AP_3_0, A_1_1, A_1_0, BP_2_0, A_1_2, AP_2_1, AP_2_0, A_0_0, A_0_1, A_0_2, A_0_3, BP_1_1, BP_1_0, A_2_0, A_2_1, AP_1_0, AP_1_1, AP_1_2, BP_2_1, AP_0_1, AP_0_0, AP_0_3, AP_0_2 Use the ``relax`` kwarg to control this.
  AstropyWarning)

 loading region: Shape : circle ( Number(919),Number(893),Number(1) )

180109-13:03:29|fit:2624:_prepare_input_params|DEBUG> theta_orig 15.4447050003, theta_proj: 15.4447050003
180109-13:03:29|fit:2246:add_lines_model|DEBUG> key removed: theta_proj
180109-13:03:29|fit:2246:add_lines_model|DEBUG> key removed: filter_file_path
180109-13:03:29|fit:2246:add_lines_model|DEBUG> key removed: axis_corr_orig
180109-13:03:29|fit:2246:add_lines_model|DEBUG> key removed: zpd_index
180109-13:03:29|fit:2246:add_lines_model|DEBUG> key removed: signal_range
180109-13:03:29|fit:2246:add_lines_model|DEBUG> key removed: theta_orig
180109-13:03:29|fit:2246:add_lines_model|DEBUG> key removed: axis_corr_proj
180109-13:03:29|fit:2246:add_lines_model|DEBUG> key removed: apodization
180109-13:03:30|core:557:_fit_lines_in_region|INFO> Number of spectra to fit: 5
180109-13:03:30|core:1669:get_flux_uncertainty|WARNING> /home/thomas/Astro/Python/ORB/Orcs/orcs/core.py:1669: RuntimeWarning: invalid value encountered in sqrt
  noise_counts = np.sqrt(total_counts)

180109-13:03:30|nanfunctions:828:nanmean|WARNING> /home/thomas/Astro/Python/ORB/Orb/orb/utils/image.py:938: RuntimeWarning: Mean of empty slice
  return np.squeeze(np.nanmean(np.nanmean(im_view, axis=3), axis=1))

180109-13:03:30|parallel:73:init_pp_server|INFO> Init of the parallel processing server with 4 threads
180109-13:03:30|core:1912:process_by_pixel|INFO> passed mapped kwargs : []
180109-13:03:31|core:1943:process_by_pixel|INFO> 3 rows to fit
180109-13:03:31|pp:468:submit|DEBUG> Task 0 submited, function='process_in_row'
 [===       ] [33%] [] [remains: unknown]180109-13:03:31|pp:468:submit|DEBUG> Task 1 submited, function='process_in_row'
 [======    ] [66%] [] [remains: unknown]180109-13:03:31|pp:468:submit|DEBUG> Task 2 submited, function='process_in_row'
 [==========] [100%] [] [remains: unknown]180109-13:03:33|pp:742:_run_local|DEBUG> Task 2 ended
180109-13:03:33|core:2010:process_by_pixel|DEBUG> job time since submission: 2.22434806824 s
180109-13:03:33|core:2012:process_by_pixel|DEBUG> job submit time: 0.00112199783325 s
180109-13:03:33|core:2014:process_by_pixel|DEBUG> job load data time: 0.00107884407043 s
DEBUG:root:passed mapped kwargs: {}
DEBUG:root:transformed mapped kwargs: {}
DEBUG:root:first SNR guess computed from spectrum uncertainty: 9.93819671444
DEBUG:root:SNR guess: 9.93819671444
DEBUG:root:recomputed fwhm guess: 3.15772977874
DEBUG:root:last minute changed parameter fwhm_guess: 3.15772977874
DEBUG:root:fwhm guess: 3.15772977874
DEBUG:root:SNR guess: 7.87494523828
DEBUG:root:recomputed fwhm guess: 3.15772977874
DEBUG:root:last minute changed parameter fwhm_guess: 3.15772977874
DEBUG:root:fwhm guess: 3.15772977874
DEBUG:root:pure fit time: 0.431720972061 s
DEBUG:root:fit function time: 0.902490139008 s
DEBUG:root:velocity: [-508.5(3.8)]
DEBUG:root:broadening: [19.9(6.1)]
180109-13:03:33|core:2048:process_by_pixel|DEBUG> job time (whole loop): 2.4196600914 s
180109-13:03:34|pp:742:_run_local|DEBUG> Task 0 ended
180109-13:03:34|core:2010:process_by_pixel|DEBUG> job time since submission: 2.61827206612 s
180109-13:03:34|core:2012:process_by_pixel|DEBUG> job submit time: 0.00152087211609 s
180109-13:03:34|core:2014:process_by_pixel|DEBUG> job load data time: 0.0014979839325 s
DEBUG:root:passed mapped kwargs: {}
DEBUG:root:transformed mapped kwargs: {}
DEBUG:root:first SNR guess computed from spectrum uncertainty: 4.92388702523
DEBUG:root:SNR guess: 4.92388702523
DEBUG:root:recomputed fwhm guess: 3.15762288687
DEBUG:root:last minute changed parameter fwhm_guess: 3.15762288687
DEBUG:root:fwhm guess: 3.15762288687
DEBUG:root:SNR guess: 4.14286706504
DEBUG:root:recomputed fwhm guess: 3.15762288687
DEBUG:root:last minute changed parameter fwhm_guess: 3.15762288687
DEBUG:root:fwhm guess: 3.15762288687
DEBUG:root:pure fit time: 0.559398889542 s
DEBUG:root:fit function time: 1.32779502869 s
DEBUG:root:velocity: [-494.8(8.2)]
DEBUG:root:broadening: [36.4(9.3)]
180109-13:03:34|core:2048:process_by_pixel|DEBUG> job time (whole loop): 2.61877417564 s
180109-13:03:35|pp:742:_run_local|DEBUG> Task 1 ended
180109-13:03:35|core:2010:process_by_pixel|DEBUG> job time since submission: 3.80063199997 s
180109-13:03:35|core:2012:process_by_pixel|DEBUG> job submit time: 0.00117611885071 s
180109-13:03:35|core:2014:process_by_pixel|DEBUG> job load data time: 0.00114488601685 s
DEBUG:root:passed mapped kwargs: {}
DEBUG:root:transformed mapped kwargs: {}
DEBUG:root:first SNR guess computed from spectrum uncertainty: 4.22138719526
DEBUG:root:SNR guess: 4.22138719526
DEBUG:root:recomputed fwhm guess: 3.15767941623
DEBUG:root:last minute changed parameter fwhm_guess: 3.15767941623
DEBUG:root:fwhm guess: 3.15767941623
DEBUG:root:SNR guess: 3.50458670935
DEBUG:root:recomputed fwhm guess: 3.15767941623
DEBUG:root:last minute changed parameter fwhm_guess: 3.15767941623
DEBUG:root:fwhm guess: 3.15767941623
DEBUG:root:pure fit time: 0.710036039352 s
DEBUG:root:fit function time: 1.62766885757 s
DEBUG:root:velocity: [-510.8(8.0)]
DEBUG:root:broadening: [24(12)]
DEBUG:root:passed mapped kwargs: {}
DEBUG:root:transformed mapped kwargs: {}
DEBUG:root:first SNR guess computed from spectrum uncertainty: 8.29585544904
DEBUG:root:SNR guess: 8.29585544904
DEBUG:root:recomputed fwhm guess: 3.15767633281
DEBUG:root:last minute changed parameter fwhm_guess: 3.15767633281
DEBUG:root:fwhm guess: 3.15767633281
DEBUG:root:SNR guess: 6.95737079621
DEBUG:root:recomputed fwhm guess: 3.15767633281
DEBUG:root:last minute changed parameter fwhm_guess: 3.15767633281
DEBUG:root:fwhm guess: 3.15767633281
DEBUG:root:pure fit time: 0.225869894028 s
DEBUG:root:fit function time: 0.44571185112 s
DEBUG:root:velocity: [-501.0(5.4)]
DEBUG:root:broadening: [33.4(6.0)]
DEBUG:root:passed mapped kwargs: {}
DEBUG:root:transformed mapped kwargs: {}
DEBUG:root:first SNR guess computed from spectrum uncertainty: 11.6752057092
DEBUG:root:SNR guess: 11.6752057092
DEBUG:root:recomputed fwhm guess: 3.15767324939
DEBUG:root:last minute changed parameter fwhm_guess: 3.15767324939
DEBUG:root:fwhm guess: 3.15767324939
DEBUG:root:SNR guess: 9.51865280715
DEBUG:root:recomputed fwhm guess: 3.15767324939
DEBUG:root:last minute changed parameter fwhm_guess: 3.15767324939
DEBUG:root:fwhm guess: 3.15767324939
DEBUG:root:pure fit time: 0.213638067245 s
DEBUG:root:fit function time: 0.45786190033 s
DEBUG:root:velocity: [-503.7(3.7)]
DEBUG:root:broadening: [29.3(4.4)]
180109-13:03:35|core:2048:process_by_pixel|DEBUG> job time (whole loop): 3.80102992058 s
 [==========] [100%] [completed in 3.81 s]
180109-13:03:35|core:2358:write_maps|INFO> Writing height maps
180109-13:03:35|core:2380:write_maps|WARNING> param height is the same for all lines
180109-13:03:35|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.height.fits in 0.11 s 
180109-13:03:35|core:2358:write_maps|INFO> Writing height-err maps
180109-13:03:35|core:2380:write_maps|WARNING> param height-err is the same for all lines
180109-13:03:35|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.height-err.fits in 0.06 s 
180109-13:03:35|core:2358:write_maps|INFO> Writing amplitude maps
180109-13:03:35|core:2380:write_maps|WARNING> param amplitude is the same for all lines
180109-13:03:35|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.amplitude.fits in 0.06 s 
180109-13:03:35|core:2358:write_maps|INFO> Writing amplitude-err maps
180109-13:03:35|core:2380:write_maps|WARNING> param amplitude-err is the same for all lines
180109-13:03:35|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.amplitude-err.fits in 0.06 s 
180109-13:03:35|core:2358:write_maps|INFO> Writing velocity maps
180109-13:03:35|core:2380:write_maps|WARNING> param velocity is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.velocity.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing velocity-err maps
180109-13:03:36|core:2380:write_maps|WARNING> param velocity-err is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.velocity-err.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing fwhm maps
180109-13:03:36|core:2380:write_maps|WARNING> param fwhm is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.fwhm.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing fwhm-err maps
180109-13:03:36|core:2380:write_maps|WARNING> param fwhm-err is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.fwhm-err.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing sigma maps
180109-13:03:36|core:2380:write_maps|WARNING> param sigma is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.sigma.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing sigma-err maps
180109-13:03:36|core:2380:write_maps|WARNING> param sigma-err is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.sigma-err.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing flux maps
180109-13:03:36|core:2380:write_maps|WARNING> param flux is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.flux.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing flux-err maps
180109-13:03:36|core:2380:write_maps|WARNING> param flux-err is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.flux-err.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing logGBF maps
180109-13:03:36|core:2380:write_maps|WARNING> param logGBF is the same for all lines
180109-13:03:36|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.logGBF.fits in 0.06 s 
180109-13:03:36|core:2358:write_maps|INFO> Writing chi2 maps
180109-13:03:36|core:2380:write_maps|WARNING> param chi2 is the same for all lines
180109-13:03:37|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.chi2.fits in 0.07 s 
180109-13:03:37|core:2358:write_maps|INFO> Writing rchi2 maps
180109-13:03:37|core:2380:write_maps|WARNING> param rchi2 is the same for all lines
180109-13:03:37|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.rchi2.fits in 0.06 s 
180109-13:03:37|core:2358:write_maps|INFO> Writing ks_pvalue maps
180109-13:03:37|core:2380:write_maps|WARNING> param ks_pvalue is the same for all lines
180109-13:03:37|io:230:write_fits|INFO> Data written as M31_SN3.1.0.ORCS/MAPS/M31_SN3.1.0.LineMaps.map.all.1x1.ks_pvalue.fits in 0.06 s

Also, I can't reproduce the problem. And, as far as I know you're the only one to report this problem. Please verify the version of the code you are using.

Thanks.

thomasorb commented 6 years ago

@Lyalpha: Sorry I forgot to mention that both ORB and ORCS modules should be switched to next branch for testing. Thank you !

Lyalpha commented 6 years ago
(astroconda2) [jdl@astro orb]$ git pull
....
(astroconda2) [jdl@astro orb]$ git status
On branch next
Your branch is up-to-date with 'origin/next'.
(astroconda2) [jdl@astro orb]$ python setup.py build_ext
(astroconda2) [jdl@astro orb]$ python setup.py install
(astroconda2) [jdl@astro orb]$ cd ../orcs
(astroconda2) [jdl@astro orcs]$ git pull
....
(astroconda2) [jdl@astro orcs]$ git status
On branch next
Your branch is up-to-date with 'origin/next'.
(astroconda2) [jdl@astro orb]$ python setup.py install

Actually, it looks like we get the same output up until when the program gets stuck on my machine. Then of course you see all the extra lines whereas mine halts. I presume you added some more debug info in recent commits as I get a few more lines now, but it still sticks at the same place (see below).

Is it worth you trying to run it on the cube I am using, if I host it somewhere?

In [16]: cube.fit_lines_in_region('region.reg',
    ...:                          ('Halpha',),
    ...:                          fmodel='sincgauss',
    ...:                          pos_def=['1'],
    ...:                          pos_cov=100,
    ...:                          sigma_def=['1'],
    ...:                          sigma_cov=1,
    ...:                          binning=1,
    ...:                          snr_guess="auto",
    ...:                          nofilter=True)
 loading region: Shape : box ( Number(1369.6762),Number(1593.6944),Number(6),Number(6),Number(0) )

180115-11:44:35|fit:2637:_prepare_input_params|DEBUG> theta_orig 15.4218567014, theta_proj: 15.4218567014
180115-11:44:35|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter pos_cov: [100]
180115-11:44:35|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter sigma_def: ['1']
180115-11:44:35|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter sigma_cov: [1]
180115-11:44:35|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter pos_def: ['1']
180115-11:44:35|fit:2073:_check_lines_params|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orb_kernel-3.1.2-py2.7-linux-x86_64.egg/orb/fit.py:2073: UserWarning: please set a guess, or a covarying value of sigma > 0 or use a  sinc model or you might end up with nans
  warnings.warn('please set a guess, or a covarying value of sigma > 0 or use a  sinc model or you might end up with nans')

180115-11:44:35|fit:2264:add_lines_model|DEBUG> key removed: theta_proj
180115-11:44:35|fit:2264:add_lines_model|DEBUG> key removed: filter_file_path
180115-11:44:35|fit:2264:add_lines_model|DEBUG> key removed: axis_corr_orig
180115-11:44:35|fit:2264:add_lines_model|DEBUG> key removed: zpd_index
180115-11:44:35|fit:2264:add_lines_model|DEBUG> key removed: signal_range
180115-11:44:35|fit:2264:add_lines_model|DEBUG> key removed: theta_orig
180115-11:44:35|fit:2264:add_lines_model|DEBUG> key removed: axis_corr_proj
180115-11:44:35|fit:2264:add_lines_model|DEBUG> key removed: apodization
180115-11:44:36|core:560:_fit_lines_in_region|INFO> Number of spectra to fit: 36
180115-11:44:36|core:1673:get_flux_uncertainty|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:1673: RuntimeWarning: invalid value encountered in sqrt

180115-11:44:36|nanfunctions:828:nanmean|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orb_kernel-3.1.2-py2.7-linux-x86_64.egg/orb/utils/image.py:938: RuntimeWarning: Mean of empty slice
  return np.squeeze(np.nanmean(np.nanmean(im_view, axis=3), axis=1))

180115-11:44:36|parallel:73:init_pp_server|INFO> Init of the parallel processing server with 8 threads
180115-11:44:36|core:1955:process_by_pixel|INFO> passed mapped kwargs : []
180115-11:44:37|core:1986:process_by_pixel|INFO> 6 rows to fit
180115-11:44:37|pp:468:submit|DEBUG> Task 0 submited, function='process_in_row'
 [=         ] [16%] [] [remains: unknown]180115-11:44:37|pp:468:submit|DEBUG> Task 1 submited, function='process_in_row'
 [===       ] [33%] [] [remains: unknown]180115-11:44:37|pp:468:submit|DEBUG> Task 2 submited, function='process_in_row'
 [=====     ] [50%] [] [remains: unknown]180115-11:44:37|pp:468:submit|DEBUG> Task 3 submited, function='process_in_row'
 [======    ] [66%] [] [remains: 0.030 s]180115-11:44:37|pp:468:submit|DEBUG> Task 4 submited, function='process_in_row'
 [========  ] [83%] [] [remains: 0.018 s]180115-11:44:37|pp:468:submit|DEBUG> Task 5 submited, function='process_in_row'
 [==========] [100%] [] [remains: 0.000 s]180115-11:44:39|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 2.51130789636
180115-11:44:39|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 4.09642436778
180115-11:44:39|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 3.44846742141
180115-11:44:39|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 4.03118667032
180115-11:44:39|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 2.52463886121
180115-11:44:39|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 3.31583541226
180115-11:44:39|pp:742:_run_local|DEBUG> Task 3 ended
180115-11:44:39|pp:742:_run_local|DEBUG> Task 1 ended
180115-11:44:40|pp:742:_run_local|DEBUG> Task 0 ended
180115-11:44:40|pp:742:_run_local|DEBUG> Task 2 ended
180115-11:44:40|pp:742:_run_local|DEBUG> Task 5 ended
180115-11:44:41|pp:742:_run_local|DEBUG> Task 4 ended
thomasorb commented 6 years ago

Dear @Lyalpha ! I tried it at home with your script (see below) and your cube. It worked perfectly for me :'( ...

from orcs.process import SpectralCube

cube = SpectralCube('NGC6946_SN3.merged.cm1.1.0.hdf5', debug=True)
cube.fit_lines_in_region('region.reg',
                         ('Halpha',),
                         fmodel='sincgauss',
                         pos_def=['1'],
                         pos_cov=100,
                         sigma_def=['1'],
                         sigma_cov=1,
                         binning=1,
                         snr_guess="auto",
                         nofilter=True)

The region file (region.reg) I used is

# Region file format: DS9 version 4.1
global color=green dashlist=8 3 width=1 font="helvetica 10 normal roman" select=1 highlite=1 dash=0 fixed=0 edit=1 move=1 delete=1 include=1 source=1
image
circle(1000,1000,10)

It may come from the installed packages (especially pp - parallel python) or the os ... hum what's the os ? did you try to compile pp yourself, it might help.

thomasorb commented 6 years ago

Other people have reported never completing processes. As I still didn't find a way to reproduce this issue I have added a bunch of timeout checks in the function CubeJobServer.process_by_pixel in this commit (next branch: https://github.com/thomasorb/orcs/commit/c470503400328d92bc7c201945a6ec27eb21530d, ORB has also been modified and must be updated at least to https://github.com/thomasorb/orb/commit/8f454cd679fea7a741bc0e0c5a74a36d17fc4540 on its next branch)

For speed reason I cannot control the timeout per pixel but only per row. So that if a row times out the entire row is filled with nans. But at least it will help debugging.

To use a timeout with fit_lines_in_region just do:

cube.fit_lines_in_region(..., timeout=1) # will put a mean timeout of 1s per pixel

This mean timeout is considered for a row and the entire process also. i.e. if the entire process targets 1e4 pixels, the total time must not be more than 1e4 s with a timeout per pixel of 1s. If a given row contains 50 pixels to fit, its processing time must not be higher than 50 s with a 1s timeout per pixel.

I also added a fixed timeout of 10s on data read from the HDF5 file. Because I suspect that the problems might come from hanging I/O on the HDF5 file.

Use it in debug mode to get all the necessary output on which process has timed out and when.

simon-prunet commented 6 years ago

Here is a simple script that does not end well...

#!/usr/bin/env python

from orcs.process import SpectralCube
import pylab as pl
import orb.utils.io
import numpy as np

#import pydevd
#pydevd.settrace('localhost', port=7777, stdoutToServer=True, stderrToServer=True)

# cube = SpectralCube('/h/prunet/Orion/Orion-A_SN3.merged.cm1.1.0.hdf5', debug=True)
cube = SpectralCube('/local/data2/sitelle/orb/17BQ08/17BE95/Orion/Orion-A_SN3.merged.cm1.1.0.hdf5', debug=True)
# axis, spectrum, fit = cube.fit_lines_in_spectrum(733, 886, 3,
#                                                  ('[NII]6548', 'Halpha', '[NII]6583', '[SII]6716', '[SII]6731'),
#                                                  fmodel='sincgauss',
#                                                  pos_def=['1','2','1','1','1'],
#                                                  pos_cov=0,
#                                                  sigma_def=['1','2','1','1','1'],
#                                                  sigma_cov=20,
#                                                  snr_guess=None,
#                                                  no_filter=True)
cube.fit_lines_in_region('/h/prunet/Orion/small.reg',
                            ('[NII]6548', 'Halpha', '[NII]6583', '[SII]6716', '[SII]6731'),
                            fmodel='sinc',
                            pos_def=['1','2','1','1','1'],
                            pos_cov=[0,0],
                            snr_guess=None,
                            timeout=1)

Here is the ds9 reg file content:

# Region file format: DS9 version 4.1
global color=green dashlist=8 3 width=1 font="helvetica 10 normal roman" select=1 highlite=1 dash=0 fixed=0 edit=1 move=1 delete=1 include=1 source=1
image
box(918.00224,1034.2442,10.50944,10.8874,0)

Here is the output before it hangs:

prunet@iolani:~/ORBS/orb-orcs-code/orcs$ ./mytest
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
180117-12:03:21|core:371:listen|DEBUG> logging listener started
180117-12:03:21|core:4342:__init__|INFO> Data shape : (2048, 2064, 287)
180117-12:03:21|core:1569:reset_params|INFO> Cube is in WAVENUMBER (cm-1)
180117-12:03:21|core:1576:reset_params|INFO> Cube is CALIBRATED in wavenumber
 loading region: Shape : box ( Number(918.00224),Number(1034.2442),Number(10.50944),Number(10.8874),Number(0) )

180117-12:03:21|fit:2637:_prepare_input_params|DEBUG> theta_orig 15.423632909, theta_proj: 15.423632909
180117-12:03:21|fit:2264:add_lines_model|DEBUG> key removed: theta_proj
180117-12:03:21|fit:2264:add_lines_model|DEBUG> key removed: signal_range
180117-12:03:21|fit:2264:add_lines_model|DEBUG> key removed: filter_file_path
180117-12:03:21|fit:2264:add_lines_model|DEBUG> key removed: theta_orig
180117-12:03:21|fit:2264:add_lines_model|DEBUG> key removed: axis_corr_orig
180117-12:03:21|fit:2264:add_lines_model|DEBUG> key removed: zpd_index
180117-12:03:21|fit:2264:add_lines_model|DEBUG> key removed: axis_corr_proj
180117-12:03:21|fit:2264:add_lines_model|DEBUG> key removed: apodization
180117-12:03:23|core:563:_fit_lines_in_region|INFO> Number of spectra to fit: 121
180117-12:03:23|core:1679:get_flux_uncertainty|WARNING> /h/prunet/ORBS/orb-orcs-code/orcs/orcs/core.py:1679: RuntimeWarning: invalid value encountered in sqrt
  noise_counts = np.sqrt(total_counts)

180117-12:03:24|nanfunctions:868:nanmean|WARNING> /h/prunet/ORBS/orb-orb-code/orb/utils/image.py:938: RuntimeWarning: Mean of empty slice
  return np.squeeze(np.nanmean(np.nanmean(im_view, axis=3), axis=1))

180117-12:03:24|core:1775:__init__|DEBUG> debug set to True
180117-12:03:26|parallel:74:init_pp_server|INFO> Init of the parallel processing server with 16 threads
180117-12:03:26|core:2031:process_by_pixel|INFO> passed mapped kwargs : []
180117-12:03:27|core:2063:process_by_pixel|INFO> 11 rows to fit
 [==        ] [27%] [] [remains: unknown]180117-12:03:28|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180117-12:03:28|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180117-12:03:28|utils:94:fit_lines_in_spectrum|DEBUG> SNR guess: None
 [===       ] [36%] [] [remains: 2.57 s]180117-12:03:28|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180117-12:03:28|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180117-12:03:28|utils:94:fit_lines_in_spectrum|DEBUG> SNR guess: None
 [====      ] [45%] [] [remains: 2.10 s]180117-12:03:28|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180117-12:03:28|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180117-12:03:28|utils:94:fit_lines_in_spectrum|DEBUG> SNR guess: None
 [==========] [100%] [] [remains: 0.000 s]                              

prunet@iolani:~/ORBS/orb-orcs-code/orcs$ uname -a Linux iolani 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 GNU/Linux

thomasorb commented 6 years ago

Sadly, it works at home ... It may come from the version of the modules you are using: Can you please try something like this and check if we are at the same version of all these modules.

import gvar
print gvar.__version__

import numpy
print numpy.__version__

import scipy
print scipy.__version__

import astropy
print astropy.__version__

import h5py
print h5py.__version__

import lsqfit
print lsqfit.__version__

import pp
print pp.version

import pyregion
print pyregion.__version__

import cython
print cython.__version__

I obtain

8.3.3
1.13.3
1.0.0
2.0.2
2.7.1
9.1.3
1.6.5
2.0
0.27.3

Note that you can upgrade quickly most of the important library by using the file requirement.txt in https://github.com/thomasorb/orb and use the following command:

pip install -r requirements.txt --upgrade

thomasorb commented 6 years ago

You can also check that the compiled code written in cython has been properly compile for you architecture. In root orb folder just run the following command: python setup.py build_ext --inplace

thomasorb commented 6 years ago

Can you please give me the output of uname -a. Here I have Linux computer 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:14:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

thomasorb commented 6 years ago

Could you please try this

from orcs.process import SpectralCube
from orcs.core import CubeJobServer
import orb.utils.io as io
import time
import numpy as np

binning = 1
cube = SpectralCube('/home/thomas/M31_SN3.merged.cm1.1.0.hdf5', debug=True)
cubeserver = CubeJobServer(cube)

def func(spectrum, _):
    #import time
    #time.sleep(_t)                                                                                                                                                                                                
    return np.nansum(spectrum)

mask = np.zeros((cube.dimx, cube.dimy), dtype=bool)
mask[1000:1010, 1000:1010] = True

out = np.empty_like(mask, dtype=float)
out.fill(np.nan)

out = cubeserver.process_by_pixel(
    func, out=out, mask=mask, modules=('import numpy as np', ),
    binning=binning,
    timeout=None)

io.write_fits('out.fits', out, overwrite=True)
Lyalpha commented 6 years ago

Before I had versions

8.3.3
1.13.1
0.19.1
2.0.1
2.7.1
9.1.3
1.6.5
2.0.dev306
0.27.3

I did an --upgrade to:

8.3.3
1.14.0
1.0.0
2.0.3
2.7.1
9.1.3
1.6.5
2.0
0.27.3

and got the same problem.

Adding in a timeout I get:

In [3]: cube = SpectralCube('data/SN3/NGC6946_SN3.merged.cm1.1.0.hdf5', debug=True)
   ...: cube.fit_lines_in_region('region.reg',
   ...:                          ('Halpha',),
   ...:                          fmodel='sincgauss',
   ...:                          pos_def=['1'],
   ...:                          pos_cov=100,
   ...:                          sigma_def=['1'],
   ...:                          sigma_cov=1,
   ...:                          binning=1,
   ...:                          snr_guess="auto",
   ...:                          nofilter=True,
   ...: timeout=1)
   ...: 
180118-11:44:37|core:371:listen|DEBUG> logging listener started
180118-11:44:37|core:242:__init__|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orb_kernel-3.1.2-py2.7-linux-x86_64.egg/orb/core.py:242: UserWarning: Exception occured during logging server init (maybe it is already initialized): [Errno 98] Address already in use
  warnings.warn('Exception occured during logging server init (maybe it is already initialized): {}'.format(e))

180118-11:44:37|core:4342:__init__|INFO> Data shape : (2048, 2064, 320)
180118-11:44:37|core:1569:reset_params|INFO> Cube is in WAVENUMBER (cm-1)
180118-11:44:37|core:1576:reset_params|INFO> Cube is CALIBRATED in wavenumber
 loading region: Shape : box ( Number(1369.6762),Number(1593.6944),Number(6),Number(6),Number(0) )

180118-11:44:38|fit:2637:_prepare_input_params|DEBUG> theta_orig 15.4218567014, theta_proj: 15.4218567014
180118-11:44:38|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter pos_cov: [100]
180118-11:44:38|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter sigma_def: ['1']
180118-11:44:38|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter sigma_cov: [1]
180118-11:44:38|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter pos_def: ['1']
180118-11:44:38|fit:2264:add_lines_model|DEBUG> key removed: theta_proj
180118-11:44:38|fit:2264:add_lines_model|DEBUG> key removed: filter_file_path
180118-11:44:38|fit:2264:add_lines_model|DEBUG> key removed: axis_corr_orig
180118-11:44:38|fit:2264:add_lines_model|DEBUG> key removed: zpd_index
180118-11:44:38|fit:2264:add_lines_model|DEBUG> key removed: signal_range
180118-11:44:38|fit:2264:add_lines_model|DEBUG> key removed: theta_orig
180118-11:44:38|fit:2264:add_lines_model|DEBUG> key removed: axis_corr_proj
180118-11:44:38|fit:2264:add_lines_model|DEBUG> key removed: apodization
180118-11:44:39|core:563:_fit_lines_in_region|INFO> Number of spectra to fit: 36
180118-11:44:39|core:1775:__init__|DEBUG> debug set to True
180118-11:44:39|parallel:74:init_pp_server|INFO> Init of the parallel processing server with 8 threads
180118-11:44:40|core:2031:process_by_pixel|INFO> passed mapped kwargs : []
180118-11:44:40|core:2063:process_by_pixel|INFO> 6 rows to fit
 [========  ] [83%] [] [remains: 0.265 s]180118-11:44:41|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:41|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:41|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 2.51130789636
 [==========] [100%] [] [remains: 0.000 s]180118-11:44:42|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:42|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:42|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 3.44846742141
180118-11:44:42|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:42|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:42|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 4.09642436778
180118-11:44:42|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:42|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:42|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 4.03118667032
180118-11:44:42|utils:94:fit_lines_in_spectrum|DEBUG> SNR guess: 4.03118667032
180118-11:44:42|fit:124:__init__|DEBUG> max iteration changed to 1000
180118-11:44:42|fit:1415:parse_param|DEBUG> Covarying symbols for pos: ['pos_def1']
180118-11:44:43|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:43|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:43|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 3.31583541226
180118-11:44:43|utils:94:fit_lines_in_spectrum|DEBUG> SNR guess: 3.31583541226
180118-11:44:43|fit:124:__init__|DEBUG> max iteration changed to 1000
180118-11:44:43|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:43|core:456:fit_lines_in_pixel|DEBUG> passed mapped kwargs: {}
180118-11:44:43|utils:81:fit_lines_in_spectrum|DEBUG> first SNR guess computed from spectrum uncertainty: 2.52463886121
180118-11:44:43|utils:108:fit_lines_in_spectrum|DEBUG> recomputed fwhm guess: [8.395985472792058]
180118-11:44:43|fit:1415:parse_param|DEBUG> Covarying symbols for pos: ['pos_def1']
180118-11:44:46|core:2205:process_by_pixel|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:2205: UserWarning: job 0 timeout for pixels [1366 1367 1368 1369 1370 1371], 1590

180118-11:44:46|core:2206:process_by_pixel|INFO> ncpus: 8,  njobs: 6, rworker: None, time: 17.8524928093
180118-11:44:46|core:2205:process_by_pixel|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:2205: UserWarning: job 1 timeout for pixels [1366 1367 1368 1369 1370 1371], 1591

180118-11:44:46|core:2206:process_by_pixel|INFO> ncpus: 8,  njobs: 6, rworker: None, time: 17.8524928093
180118-11:44:47|core:2205:process_by_pixel|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:2205: UserWarning: job 2 timeout for pixels [1366 1367 1368 1369 1370 1371], 1592

180118-11:44:47|core:2206:process_by_pixel|INFO> ncpus: 8,  njobs: 6, rworker: None, time: 17.8524928093
180118-11:44:47|core:2205:process_by_pixel|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:2205: UserWarning: job 3 timeout for pixels [1366 1367 1368 1369 1370 1371], 1593

180118-11:44:47|core:2206:process_by_pixel|INFO> ncpus: 8,  njobs: 6, rworker: None, time: 17.8524928093
180118-11:44:47|core:2205:process_by_pixel|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:2205: UserWarning: job 4 timeout for pixels [1366 1367 1368 1369 1370 1371], 1594

180118-11:44:47|core:2206:process_by_pixel|INFO> ncpus: 8,  njobs: 6, rworker: None, time: 17.8524928093
180118-11:44:47|core:2205:process_by_pixel|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/orcs-2.2.3-py2.7.egg/orcs/core.py:2205: UserWarning: job 5 timeout for pixels [1366 1367 1368 1369 1370 1371], 1595

180118-11:44:47|core:2206:process_by_pixel|INFO> ncpus: 8,  njobs: 6, rworker: None, time: 17.8524928093
 [==========] [100%] [completed in 7.3 s]                                     
180118-11:44:47|parallel:87:close_pp_server|DEBUG> ncpus: 8,  njobs: 6, rworker: None, time: 17.8524928093
180118-11:44:47|core:2516:write_maps|INFO> Writing height maps
180118-11:44:47|core:2538:write_maps|WARNING> param height is the same for all lines
180118-11:44:47|wcs:2561:to_header|WARNING> /home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/astropy/wcs/wcs.py:2561: AstropyWarning: Some non-standard WCS keywords were excluded: BP_0_0, BP_0_1, BP_0_2, A_3_0, B_3_0, BP_3_0, B_1_2, B_1_0, B_1_1, B_2_1, B_2_0, A_ORDER, B_0_3, B_0_2, B_0_1, B_0_0, BP_0_3, B_ORDER, BP_ORDER, BP_1_2, AP_ORDER, AP_3_0, A_1_1, A_1_0, BP_2_0, A_1_2, AP_2_1, AP_2_0, A_0_0, A_0_1, A_0_2, A_0_3, BP_1_1, BP_1_0, A_2_0, A_2_1, AP_1_0, AP_1_1, AP_1_2, BP_2_1, AP_0_1, AP_0_0, AP_0_3, AP_0_2 Use the ``relax`` kwarg to control this.
  AstropyWarning)

180118-11:44:48|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.height.fits in 0.16 s 
180118-11:44:48|core:2516:write_maps|INFO> Writing height-err maps
180118-11:44:48|core:2538:write_maps|WARNING> param height-err is the same for all lines
180118-11:44:48|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.height-err.fits in 0.16 s 
180118-11:44:48|core:2516:write_maps|INFO> Writing amplitude maps
180118-11:44:48|core:2538:write_maps|WARNING> param amplitude is the same for all lines
180118-11:44:48|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.amplitude.fits in 0.14 s 
180118-11:44:48|core:2516:write_maps|INFO> Writing amplitude-err maps
180118-11:44:48|core:2538:write_maps|WARNING> param amplitude-err is the same for all lines
180118-11:44:51|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.amplitude-err.fits in 2.73 s 
180118-11:44:51|core:2516:write_maps|INFO> Writing velocity maps
180118-11:44:51|core:2538:write_maps|WARNING> param velocity is the same for all lines
180118-11:44:51|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.velocity.fits in 0.15 s 
180118-11:44:51|core:2516:write_maps|INFO> Writing velocity-err maps
180118-11:44:51|core:2538:write_maps|WARNING> param velocity-err is the same for all lines
180118-11:44:51|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.velocity-err.fits in 0.15 s 
180118-11:44:51|core:2516:write_maps|INFO> Writing fwhm maps
180118-11:44:51|core:2538:write_maps|WARNING> param fwhm is the same for all lines
180118-11:44:51|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.fwhm.fits in 0.15 s 
180118-11:44:51|core:2516:write_maps|INFO> Writing fwhm-err maps
180118-11:44:51|core:2538:write_maps|WARNING> param fwhm-err is the same for all lines
180118-11:44:51|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.fwhm-err.fits in 0.17 s 
180118-11:44:51|core:2516:write_maps|INFO> Writing sigma maps
180118-11:44:51|core:2538:write_maps|WARNING> param sigma is the same for all lines
180118-11:44:52|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.sigma.fits in 0.15 s 
180118-11:44:52|core:2516:write_maps|INFO> Writing sigma-err maps
180118-11:44:52|core:2538:write_maps|WARNING> param sigma-err is the same for all lines
180118-11:44:52|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.sigma-err.fits in 0.15 s 
180118-11:44:52|core:2516:write_maps|INFO> Writing flux maps
180118-11:44:52|core:2538:write_maps|WARNING> param flux is the same for all lines
180118-11:44:52|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.flux.fits in 0.15 s 
180118-11:44:52|core:2516:write_maps|INFO> Writing flux-err maps
180118-11:44:52|core:2538:write_maps|WARNING> param flux-err is the same for all lines
180118-11:44:52|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.flux-err.fits in 0.15 s 
180118-11:44:52|core:2516:write_maps|INFO> Writing logGBF maps
180118-11:44:52|core:2538:write_maps|WARNING> param logGBF is the same for all lines
180118-11:44:52|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.logGBF.fits in 0.15 s 
180118-11:44:52|core:2516:write_maps|INFO> Writing chi2 maps
180118-11:44:52|core:2538:write_maps|WARNING> param chi2 is the same for all lines
180118-11:44:52|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.chi2.fits in 0.15 s 
180118-11:44:52|core:2516:write_maps|INFO> Writing rchi2 maps
180118-11:44:52|core:2538:write_maps|WARNING> param rchi2 is the same for all lines
180118-11:44:53|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.rchi2.fits in 0.15 s 
180118-11:44:53|core:2516:write_maps|INFO> Writing ks_pvalue maps
180118-11:44:53|core:2538:write_maps|WARNING> param ks_pvalue is the same for all lines
180118-11:44:53|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.ks_pvalue.fits in 0.15 s 
180118-11:44:53|parallel:87:close_pp_server|DEBUG> ncpus: 8,  njobs: 6, rworker: None, time: 17.8524928093

Checking a couple of the NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.*.fits files shows they are all-nan arrays.

Running your script above gives:

180118-11:55:37|core:371:listen|DEBUG> logging listener started
180118-11:55:37|core:4342:__init__|INFO> Data shape : (2048, 2064, 320)
180118-11:55:37|core:1569:reset_params|INFO> Cube is in WAVENUMBER (cm-1)
180118-11:55:37|core:1576:reset_params|INFO> Cube is CALIBRATED in wavenumber
180118-11:55:37|core:1775:__init__|DEBUG> debug set to True
180118-11:55:37|parallel:74:init_pp_server|INFO> Init of the parallel processing server with 8 threads
180118-11:55:37|core:2031:process_by_pixel|INFO> passed mapped kwargs : []
180118-11:55:37|core:2063:process_by_pixel|INFO> 10 rows to fit
 [========  ] [80%] [] [remains: 0.089 s]180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 0 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1000 1000 1000 1000 1000 1000 1000 1000 1000 1000]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 0 time since submission: 0.443104982376 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 0 submit time: 0.0446500778198 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 0 load data time: 0.0446288585663 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 0 time (whole loop): 0.444154024124 s
180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 1 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1001 1001 1001 1001 1001 1001 1001 1001 1001 1001]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 1 time since submission: 0.37335395813 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 1 submit time: 0.0713620185852 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 1 load data time: 0.0713291168213 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 1 time (whole loop): 0.374258995056 s
180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 2 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1002 1002 1002 1002 1002 1002 1002 1002 1002 1002]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 2 time since submission: 0.304619073868 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 2 submit time: 0.0696310997009 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 2 load data time: 0.0695469379425 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 2 time (whole loop): 0.305505990982 s
180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 3 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1003 1003 1003 1003 1003 1003 1003 1003 1003 1003]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 3 time since submission: 0.243687152863 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 3 submit time: 0.0618329048157 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 3 load data time: 0.0617921352386 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 3 time (whole loop): 0.244544029236 s
180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 4 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1004 1004 1004 1004 1004 1004 1004 1004 1004 1004]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 4 time since submission: 0.175303936005 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 4 submit time: 0.068473815918 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 4 load data time: 0.0683891773224 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 4 time (whole loop): 0.176191806793 s
180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 5 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1005 1005 1005 1005 1005 1005 1005 1005 1005 1005]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 5 time since submission: 0.100549936295 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 5 submit time: 0.0752120018005 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 5 load data time: 0.0751841068268 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 5 time (whole loop): 0.101159095764 s
180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 6 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1006 1006 1006 1006 1006 1006 1006 1006 1006 1006]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 6 time since submission: 0.0641801357269 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 6 submit time: 0.0369589328766 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 6 load data time: 0.0369329452515 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 6 time (whole loop): 0.0647940635681 s
180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 7 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1007 1007 1007 1007 1007 1007 1007 1007 1007 1007]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 7 time since submission: 0.0143351554871 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 7 submit time: 0.0493400096893 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 7 load data time: 0.049252986908 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 7 time (whole loop): 0.0149519443512 s
 [==========] [100%] [] [remains: 0.000 s]180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 8 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1008 1008 1008 1008 1008 1008 1008 1008 1008 1008]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 8 time since submission: 0.0532751083374 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 8 submit time: 0.043683052063 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 8 load data time: 0.0436511039734 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 8 time (whole loop): 0.0547349452972 s
180118-11:55:38|core:2156:process_by_pixel|DEBUG> job 9 ([1000 1001 1002 1003 1004 1005 1006 1007 1008 1009], [1009 1009 1009 1009 1009 1009 1009 1009 1009 1009]) finished
180118-11:55:38|core:2158:process_by_pixel|DEBUG> job 9 time since submission: 0.00458216667175 s
180118-11:55:38|core:2160:process_by_pixel|DEBUG> job 9 submit time: 0.0491669178009 s
180118-11:55:38|core:2162:process_by_pixel|DEBUG> job 9 load data time: 0.0490930080414 s
180118-11:55:38|core:2198:process_by_pixel|DEBUG> job 9 time (whole loop): 0.00517511367798 s
 [==========] [100%] [completed in 0.604 s]                                   
180118-11:55:38|parallel:87:close_pp_server|DEBUG> ncpus: 8,  njobs: 10, rworker: None, time: 0.103153705597
180118-11:56:15|io:230:write_fits|INFO> Data written as out.fits in 0.17 s

This seems to complete ok and out.fits looks sensible.

thomasorb commented 6 years ago

Dear @Lyalpha, Your output is quite the same as the output of @simon-prunet. What's the result of uname -a ?

@thomasorb Linux computer 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:14:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux @simon-prunet Linux iolani 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 GNU/Linux

From https://kernelnewbies.org/LinuxVersions: Kernel 3.16 was released in August 2014 while Kernel 4.13 was released in September 2017.

I am beginning to suspect it is related to something low level, like the kernel. Because it works on Mac OS, Ubuntu 16.04, Ubuntu 17.10 etc.

Lyalpha commented 6 years ago

Ah yeah, sorry missed that , Here it is: Linux astro 4.14.11-200.fc26.x86_64 #1 SMP Wed Jan 3 13:58:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

thomasorb commented 6 years ago

It is not the kernel then ;) what's your distribution ?

thomasorb commented 6 years ago

I've made a new commit on next branch (https://github.com/thomasorb/orcs/commit/038da6d32a6695111b249a6962f2d791d2db6702) just for checking purpose. @Lyalpha and @simon-prunet, could you please tell me if the fit_lines_in_region process now runs correctly.

Lyalpha commented 6 years ago

Fedora 26. The latest commit causes it to hang also, but I do not see a 100% cpu task in top and I can no longer KeyboardInterrupt the python shell, and I have to kill it.

I have also just tested this on SUSE Enterprise 11: Linux antares 3.0.101-108.21 #1 SMP Fri Dec 29 10:25:37 UTC 2017 (5f5299b) x86_64 x86_64 x86_64 GNU/Linux with the same outcome.

thomasorb commented 6 years ago

I've just pushed a new testing commit. Can you tell me if fit_lines_in_region finishes now.

Lyalpha commented 6 years ago

I get the same all-nan output files whether I use timeout or not.

180118-13:38:19|core:371:listen|DEBUG> logging listener started
180118-13:38:19|core:4342:__init__|INFO> Data shape : (2048, 2064, 320)
180118-13:38:19|core:1569:reset_params|INFO> Cube is in WAVENUMBER (cm-1)
180118-13:38:19|core:1576:reset_params|INFO> Cube is CALIBRATED in wavenumber
 loading region: Shape : box ( Number(1369.6762),Number(1593.6944),Number(6),Number(6),Number(0) )

180118-13:38:20|fit:2637:_prepare_input_params|DEBUG> theta_orig 15.4218567014, theta_proj: 15.4218567014
180118-13:38:20|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter pos_cov: [100]
180118-13:38:20|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter sigma_def: ['1']
180118-13:38:20|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter sigma_cov: [1]
180118-13:38:20|fit:2020:_check_lines_params|DEBUG> changed single-valued parameter pos_def: ['1']
180118-13:38:20|fit:2264:add_lines_model|DEBUG> key removed: theta_proj
180118-13:38:20|fit:2264:add_lines_model|DEBUG> key removed: filter_file_path
180118-13:38:20|fit:2264:add_lines_model|DEBUG> key removed: axis_corr_orig
180118-13:38:20|fit:2264:add_lines_model|DEBUG> key removed: zpd_index
180118-13:38:20|fit:2264:add_lines_model|DEBUG> key removed: signal_range
180118-13:38:20|fit:2264:add_lines_model|DEBUG> key removed: theta_orig
180118-13:38:20|fit:2264:add_lines_model|DEBUG> key removed: axis_corr_proj
180118-13:38:20|fit:2264:add_lines_model|DEBUG> key removed: apodization
180118-13:38:20|core:563:_fit_lines_in_region|INFO> Number of spectra to fit: 36
180118-13:38:21|core:1775:__init__|DEBUG> debug set to True
180118-13:38:21|parallel:74:init_pp_server|INFO> Init of the parallel processing server with 8 threads
180118-13:38:21|core:2032:process_by_pixel|INFO> passed mapped kwargs : []
180118-13:38:21|core:2064:process_by_pixel|INFO> 6 rows to fit
 [==========] [100%] [] [remains: 0.000 s]180118-13:38:22|core:2157:process_by_pixel|DEBUG> job 0 ([1366 1367 1368 1369 1370 1371], [1590 1590 1590 1590 1590 1590]) finished
180118-13:38:22|core:2159:process_by_pixel|DEBUG> job 0 time since submission: 0.791958808899 s
180118-13:38:22|core:2161:process_by_pixel|DEBUG> job 0 submit time: 0.0580251216888 s
180118-13:38:22|core:2163:process_by_pixel|DEBUG> job 0 load data time: 0.0579569339752 s
/home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
180118-13:38:23|core:2199:process_by_pixel|DEBUG> job 0 time (whole loop): 1.37171983719 s
180118-13:38:23|core:2157:process_by_pixel|DEBUG> job 1 ([1366 1367 1368 1369 1370 1371], [1591 1591 1591 1591 1591 1591]) finished
180118-13:38:23|core:2159:process_by_pixel|DEBUG> job 1 time since submission: 1.23418617249 s
180118-13:38:23|core:2161:process_by_pixel|DEBUG> job 1 submit time: 0.136742115021 s
180118-13:38:23|core:2163:process_by_pixel|DEBUG> job 1 load data time: 0.136663198471 s
/home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
180118-13:38:23|core:2199:process_by_pixel|DEBUG> job 1 time (whole loop): 1.28541302681 s
180118-13:38:23|core:2157:process_by_pixel|DEBUG> job 2 ([1366 1367 1368 1369 1370 1371], [1592 1592 1592 1592 1592 1592]) finished
180118-13:38:23|core:2159:process_by_pixel|DEBUG> job 2 time since submission: 1.1482489109 s
180118-13:38:23|core:2161:process_by_pixel|DEBUG> job 2 submit time: 0.136356830597 s
180118-13:38:23|core:2163:process_by_pixel|DEBUG> job 2 load data time: 0.136263847351 s
/home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
180118-13:38:23|core:2199:process_by_pixel|DEBUG> job 2 time (whole loop): 1.26287198067 s
180118-13:38:23|core:2157:process_by_pixel|DEBUG> job 3 ([1366 1367 1368 1369 1370 1371], [1593 1593 1593 1593 1593 1593]) finished
180118-13:38:23|core:2159:process_by_pixel|DEBUG> job 3 time since submission: 1.09845614433 s
180118-13:38:23|core:2161:process_by_pixel|DEBUG> job 3 submit time: 0.163614988327 s
180118-13:38:23|core:2163:process_by_pixel|DEBUG> job 3 load data time: 0.163536071777 s
/home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
180118-13:38:23|core:2199:process_by_pixel|DEBUG> job 3 time (whole loop): 1.31992197037 s
180118-13:38:23|core:2157:process_by_pixel|DEBUG> job 4 ([1366 1367 1368 1369 1370 1371], [1594 1594 1594 1594 1594 1594]) finished
180118-13:38:23|core:2159:process_by_pixel|DEBUG> job 4 time since submission: 1.16436600685 s
180118-13:38:23|core:2161:process_by_pixel|DEBUG> job 4 submit time: 0.154395103455 s
180118-13:38:23|core:2163:process_by_pixel|DEBUG> job 4 load data time: 0.154281139374 s
/home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
180118-13:38:23|core:2199:process_by_pixel|DEBUG> job 4 time (whole loop): 1.18103814125 s
180118-13:38:23|core:2157:process_by_pixel|DEBUG> job 5 ([1366 1367 1368 1369 1370 1371], [1595 1595 1595 1595 1595 1595]) finished
180118-13:38:23|core:2159:process_by_pixel|DEBUG> job 5 time since submission: 0.986745119095 s
180118-13:38:23|core:2161:process_by_pixel|DEBUG> job 5 submit time: 0.193012952805 s
180118-13:38:23|core:2163:process_by_pixel|DEBUG> job 5 load data time: 0.19290804863 s
/home/jdl/anaconda2/envs/astroconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
180118-13:38:24|core:2199:process_by_pixel|DEBUG> job 5 time (whole loop): 1.17822813988 s
 [==========] [100%] [completed in 2.03 s]                                    
180118-13:38:24|parallel:87:close_pp_server|DEBUG> ncpus: 8,  njobs: 6, rworker: None, time: 7.59756493568
180118-13:38:24|core:2517:write_maps|INFO> Writing height maps
180118-13:38:24|core:2539:write_maps|WARNING> param height is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.height.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing height-err maps
180118-13:38:24|core:2539:write_maps|WARNING> param height-err is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.height-err.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing amplitude maps
180118-13:38:24|core:2539:write_maps|WARNING> param amplitude is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.amplitude.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing amplitude-err maps
180118-13:38:24|core:2539:write_maps|WARNING> param amplitude-err is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.amplitude-err.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing velocity maps
180118-13:38:24|core:2539:write_maps|WARNING> param velocity is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.velocity.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing velocity-err maps
180118-13:38:24|core:2539:write_maps|WARNING> param velocity-err is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.velocity-err.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing fwhm maps
180118-13:38:24|core:2539:write_maps|WARNING> param fwhm is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.fwhm.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing fwhm-err maps
180118-13:38:24|core:2539:write_maps|WARNING> param fwhm-err is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.fwhm-err.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing sigma maps
180118-13:38:24|core:2539:write_maps|WARNING> param sigma is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.sigma.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing sigma-err maps
180118-13:38:24|core:2539:write_maps|WARNING> param sigma-err is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.sigma-err.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing flux maps
180118-13:38:24|core:2539:write_maps|WARNING> param flux is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.flux.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing flux-err maps
180118-13:38:24|core:2539:write_maps|WARNING> param flux-err is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.flux-err.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing logGBF maps
180118-13:38:24|core:2539:write_maps|WARNING> param logGBF is the same for all lines
180118-13:38:24|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.logGBF.fits in 0.02 s 
180118-13:38:24|core:2517:write_maps|INFO> Writing chi2 maps
180118-13:38:24|core:2539:write_maps|WARNING> param chi2 is the same for all lines
180118-13:38:25|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.chi2.fits in 0.02 s 
180118-13:38:25|core:2517:write_maps|INFO> Writing rchi2 maps
180118-13:38:25|core:2539:write_maps|WARNING> param rchi2 is the same for all lines
180118-13:38:25|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.rchi2.fits in 0.02 s 
180118-13:38:25|core:2517:write_maps|INFO> Writing ks_pvalue maps
180118-13:38:25|core:2539:write_maps|WARNING> param ks_pvalue is the same for all lines
180118-13:38:25|io:230:write_fits|INFO> Data written as NGC6946_SN3.1.0.ORCS/MAPS/NGC6946_SN3.1.0.LineMaps.map.all.1x1.ks_pvalue.fits in 0.02 s 
180118-13:38:25|parallel:87:close_pp_server|DEBUG> ncpus: 8,  njobs: 6, rworker: None, time: 7.59756493568
thomasorb commented 6 years ago

Okay, the all-nans outputs are normal since I've only deactivated the fitting function, but the whole parallel server still works. I will check with a fake fitting functions then and see if it comes from the pickel / unpickle process on the fitting function.

thomasorb commented 6 years ago

New commit https://github.com/thomasorb/orcs/commit/968daf645c547ff17196c672b3426eb036e90b0f ;) please test it.

I've replaced the function called by the server with a void function... let's see what it returns

thomasorb commented 6 years ago

I've pushed the testing version on branch issue4: https://github.com/thomasorb/orcs/tree/issue4. Please pull it and checkout on this new branch so that I don't use the next branch for testing purpose. The next branch have been reverted to its previous state.

Lyalpha commented 6 years ago

That completed also and gave me the output files. (Note I had to switch your return None to return {} since it gave an error that it was expecting a dict.)

thomasorb commented 6 years ago

Good ! Thanks for the debug !

New commit : https://github.com/thomasorb/orcs/commit/12bab9b123aa8f7a92a37ec965cb343cc3cfe171. I've just added the parameters to the void function. It's not much but we'll see step by step where the problem come from.

@Lyalpha Thank you for your help in debugging this issue ;)

Lyalpha commented 6 years ago

Yep, that went through ok too.

thomasorb commented 6 years ago

New commit https://github.com/thomasorb/orcs/commit/94b658575994185aa18543d4365669003863414b: digging into this function wrapping inferno ;)

Lyalpha commented 6 years ago

It completed but took significantly longer:

 [==========] [100%] [completed in 31.3 s]                                    
180118-15:59:17|parallel:87:close_pp_server|DEBUG> ncpus: 8,  njobs: 6, rworker: None, time: 183.095376015
thomasorb commented 6 years ago

Perfect, it's now doing more and more stuff so, I guess it's what we want :smile: new commit https://github.com/thomasorb/orcs/commit/3817a0de5c7a7e65385fb0704ebcd2c73dd03ef1, @Lyalpha please have a look.

Lyalpha commented 6 years ago

Also completed, similar time.

thomasorb commented 6 years ago

Good, new commit https://github.com/thomasorb/orcs/commit/8a7b6a7ae2f1dcf95a7fbd62040f7c91b0543971, going deeper :smile: Please test it

Lyalpha commented 6 years ago

Completes too..

thomasorb commented 6 years ago

I made a new commit on ORB https://github.com/thomasorb/orb/commit/cbf8eafc11da1d4dc81c1c60c581b8fd880f7df5 (branch issue4 also) and ORCS (still the same branch)

Please update both and test it :smile:

Lyalpha commented 6 years ago

Right - this one seems to halt here:

180118-18:06:30|core:1776:__init__|DEBUG> debug set to True
180118-18:06:30|parallel:74:init_pp_server|INFO> Init of the parallel processing server with 8 threads
180118-18:06:31|core:2032:process_by_pixel|INFO> passed mapped kwargs : []
180118-18:06:31|core:2064:process_by_pixel|INFO> 6 rows to fit
 [==========] [100%] [] [remains: 0.000 s]180118-18:06:32|core:2157:process_by_pixel|DEBUG> job 0 ([1366 1367 1368 1369 1370 1371], [1590 1590 1590 1590 1590 1590]) finished
180118-18:06:32|core:2159:process_by_pixel|DEBUG> job 0 time since submission: 0.953728914261 s
180118-18:06:32|core:2161:process_by_pixel|DEBUG> job 0 submit time: 0.0614860057831 s
180118-18:06:32|core:2163:process_by_pixel|DEBUG> job 0 load data time: 0.0614199638367 s
thomasorb commented 6 years ago

this time i've only made a commit on ORB https://github.com/thomasorb/orb/commit/685047622de7453cfbf233109ae30940e1ab0618 adding more debug lines

Lyalpha commented 6 years ago

Hmmm, I don't even see the 'start func' debug entry on my output before it halts...

thomasorb commented 6 years ago

does it change something if you add

return []

here in _fit_lines_in_spectrum, line 2564 ??

   if isinstance(ip, InputParams):
        rawip = ip.convert()
   else: rawip = ip
   return [] ### here
simon-prunet commented 6 years ago

Hi there ! I don't know if that can help, but I got things to work outside the pp job server, but I had (still) to change the viewkeys() method to keys() for the BufferDict type in orb/fit.py:

diff --git a/orb/fit.py b/orb/fit.py
old mode 100755
new mode 100644
index 4f67d30..c9807d7
--- a/orb/fit.py
+++ b/orb/fit.py
@@ -790,7 +790,7 @@ class Model(object):

         :param p_val: New full set of parameters.
         """
-        if p_val.viewkeys() == self.p_val.viewkeys():
+        if p_val.keys() == self.p_val.keys():
             self.p_val = copy.copy(p_val)
             self.val2free()
         else: raise Exception('bad format of passed val parameters')
thomasorb commented 6 years ago

Hi @simon-prunet, Well, this viewkeys issue is strange also. Does it come from gvar or python ? What is the exact version of Python you are using ?

$ python --version
Python 2.7.14

Also, what is the returned error with

if p_val.viewkeys() == self.p_val.viewkeys():

And, in fact,

I got things to work outside the pp job server

what do you mean by this ? :smiley:

simon-prunet commented 6 years ago

Hello @thomasorb A rather old version of python 2.7: prunet@iolani:~$ python --version Python 2.7.9

I could try an update and see if the problem remains ? At least with yesterday's updates, we know gvar satisfies the requirement (>=8.3.3)

simon-prunet commented 6 years ago

@thomasorb ,

here is a small example showing the pb with testing equality between a.viewkeys() and b.viewkeys() when a is a BufferDict and b is a dict:

In [2]: from gvar import BufferDict

In [3]: a=BufferDict()

In [4]: a['a']=1

In [5]: a['b']=2

In [6]: a.viewkeys()
Out[6]: KeysView(BufferDict([('a', 1), ('b', 2)]))

In [7]: a.keys()
Out[7]: ['a', 'b']

In [8]: b=dict()

In [9]: b['a']=1

In [10]: b['b']=2

In [11]: b.viewkeys()
Out[11]: dict_keys(['a', 'b'])

In [12]: b.keys()
Out[12]: ['a', 'b']

In [13]: a.viewkeys() == b.viewkeys()
Out[13]: False

On my mac with python 2.7.14, this last test works !

Lyalpha commented 6 years ago

I have thrown lots of debugging lines into my orcs.core. The line it halts on is

res_row = ijob()

printing the __str__ and type of ijob gives me:

<pp._Task object at 0x7fcdd5a29790>
<class 'pp._Task'>

It seems like something is causing it to halt when the object is called.

thomasorb commented 6 years ago

Dear @Lyalpha What's the version of python you are using ? It maybe the source of the problem... I'm using 2.7.14

simon-prunet commented 6 years ago

@thomasorb hello ! machine (iolani) now has python 2.7.13, but unfortunately it is still hanging when going through the job server. However, I believe I think which line is responsible for hanging:

orb/fit.py, line 164 (in the init of class FitVector):

            self.models[-1].make_guess(self.vector)

When I comment it out it processes to the end (having commented out the call to the fit itself). When I uncomment this line, it hangs...

Lyalpha commented 6 years ago

2.7.13 here

Lyalpha commented 6 years ago

I confirm the finding of @simon-prunet - if I comment out L164 in orb/fit.py the processes finish (with all nan output), with the line included it halts.

thomasorb commented 6 years ago

Dear @Lyalpha and @simon-prunet,

I may have solved this issue (at least it works on iolani) with the last next branch commit of orb https://github.com/thomasorb/orb/commit/4a9cc4a18c442d4d63399f96d396492918c02304 and orcs https://github.com/thomasorb/orcs/commit/fbe212c21004f8f9a800c1518cf66e4956d5c2cc. Please update both programs and try it :smile:

All was coming from a pickling issue mainly with arrays containing gvar objects (https://pypi.python.org/pypi/gvar)

Hope this helps :smiley: