tikk3r / flocs

Containers recipes for software stacks used in LOFAR data reduction.
https://tikk3r.github.io/flocs/
GNU General Public License v3.0

extract.py crashes #58

Closed AlexKurek closed 1 year ago

AlexKurek commented 1 year ago

Using lofar_sksp_v4.1.0_x86-64_generic_ddf_cuda.sif, I'm getting:

Running command rclone --multi-thread-streams 1 --config=/storage/akurek/extractPy/maca_sksp_tape_DDF_readonly.conf listremotes
Traceback (most recent call last):
  File "/opt/lofar/ddf-pipeline/scripts/extraction.py", line 107, in <module>
    prepare_field(field,fdir,verbose=True)
  File "/opt/lofar/ddf-pipeline/utils/reprocessing_utils.py", line 155, in prepare_field
    do_sdr_and_rclone_download(field,processingdir,verbose=verbose,Mode=Mode,operations=operations)
  File "/opt/lofar/ddf-pipeline/utils/reprocessing_utils.py", line 81, in do_sdr_and_rclone_download
    do_rclone_download(cname,f,verbose=verbose,Mode=Mode,operations=operations)
  File "/opt/lofar/ddf-pipeline/utils/reprocessing_utils.py", line 94, in do_rclone_download
    rc.get_remote()
  File "/opt/lofar/ddf-pipeline/utils/rclone.py", line 116, in get_remote
    d=self.execute('listremotes')
  File "/opt/lofar/ddf-pipeline/utils/rclone.py", line 58, in execute
    proc=subprocess.Popen(fullcommand,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
  File "/usr/lib64/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib64/python3.10/subprocess.py", line 1847, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'rclone'

This seems easy to fix by adding the rclone package to the container.
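The FileNotFoundError comes from subprocess.Popen being unable to find an `rclone` executable on PATH inside the container. As a minimal sketch of how a caller could fail earlier with a clearer message, the wrapper below checks PATH with `shutil.which` before launching rclone. `run_rclone` and its `binary` parameter are hypothetical names for illustration; this is not the code in ddf-pipeline's utils/rclone.py. The `--multi-thread-streams 1` and `--config=` flags are copied from the command in the log above.

```python
import shutil
import subprocess


def run_rclone(args, config=None, binary="rclone"):
    """Run rclone, failing early with a clear message if the binary is missing.

    Hypothetical wrapper; not part of ddf-pipeline's utils/rclone.py.
    """
    # shutil.which returns None when the executable is not on PATH, which is
    # the condition that makes subprocess.Popen raise FileNotFoundError.
    exe = shutil.which(binary)
    if exe is None:
        raise RuntimeError(
            f"{binary} not found on PATH; add it to the container recipe "
            "or use an image that ships it"
        )
    cmd = [exe, "--multi-thread-streams", "1"]
    if config is not None:
        cmd.append(f"--config={config}")
    cmd.extend(args)
    # Capture output so the caller can inspect rclone's stdout and stderr.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr
```

For example, `run_rclone(["listremotes"], config="/storage/akurek/extractPy/maca_sksp_tape_DDF_readonly.conf")` would raise the RuntimeError inside an image without rclone instead of a bare FileNotFoundError.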

tikk3r commented 1 year ago

Thanks for reporting. I've added rclone to the recipe for the next release.

AlexKurek commented 1 year ago

Later, there is another crash, but I don't know why it happens:

Symlink ./SOLSDIR/L658346_SB001_uv_avg_12C2BC993t_129MHz.pre-cal.ms/killMS.DDS3_full_smoothed.sols.npz already exists, recreating
Successful readonly open of default-locked table L658346_SB001_uv_avg_12C2BC993t_121MHz.pre-cal.ms/OBSERVATION: 31 columns, 1 rows
../4C29.30.ds9.reg
[130.00975000deg,29.81742500deg]
Correcting boxfile for the local north
Using these observations  ['L658346']
Traceback (most recent call last):
  File "/opt/lofar/ddf-pipeline/scripts/sub-sources-outside-region.py", line 585, in <module>
    DOut=SummaryToVersion("summary.txt")
  File "/opt/lofar/ddf-pipeline/scripts/sub-sources-outside-region.py", line 574, in SummaryToVersion
    l=L[iLine]
IndexError: list index out of range
 - 21:09:21 - ClearSHM                     | Clear shared memory
 - 21:09:21 - Multiprocessing              | reaping 70 shared memory objects associated with 70 dead DDFacet processes
 - 21:09:21 - ClearSHM                     | Clear Semaphores
 - 21:09:21 - ClearSHM                     | Clear shared dictionaries
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB404_uv.pre-cal_12D524E44t_154MHz.pre-cal.ms 0.2452661179385455
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB413_uv.pre-cal_12D524E44t_156MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB413_uv.pre-cal_12D524E44t_156MHz.pre-cal.ms 0.26963945066386985
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB423_uv.pre-cal_12D524E44t_158MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB423_uv.pre-cal_12D524E44t_158MHz.pre-cal.ms 0.2744160681578087
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB432_uv.pre-cal_12D524E44t_160MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB432_uv.pre-cal_12D524E44t_160MHz.pre-cal.ms 0.17283014508614727
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB442_uv.pre-cal_12D524E44t_162MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB442_uv.pre-cal_12D524E44t_162MHz.pre-cal.ms 0.19027707730074406
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB452_uv.pre-cal_12D524E44t_164MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB452_uv.pre-cal_12D524E44t_164MHz.pre-cal.ms 0.24146511168365292
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB461_uv.pre-cal_12D524E44t_166MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB461_uv.pre-cal_12D524E44t_166MHz.pre-cal.ms 0.5271044103452731

============================= Running subtraction  =============================

Running: sub-sources-outside-region.py  --timeavg=1 --overwriteoutput --ncpu=28 -b ../4C29.30.ds9.reg -p 4C29.30
FAILED to run sub-sources-outside-region.py  --timeavg=1 --overwriteoutput --ncpu=28 -b ../4C29.30.ds9.reg -p 4C29.30: return value is 1
Traceback (most recent call last):
  File "/opt/lofar/ddf-pipeline/scripts/extraction.py", line 116, in <module>
    run(executionstr,database=False)
  File "/opt/lofar/ddf-pipeline/utils/auxcodes.py", line 68, in run
    die('FAILED to run '+s+': return value is '+str(retval),database=database)
  File "/opt/lofar/ddf-pipeline/utils/auxcodes.py", line 51, in die
    raise RuntimeError(s)
RuntimeError: FAILED to run sub-sources-outside-region.py  --timeavg=1 --overwriteoutput --ncpu=28 -b ../4C29.30.ds9.reg -p 4C29.30: return value is 1
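The thread does not explain why SummaryToVersion fails, and its source is not shown here. As a rough illustration only: an IndexError on a line like `l=L[iLine]` is what a loop produces when it advances an index through a list of lines without ever finding what it is looking for, for instance if summary.txt is empty, truncated, or in an unexpected format (an assumption, not confirmed in this issue). `naive_find`, `safe_find`, `read_summary_line`, and the `prefix` argument are hypothetical names, not ddf-pipeline code.

```python
def naive_find(lines, prefix):
    """Walk forward until a line starts with `prefix` (no bounds check)."""
    i = 0
    # If no line matches, i reaches len(lines) and lines[i] raises IndexError,
    # the same exception type as `l=L[iLine]` in the traceback above.
    while not lines[i].startswith(prefix):
        i += 1
    return lines[i]


def safe_find(lines, prefix):
    """Return the first line starting with `prefix`, or None if there is none."""
    for line in lines:
        if line.startswith(prefix):
            return line
    return None


def read_summary_line(path, prefix):
    """Read `path` and report clearly when the expected line is missing."""
    with open(path) as f:
        lines = f.read().splitlines()
    line = safe_find(lines, prefix)
    if line is None:
        raise ValueError(
            f"{path} has {len(lines)} lines and none starts with {prefix!r}; "
            "the step that writes it may have failed or used a different format"
        )
    return line
```

Checking whether summary.txt exists and is non-empty in the working directory when the subtraction step starts would be a reasonable first diagnostic.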

tikk3r commented 1 year ago

rclone has been added in the latest release.
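To confirm that a newer image actually ships rclone before starting a long extraction run, one option is to run `rclone version` inside it. A minimal sketch, assuming the `.sif` images are run with Apptainer (or Singularity, by passing `runtime="singularity"`); `image_has_rclone` is a hypothetical helper, and the thread does not name the release that includes rclone, so the image path is left to the caller.

```python
import subprocess


def image_has_rclone(image, runtime="apptainer"):
    """Return True if `rclone version` succeeds inside the container image."""
    try:
        result = subprocess.run(
            [runtime, "exec", image, "rclone", "version"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The container runtime itself is not installed or not on PATH.
        return False
    # A non-zero exit means rclone is missing in the image or failed to run.
    return result.returncode == 0
```

Given the first traceback in this issue, `image_has_rclone("lofar_sksp_v4.1.0_x86-64_generic_ddf_cuda.sif")` would be expected to return False for the image originally reported.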