duyhoang-astro closed this issue 4 years ago
Hey Duy, I'd guess it is probably related to the small machine. Do you have a much larger one with, say, 200 GB of RAM?
Indeed. The pipeline can't run (with normal settings on normal data) on a machine anywhere near this small. The broken pipe is probably from workers being killed by the kernel's OOM killer.
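One way to check the OOM-killer explanation is to look for kill events in the host's kernel log (for example the output of `dmesg`, which may need root and has to be read on the host rather than inside the container). The snippet below is only a rough sketch in Python 3: the function name `find_oom_kills` is made up for illustration, and the regular expression assumes the commonly seen "Out of memory: Killed process ..." wording, which varies between kernel versions.

```python
import re

# Assumption: the kernel log uses the common "Out of memory: Kill(ed) process
# <pid> (<name>)" wording. Other kernel versions may phrase this differently.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process_name) pairs for OOM-killer events found in log_text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(log_text)]


# Made-up sample log lines, for illustration only.
sample = (
    "[12345.678] Out of memory: Killed process 4242 (python) total-vm:1kB\n"
    "[12346.000] usb 1-1: new high-speed USB device\n"
)

for pid, name in find_oom_kills(sample):
    print("OOM killer terminated pid %d (%s)" % (pid, name))
```

If the kernel log shows Python workers being killed around the time of the failure, the broken pipe is most likely just the parent process noticing that a worker has gone away.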
Hey Tim, Martin. Thanks for the replies. Yes, we have more powerful nodes (128 GB and 256 GB of RAM). I am just testing whether the pipeline works on a small machine, so that we can expand our computing capacity. Can the pipeline distribute its tasks to multiple nodes? We have a number of small nodes here, and it would be nice if it worked on them.
No, it can't distribute the tasks. DDFacet and kMS are the limiting factors here. There was some interest in distributed DDFacet a while back but it didn't come to anything. We have tested pretty extensively ourselves and 192 GB is the minimum RAM for a full pipeline run with the standard configuration.
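For anyone wanting to guard against starting a run on a node that is too small, a pre-flight check along these lines might help. This is a sketch under assumptions, not part of ddf-pipeline: the function names are hypothetical, the 192 GB figure is the one quoted above, and the 10% tolerance is an arbitrary allowance for the fact that `MemTotal` in `/proc/meminfo` is always somewhat below the nominal installed RAM.

```python
import re


def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and return it in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal entry not found")
    # /proc/meminfo reports the value in KiB, despite the 'kB' label.
    return int(match.group(1)) / (1024 ** 2)


def has_enough_ram(meminfo_text, required_gb=192, tolerance=0.1):
    """Return True if total RAM is within `tolerance` of the required amount.

    required_gb=192 is the minimum quoted in this thread; the tolerance is an
    assumption, not a tested value.
    """
    return mem_total_gib(meminfo_text) >= required_gb * (1 - tolerance)


# Made-up /proc/meminfo excerpts for a small and a large node.
small_node = "MemTotal:       32780000 kB\nMemFree:         1000000 kB\n"
large_node = "MemTotal:      263000000 kB\nMemFree:       200000000 kB\n"

print(has_enough_ram(small_node))
print(has_enough_ram(large_node))
```

On a real Linux node the text would come from reading `/proc/meminfo`, and the check would run before the pipeline is launched.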
Thanks Martin for the info. Cheers.
Hi,
I am running ddf-pipeline and getting a broken pipe error. Does anyone have an idea what causes the error and how to fix it? I am running the pipeline from a Singularity image on a small machine (6 cores, 32 GB of RAM, 60 GB of /tmp). I know the machine is not as powerful as the suggested requirements for the pipeline, but I am not sure whether this is related to the error here.
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/queues.py", line 266, in _feed
send(obj)
IOError: [Errno 32] Broken pipe
FAILED to run DDF.py --Output-Name=image_full_wide --Data-MS=mslist.txt --Deconv-PeakFactor 0.001000 --Data-ColName CORRECTED_DATA --Parallel-NCPU=6 --Beam-CenterNorm=1 --Deconv-CycleFactor=0 --Deconv-MaxMinorIter=1000000 --Deconv-MaxMajorIter=2 --Deconv-Mode SSD --Beam-Model=LOFAR --Beam-LOFARBeamMode=A --Weight-Robust -0.200000 --Image-NPix=10000 --CF-wmax 50000 --CF-Nw 100 --Output-Also onNeds --Image-Cell 10.000000 --Facets-NFacets=11 --SSDClean-NEnlargeData 0 --Freq-NDegridBand 1 --Beam-NBand 1 --Facets-DiamMax 1.5 --Facets-DiamMin 0.1 --Deconv-RMSFactor=3.000000 --SSDClean-ConvFFTSwitch 10000 --Data-Sort 1 --Cache-Dir=. --Log-Memory 1 --GAClean-RMSFactorInitHMP 1.000000 --GAClean-MaxMinorIterInitHMP 10000.000000 --GAClean-AllowNegativeInitHMP True --DDESolutions-SolsDir=SOLSDIR --Cache-Weight=reset --Output-Mode=Clean --Output-RestoringBeam 45.000000 --Weight-ColName="None" --Freq-NBand=2 --RIME-DecorrMode=FT --SSDClean-SSDSolvePars [S,Alpha] --SSDClean-BICFactor 0 --Mask-Auto=1 --Mask-SigTh=15.00 --Selection-UVRangeKm=[0.100000,11.444444] --GAClean-MinSizeInit=10 --Beam-Smooth=1: return value is 1
Traceback (most recent call last):
File "/opt/lofar/ddf-pipeline/scripts/pipeline.py", line 1819, in
main(o)
File "/opt/lofar/ddf-pipeline/scripts/pipeline.py", line 1001, in main
subtractOuterSquare(o)
File "/opt/lofar/ddf-pipeline/scripts/pipeline.py", line 828, in subtractOuterSquare
catcher=catcher)
File "/opt/lofar/ddf-pipeline/scripts/pipeline.py", line 286, in ddf_image
run(runcommand,dryrun=options['dryrun'],log=logfilename('DDF-'+imagename+'.log',options=options),quiet=options['quiet'])
File "/opt/lofar/ddf-pipeline/utils/auxcodes.py", line 54, in run
die('FAILED to run '+s+': return value is '+str(retval))
File "/opt/lofar/ddf-pipeline/utils/auxcodes.py", line 36, in die
raise Exception(s)
Exception: FAILED to run DDF.py --Output-Name=image_full_wide --Data-MS=mslist.txt --Deconv-PeakFactor 0.001000 --Data-ColName CORRECTED_DATA --Parallel-NCPU=6 --Beam-CenterNorm=1 --Deconv-CycleFactor=0 --Deconv-MaxMinorIter=1000000 --Deconv-MaxMajorIter=2 --Deconv-Mode SSD --Beam-Model=LOFAR --Beam-LOFARBeamMode=A --Weight-Robust -0.200000 --Image-NPix=10000 --CF-wmax 50000 --CF-Nw 100 --Output-Also onNeds --Image-Cell 10.000000 --Facets-NFacets=11 --SSDClean-NEnlargeData 0 --Freq-NDegridBand 1 --Beam-NBand 1 --Facets-DiamMax 1.5 --Facets-DiamMin 0.1 --Deconv-RMSFactor=3.000000 --SSDClean-ConvFFTSwitch 10000 --Data-Sort 1 --Cache-Dir=. --Log-Memory 1 --GAClean-RMSFactorInitHMP 1.000000 --GAClean-MaxMinorIterInitHMP 10000.000000 --GAClean-AllowNegativeInitHMP True --DDESolutions-SolsDir=SOLSDIR --Cache-Weight=reset --Output-Mode=Clean --Output-RestoringBeam 45.000000 --Weight-ColName="None" --Freq-NBand=2 --RIME-DecorrMode=FT --SSDClean-SSDSolvePars [S,Alpha] --SSDClean-BICFactor 0 --Mask-Auto=1 --Mask-SigTh=15.00 --Selection-UVRangeKm=[0.100000,11.444444] --GAClean-MinSizeInit=10 --Beam-Smooth=1: return value is 1
INFO: Cleaning up image...