mhardcastle / ddf-pipeline

LOFAR pipeline using killms/ddfacet
GNU General Public License v2.0

Singularity container compilation crashes #312

Closed. AlexKurek closed this issue 1 year ago.

AlexKurek commented 1 year ago

Just reporting - I'm getting:

Scanning dependencies of target schaapcommon
[  7%] Building CXX object cpp/hamaker/CMakeFiles/hamaker.dir/hamakerelementresponse.cc.o
[  7%] Building CXX object cpp/hamaker/CMakeFiles/hamaker.dir/hamakercoeff.cc.o
[  7%] Building CXX object external/schaapcommon/CMakeFiles/schaapcommon.dir/src/facets/facetimage.cc.o
[  7%] Building CXX object external/schaapcommon/CMakeFiles/schaapcommon.dir/src/h5parm/h5parm.cc.o
[ 10%] Building CXX object cpp/oskar/CMakeFiles/oskar.dir/oskardatafile.cc.o
[ 10%] Building CXX object external/schaapcommon/CMakeFiles/schaapcommon.dir/src/h5parm/soltab.cc.o
[ 10%] Building CXX object cpp/oskar/CMakeFiles/oskar.dir/oskarelementresponse.cc.o
[ 12%] Building CXX object cpp/oskar/CMakeFiles/oskar.dir/oskardataset.cc.o
[ 13%] Building CXX object external/schaapcommon/CMakeFiles/schaapcommon.dir/src/h5parm/gridinterpolate.cc.o
[ 16%] Building CXX object cpp/oskar/CMakeFiles/oskar.dir/oskar_evaluate_spherical_wave_sum.cc.o
[ 16%] Building CXX object cpp/oskar/CMakeFiles/oskar.dir/oskar_evaluate_dipole_pattern.cc.o
[ 18%] Linking CXX shared library libeverybeam-hamaker.so
[ 20%] Linking CXX shared library libeverybeam-oskar.so
[ 20%] Built target hamaker
[ 20%] Built target oskar
[ 21%] Linking CXX static library libschaapcommon.a
[ 21%] Built target schaapcommon
make[2]: *** [cpp/lobes/CMakeFiles/download_lobes_coefficients.dir/build.make:76: cpp/lobes/CMakeFiles/download_lobes_coefficients] Error 8
make[1]: *** [CMakeFiles/Makefile2:558: cpp/lobes/CMakeFiles/download_lobes_coefficients.dir/all] Error 2
make: *** [Makefile:149: all] Error 2
FATAL:   While performing build: while running engine: exit status 2
mhardcastle commented 1 year ago

Thanks. This is clearly an upstream issue with the EveryBeam library, which is pulling in other packages... we are building against a very old version of it, so we would need to try to move to a newer version, possibly then updating other components of the LOFAR software stack to match.

mhardcastle commented 1 year ago

So I have built successfully with the following changes:

diff --git a/ddf-py3.singularity b/ddf-py3.singularity
index b7ba3bc..116c5d8 100644
--- a/ddf-py3.singularity
+++ b/ddf-py3.singularity
@@ -97,7 +97,7 @@ From: debian:bullseye

    # IDG -- for wsclean and DP3
    cd $SRC
-   git clone -b 0.8 https://gitlab.com/astron-idg/idg.git
+   git clone -b 1.1.0 https://gitlab.com/astron-idg/idg.git
    cd idg && mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr/local/idg/ ..
    make -j $J
@@ -115,7 +115,7 @@ From: debian:bullseye

    # Everybeam -- for DP3
    cd $SRC
-   git clone -b v0.1.3 https://git.astron.nl/RD/EveryBeam.git
+   git clone -b v0.4.0 https://git.astron.nl/RD/EveryBeam.git
    cd EveryBeam
    mkdir build
    cd build
@@ -125,7 +125,7 @@ From: debian:bullseye

    # DP3
    cd $SRC
-   git clone -b v5.1 https://github.com/lofar-astron/DP3.git
+   git clone https://github.com/lofar-astron/DP3.git
    cd DP3
    mkdir build
    cd build
@@ -146,7 +146,7 @@ From: debian:bullseye

   # wsclean latest -- for selfcal
    cd $SRC
-   git clone -b v3.0 https://gitlab.com/aroffringa/wsclean.git
+   git clone -b v3.3 https://gitlab.com/aroffringa/wsclean.git
    cd wsclean
    mkdir -p build
    cd build
@@ -202,7 +202,7 @@ From: debian:bullseye
   rm -rf /var/lib/apt/lists/*

   bash -c "rm -rf /usr/local/src/{DP3,EveryBeam,LOFARBeam,aoflagger,dysco,idg,wsclean,PyBDSF,SpiderScripts}/" # DDFacet,killMS
-  ln -s /usr/local/bin/DPPP /usr/local/bin/DP3
+  #ln -s /usr/local/bin/DPPP /usr/local/bin/DP3

However, this needs testing against the selfcal scripts (see the sketch below for a quick sanity check of the rebuilt container). If you were able to do this and report back, Alex, I could push these changes.
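
For what it's worth, a quick smoke test of the rebuilt image could look something like the sketch below, run with singularity exec inside the container; the tool names are my assumptions about the installed entry points (DP3 replacing the old DPPP symlink, wsclean, and the DDFacet/killMS scripts), not something the recipe guarantees.

    # Hypothetical smoke test: check that the expected executables are on PATH
    # inside the rebuilt container before trying the full selfcal scripts.
    import shutil

    for tool in ("DP3", "wsclean", "DDF.py", "kMS.py"):
        path = shutil.which(tool)
        print(f"{tool:10s} -> {path if path else 'NOT FOUND'}")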

AlexKurek commented 1 year ago

I used even more recent commits (https://github.com/tikk3r/flocs/issues/58#issuecomment-1522808276) and I am getting this issue again: https://github.com/mhardcastle/ddf-pipeline/issues/290

This is exactly how it looks now:

Successful readonly open of default-locked table L658346_SB001_uv_avg_12C2BC993t_121MHz.pre-cal.ms/OBSERVATION: 31 columns, 1 rows
../4C29.30.ds9.reg
[130.00975000deg,29.81742500deg]
Correcting boxfile for the local north
Using these observations  ['L658346']
Traceback (most recent call last):
  File "/usr/local/src/ddf-pipeline/scripts/sub-sources-outside-region.py", line 585, in <module>
    DOut=SummaryToVersion("summary.txt")
  File "/usr/local/src/ddf-pipeline/scripts/sub-sources-outside-region.py", line 574, in SummaryToVersion
    l=L[iLine]
IndexError: list index out of range
Traceback (most recent call last):
  File "/usr/local/bin/CleanSHM.py", line 26, in <module>
    from DDFacet.compatibility import range
  File "/usr/local/lib/python3.9/dist-packages/DDFacet/__init__.py", line 23, in <module>
    __version__ = pkg_resources.require("DDFacet")[0].version
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 886, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 777, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (numpy 1.24.3 (/usr/local/lib/python3.9/dist-packages), Requirement.parse('numpy<1.24,>=1.18'), {'numba'})
 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB413_uv.pre-cal_12D524E44t_156MHz.pre-cal.ms 0.26963945066386985
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB423_uv.pre-cal_12D524E44t_158MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB423_uv.pre-cal_12D524E44t_158MHz.pre-cal.ms 0.2744160681578087
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB432_uv.pre-cal_12D524E44t_160MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB432_uv.pre-cal_12D524E44t_160MHz.pre-cal.ms 0.17283014508614727
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB442_uv.pre-cal_12D524E44t_162MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB442_uv.pre-cal_12D524E44t_162MHz.pre-cal.ms 0.19027707730074406
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB452_uv.pre-cal_12D524E44t_164MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB452_uv.pre-cal_12D524E44t_164MHz.pre-cal.ms 0.24146511168365292
Successful readonly open of default-locked table /storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB461_uv.pre-cal_12D524E44t_166MHz.pre-cal.ms: 25 columns, 6578850 rows
/storage/akurek/extractPy/4C29.30_timeavg1/4C29.30/P129+29/L693959_SB461_uv.pre-cal_12D524E44t_166MHz.pre-cal.ms 0.5271044103452731

============================= Running subtraction  =============================

Running: sub-sources-outside-region.py  --timeavg=1 --overwriteoutput --ncpu=28 -b ../4C29.30.ds9.reg -p 4C29.30
FAILED to run sub-sources-outside-region.py  --timeavg=1 --overwriteoutput --ncpu=28 -b ../4C29.30.ds9.reg -p 4C29.30: return value is 1
Traceback (most recent call last):
  File "/usr/local/src/ddf-pipeline/scripts/extraction.py", line 116, in <module>
    run(executionstr,database=False)
  File "/usr/local/src/ddf-pipeline/utils/auxcodes.py", line 68, in run
    die('FAILED to run '+s+': return value is '+str(retval),database=database)
  File "/usr/local/src/ddf-pipeline/utils/auxcodes.py", line 51, in die
    raise RuntimeError(s)
RuntimeError: FAILED to run sub-sources-outside-region.py  --timeavg=1 --overwriteoutput --ncpu=28 -b ../4C29.30.ds9.reg -p 4C29.30: return value is 1

Also, in this patched container there is numpy 1.24.3, which is problematic: https://github.com/lofar-astron/PyBDSF/issues/202
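
The ContextualVersionConflict in the traceback above can be reproduced with a short check (a sketch of my own, not part of the pipeline) that reports whether the installed numpy satisfies numba's pin:

    # Hypothetical check: compare the installed numpy against numba's declared
    # requirement (the traceback above shows numba pinning numpy<1.24,>=1.18).
    import pkg_resources

    numpy_dist = pkg_resources.get_distribution("numpy")
    numba_dist = pkg_resources.get_distribution("numba")
    for req in numba_dist.requires():
        if req.project_name == "numpy":
            print("numpy installed :", numpy_dist.version)
            print("numba requires  :", str(req))
            print("satisfied       :", numpy_dist.version in req)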

AlexKurek commented 1 year ago

I have added debug print statements to sub-sources-outside-region.py:

def SummaryToVersion(summaryFile):        
  f=open(summaryFile,"r")
  ll=f.readlines()
  L=[l.strip() for l in ll]
  DFields={"StrDate":'ddf-pipeline completed at ',
           "v_ddfPipe":'ddf-pipeline version was ',
           "v_DDF":'DDF version was ',
           "v_kMS":'killMS version was ',
           "v_DynSpec":'DynSpecMS version was '}

  DOut={}
  for npField in DFields.keys():
    field=DFields[npField]
    iLine=0
    while True:
      print(f"L: {L}")
      size_of_L = len(L)
      print(f"size of L: {size_of_L}")
      print(f"iLine: {iLine}")
      print("=========================================================")
      l=L[iLine]
      if l.startswith(field):
        d=l.split(field)[-1]

The log is attached (too large to paste here): logExtract_timeavg1.zip

In the last iteration:

size of L: 112
iLine: 112
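
So when one of the fields is missing from summary.txt, the inner while True loop walks iLine past the end of L, which is exactly the IndexError above. A minimal defensive sketch (my own rewording for illustration, not the pipeline's actual fix) would iterate only over the lines that exist and simply skip fields that are not found:

    # Illustrative only: a bounded version of the loop in SummaryToVersion.
    # Missing fields are skipped instead of running iLine off the end of L.
    def SummaryToVersion(summaryFile):
        with open(summaryFile, "r") as f:
            L = [l.strip() for l in f.readlines()]
        DFields = {"StrDate": 'ddf-pipeline completed at ',
                   "v_ddfPipe": 'ddf-pipeline version was ',
                   "v_DDF": 'DDF version was ',
                   "v_kMS": 'killMS version was ',
                   "v_DynSpec": 'DynSpecMS version was '}
        DOut = {}
        for npField, field in DFields.items():
            for l in L:
                if l.startswith(field):
                    DOut[npField] = l.split(field)[-1]
                    break  # first match wins; absent fields just stay out of DOut
        return DOut
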
mhardcastle commented 1 year ago

There seem to be three(?) separate issues here. Please raise separate tickets for separate issues to help us keep track.

I have updated the singularity recipe to fix the compilation problems, which are unfortunately now a moving target, since the dependencies require us to use development versions of DP3 and wsclean. This is in a 'bugfixes' branch with a number of other changes. When I have run tests of the pipeline and the extraction pipeline in the singularity, I'll update or close this issue.

AlexKurek commented 1 year ago

I think the IndexError is definitely one separate issue; it has now been moved here: https://github.com/mhardcastle/ddf-pipeline/issues/320. Maybe two, if the latest numpy still crashes after the first issue is fixed.

mhardcastle commented 1 year ago

I'm closing this issue since it was fixed -- it is now broken again because of further wsclean/DP3 issues, but that's covered in #330.