lofar-astron / prefactor

Pre facet calibration pipeline
http://www.astron.nl/citt/prefactor
GNU General Public License v3.0

Broken Prefactor results/ ? #292

Closed AlexKurek closed 2 years ago

AlexKurek commented 3 years ago

I'm getting the following error while trying to run the DDF pipeline on Prefactor's results/ for fields L258201 and L343256:

RuntimeError: SSMIndex::getIndex - access to non-existing row 60 in column NAME of table [...]

reported here: https://github.com/mhardcastle/ddf-pipeline/issues/252

@mhardcastle wrote a small script:

from __future__ import print_function
import pyrap.tables as pt
import numpy as np
import os
import glob

def fix_antenna(ms):

    for tname in ['ANTENNA','LOFAR_ANTENNA_FIELD']:
        t1=pt.table(ms+'/'+tname)

        # work out how many valid rows there are by trying to read each one and catch the exception
        valid=[]
        for i in range(len(t1)):
            try:
                _=t1[i]['NAME']
                valid.append(True)
            except RuntimeError:
                valid.append(False)

        if np.sum(valid)==len(t1):
            print('Everything is OK! Skip correction step')
            t1.close()
        else:
            print('There are',np.sum(valid),'valid rows present in corrupt table, generating new table')

            t1.copy(ms+'/TEMP',copynorows=True)
            t2=pt.table(ms+'/TEMP',readonly=False)
            t2.addrows(np.sum(valid))
            j=0
            for i in range(len(t1)):
                if valid[i]:
                    t2[j]=t1[i]
                    j+=1
            t2.close()
            t1.close()
            os.rename(ms+'/'+tname,ms+'/OLD_'+tname)
            os.rename(ms+'/TEMP',ms+'/'+tname)

if __name__=='__main__':
    g=glob.glob('*.ms')
    for ms in g:
        print('Doing',ms)
        fix_antenna(ms)

that fixes the MSs (in Prefactor's results/ folder), and I'm now able to run DDF. I've attached zipped logs and summary files from Prefactor for both fields: L343256.zip, L258201.zip
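The validity check in the script above can be factored into a small helper that reports which rows of a subtable can be read at all. This is a minimal sketch and not part of @mhardcastle's script. It assumes a table-like object that provides python-casacore's `nrows()`, `colnames()` and `getcell(columnname, rownr)`. It also assumes that casacore's storage-manager errors surface in Python as `RuntimeError`, as they do in the tracebacks in this thread. The helper checks every column rather than only `NAME`, because the reported errors hit different columns (`NAME`, `OFFSET`). `FakeCorruptTable` is a hypothetical stand-in, so the sketch runs without casacore or an MS.

```python
from typing import List


def readable_rows(table) -> List[bool]:
    """Return one flag per row: True if every column of that row can be read.

    `table` is assumed to behave like a python-casacore table, i.e. to provide
    nrows(), colnames() and getcell(columnname, rownr).
    """
    columns = table.colnames()
    flags = []
    for row in range(table.nrows()):
        try:
            for col in columns:
                table.getcell(col, row)
        except RuntimeError:
            # Missing storage-manager data (e.g. "SSMIndex::getIndex -
            # access to non-existing row") is assumed to raise RuntimeError.
            flags.append(False)
        else:
            flags.append(True)
    return flags


class FakeCorruptTable:
    """Hypothetical stand-in for a corrupt subtable: rows >= n_valid fail."""

    def __init__(self, nrows: int, n_valid: int):
        self._nrows = nrows
        self._n_valid = n_valid

    def nrows(self) -> int:
        return self._nrows

    def colnames(self) -> List[str]:
        return ["NAME", "OFFSET"]

    def getcell(self, columnname: str, rownr: int) -> str:
        if rownr >= self._n_valid:
            raise RuntimeError(
                f"SSMIndex::getIndex - access to non-existing row {rownr} "
                f"in column {columnname}"
            )
        return f"{columnname}[{rownr}]"


if __name__ == "__main__":
    flags = readable_rows(FakeCorruptTable(nrows=69, n_valid=62))
    print(sum(flags), "of", len(flags), "rows readable")  # 62 of 69 rows readable
```

With python-casacore installed, you could point the same function at a real subtable, for example `readable_rows(pt.table(ms + '/ANTENNA'))`, and close the table afterwards.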

AlexKurek commented 3 years ago

I uploaded both sets of MSs here: lofar.herts.ac.uk:/beegfs/lofar/alexkurek

adrabent commented 3 years ago

I am not aware of this kind of issue. A corrupted MS is usually caused by an issue with DP3, not directly by the pipeline. Can you constrain at which step of prefactor the data might have gotten corrupted?

AlexKurek commented 3 years ago

I'm using DP3 4.2, since this is the latest version supported by Prefactor (?).

Can you constrain at which step of prefactor the data might have gotten corrupted?

I'm not advanced enough to tell, but @mhardcastle wrote:

Basically, the MS is corrupt, as I thought -- in fact, on a spot check, all of the MSs in this observation are corrupt. Any attempt from inside casacore to read the antenna table fails. Try this in a Python window:

import pyrap.tables as pt
t=pt.table('L258201_SB244_uv.dppp_125C46F7At_125MHz.pre-cal.ms/ANTENNA')
print(t[:]['NAME'])

If instead you do

for i in range(len(t)): print(i,t[i]['NAME'])

you will see that all the local and remote stations are present, so the antennas missing from the table are the international ones. So it's probably in the process of removing those international stations that something gets corrupted. I can only guess that somewhere in the prefactor run there's an incompatibility with some other code that causes this.
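To check the claim that the readable rows are exactly the local (core) and remote stations, you can classify the names printed by the row-by-row loop. This is a minimal sketch. It assumes the usual LOFAR naming convention: core stations start with `CS`, remote stations with `RS`, and international stations with a country code (e.g. `DE601HBA`). The example names are illustrative and were not read from these MSs.

```python
from typing import Dict, Iterable, List


def station_type(name: str) -> str:
    """Classify a LOFAR antenna/station name by its two-letter prefix.

    Assumed convention: CS = core, RS = remote, anything else
    (DE, FR, SE, UK, PL, IE, LV, ...) = international.
    """
    prefix = name[:2].upper()
    if prefix == "CS":
        return "core"
    if prefix == "RS":
        return "remote"
    return "international"


def group_stations(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names by station type, preserving their original order."""
    groups: Dict[str, List[str]] = {"core": [], "remote": [], "international": []}
    for name in names:
        groups[station_type(name)].append(name)
    return groups


if __name__ == "__main__":
    # Illustrative names only; not read from the MSs discussed in this issue.
    names = ["CS001HBA0", "CS001HBA1", "RS106HBA", "DE601HBA", "UK608HBA"]
    for kind, members in group_stations(names).items():
        print(kind, members)
```

Suppose you apply it to the names printed before the loop reaches the first unreadable row. An empty `international` group would then match the description above.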

adrabent commented 3 years ago

Yes, DP3 4.2 should be the latest version supported by the genericpipeline version of prefactor. The only testbed I currently have uses DP3 4.1, and there I cannot reproduce this error.

Is it feasible for you to "downgrade" to 4.1 to see if this solves the problem? Alternatively, you could try 5.1. We also have not encountered corrupted data in our CWL test runs so far.

AlexKurek commented 3 years ago

After recomputing L258201 using the current master of Prefactor and DP3 4.2, I'm still getting this error. EDIT: Using DP3 4.1 I also got this error.

Is there any documentation on how to run Prefactor 3 CWL ( https://git.astron.nl/eosc/prefactor3-cwl )? As I understand it, I should build https://git.astron.nl/eosc/prefactor3-cwl/-/blob/master/Docker/Dockerfile-base and then install prefactor3-cwl with setup.py?

adrabent commented 3 years ago

Dear @AlexKurek

yes, there is documentation for running prefactor3-cwl, but it has not been published as HTML so far.

EDIT: Using DP3 4.1 I also got this error.

This is really a concern. Could you try to reproduce your issue with a minimal example? As I understand it, one possible cause could be the removal of the international stations, which happens right at the beginning. If you would just run a simple parset doing only this job, will it corrupt the data?

Cheers, Alex

AlexKurek commented 3 years ago

If you would just run a simple parset doing only this job

Is there such a parset available somewhere? Do I in fact need two parset files for this, one for the calibrator and one for the target?

adrabent commented 3 years ago

You could try something like this as a parset:

msin                                =   input.MS
msin.datacolumn                     =   DATA
msin.baseline                       =   [CR]S*&
msin.autoweight                     =   False
msout                               =   test.MS
msout.datacolumn                    =   DATA
msout.writefullresflag              =   False
msout.overwrite                     =   True
msout.storagemanager                =   "Dysco"
msout.storagemanager.databitrate    =   0
steps                               =   [filter]
filter.type                         =   filter
filter.baseline                     =   [CR]S*&
filter.remove                       =   true
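The selection string `[CR]S*&` is meant to keep only baselines where both stations are core (`CS`) or remote (`RS`) stations. `filter.remove = true` is intended to drop stations that no longer appear in any remaining baseline from subtables such as ANTENNA, which is the step under suspicion in this issue. The sketch below only illustrates which baselines such a pattern keeps and which stations survive. It uses Python's `fnmatch` glob matching as a stand-in for casacore's baseline-selection syntax, considers cross-correlations only, and is not how DP3 implements the step. The station names are illustrative.

```python
from fnmatch import fnmatchcase
from itertools import combinations
from typing import List, Tuple


def keep_baseline(ant1: str, ant2: str, pattern: str = "[CR]S*") -> bool:
    """True if both stations of a cross-correlation match the glob pattern.

    Only approximates the intent of `[CR]S*&`; casacore's baseline-selection
    grammar is richer than shell-style globbing.
    """
    return ant1 != ant2 and fnmatchcase(ant1, pattern) and fnmatchcase(ant2, pattern)


def filter_baselines(stations: List[str], pattern: str = "[CR]S*") -> List[Tuple[str, str]]:
    """All cross-correlation baselines (unordered pairs) kept by the pattern."""
    return [(a, b) for a, b in combinations(stations, 2) if keep_baseline(a, b, pattern)]


def stations_in_use(baselines: List[Tuple[str, str]]) -> List[str]:
    """Stations that still appear in at least one kept baseline (sorted)."""
    return sorted({s for pair in baselines for s in pair})


if __name__ == "__main__":
    stations = ["CS001HBA0", "CS002HBA0", "RS106HBA", "DE601HBA", "UK608HBA"]
    kept = filter_baselines(stations)
    print(len(kept), "baselines kept")
    print("stations left:", stations_in_use(kept))
```

In this toy case, the stations that are left are exactly the ones a correctly rewritten ANTENNA subtable would still need to describe.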

AlexKurek commented 3 years ago

There are recent commits related to removing stations, so I recomputed L258201. I've uploaded the summary files here: summaries.zip

adrabent commented 3 years ago

Summary files look fine to me. Do you mean you made a pull request?

AlexKurek commented 3 years ago

Summary files look fine to me. Do you mean you made a pull request?

No, no, I meant those recent commits that are now merged. Here is the first MS created by your parset: http://www.oa.uj.edu.pl/A.Kurek/test.MS.zip I will make more of them and check whether they are corrupted.

adrabent commented 3 years ago

Which commits do you think are related to the removal of stations?

AlexKurek commented 3 years ago

I thought the commits from https://github.com/lofar-astron/prefactor/commit/de347021d488d181b92705d84bf63cefadd4400f up to the current master could be related to this issue.

adrabent commented 3 years ago

I am afraid they are not related to your issue :(

AlexKurek commented 3 years ago

I did:

msin                                =   /storage/akurek/N4449_mozaik/L258201_tests_small/3C196/L258197_SB000_uv.dppp.MS
msin.datacolumn                     =   DATA
msin.baseline                       =   [CR]S*&
msin.autoweight                     =   False
msout                               =   test.MS
msout.datacolumn                    =   DATA
msout.writefullresflag              =   False
msout.overwrite                     =   True
msout.storagemanager                =   "Dysco"
msout.storagemanager.databitrate    =   0
steps                               =   [filter]
filter.type                         =   filter
filter.baseline                     =   [CR]S*&
filter.remove                       =   true

and later as @mhardcastle suggested:

import pyrap.tables as pt
t=pt.table('/storage/akurek/N4449_mozaik/L258201_tests_small/test.MS/ANTENNA')
print(t[:]['NAME'])

for i in range(len(t)):
    print(i,t[i]['NAME'])

and I'm getting:

Successful readonly open of default-locked table /storage/akurek/N4449_mozaik/L258201_tests_small/test.MS/ANTENNA: 10 columns, 69 rows
Traceback (most recent call last):
  File "first.py", line 3, in <module>
    print(t[:]['NAME'])
  File "/usr/lib/python3/dist-packages/casacore/tables/table.py", line 466, in __getitem__
    return self._row._getitem(key, self.nrows())
  File "/usr/lib/python3/dist-packages/casacore/tables/tablerow.py", line 73, in _getitem
    result.append(self.get(rownr))
  File "/usr/lib/python3/dist-packages/casacore/tables/tablerow.py", line 50, in get
    return self._get(rownr)
RuntimeError: SSMIndex::getIndex - access to non-existing row 62 in column OFFSET of table /storage/akurek/N4449_mozaik/L258201_tests_small/test.MS/ANTENNA

mhardcastle commented 3 years ago

So you just removed the international stations with DPPP and that corrupted the MS? Then it's a DPPP bug. What version of DPPP are you using?

AlexKurek commented 3 years ago

So you just removed the international stations with DPPP and that corrupted the MS? Then it's a DPPP bug. What version of DPPP are you using?

DPPP 4.2, currently the latest one supported by Prefactor.

EDIT: I just tried DP3 master and got the same error.

adrabent commented 3 years ago

@AlexKurek, please report this result as a bug at https://github.com/lofar-astron/DP3