How to call the number of scans from the MS to the outputvis?

Victoria-Samboco commented 1 year ago

I'm trying to split the MS using the MS and not a list of scans as I did before, but I'm having an error with the outputvis.

solarkat1:
      info: " A loop to split/extract all scans from the Measurement Set and determine the sun coordinates (RA/DEC)"
      params: 
        ms: '{recipe.ms}'
      recipe:
        inputs:
          ms:
            dtype: MS
            info: "splitms by scans"
        for_loop:
          var: myscan
          over: ms
        steps:
          split_my_ms:
            cab: splitms
            params:
              ms: '1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms' 
              scan: '{recipe.myscan}'
              datacolumn: 'all'
              outputvis: "scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_{recipe.myscan}.ms"

I want it to return as outputvis something like outputvis = scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_3.ms , ... , but it is returning

split(vis="1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms",outputvis="scansuhf/perscan_ms/15836
2427_sdp_l0.1024ch-J033230_280757-corr_self-scan_1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
.ms",keepmms=True,field="",spw="",
# 2022-11-21 18:37:55      INFO    split::::+
scan="1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms",antenna="",correlation="",timerange="",in
ent="",
# 2022-11-21 18:37:55      INFO    split::::+
array="",uvrange="",observation="",feed="",datacolumn="all",  
# 2022-11-21 18:37:55      INFO    split::::+
keepflags=True,width=1,timebin="0s",combine="")
# 2022-11-21 18:37:55      INFO    MSTransformManager::parseMsSpecParams   Input file name is
1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
# 2022-11-21 18:37:55      INFO    MSTransformManager::parseMsSpecParams   Data column is ALL
# 2022-11-21 18:37:55      INFO    MSTransformManager::parseMsSpecParams   Output file name is
scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_1583662427_sdp_l0.1024ch-J
33230_280757-corr_self.ms.ms
# 2022-11-21 18:37:55      INFO    MSTransformManager::parseDataSelParams  scan selection is
1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
# 2022-11-21 18:37:55      INFO    MSTransformManager::colCheckInfo        Adding DATA column to
output MS from input DATA column
# 2022-11-21 18:37:55      INFO    MSTransformManager::colCheckInfo        Adding CORRECTED_DATA
column to output MS from input CORRECTED_DATA column
# 2022-11-21 18:37:55      INFO    MSTransformManager::colCheckInfo        Adding MODEL_DATA
column to output MS from input MODEL_DATA column
# 2022-11-21 18:37:55      SEVERE  split::::       Scan Expression: Parse error at or near '_'
# 2022-11-21 18:37:55      SEVERE  split::::+      (near char. 11 in string
"1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms")
# 2022-11-21 18:37:55      INFO    split::::       ##### End Task: split                #####
# 2022-11-21 18:37:55      INFO    split::::+      ##########################################
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms INFO: /usr/bin/casa exited with code 0 after
0:00:04
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR: error validating outputs:
error validating outputs:
└── 'boom.solarkat1.split_my_ms.outputvis':
    scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_1583662427_sdp_l0.1024
    ch-J033230_280757-corr_self.ms.ms doesn't exist
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR: ### failed outputs
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR: cab split_my_ms:
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR:   ms =                             
1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR:   datacolumn = all
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR:   scan =                           
1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms
2022-11-21 20:37:57 STIMELA.boom.solarkat1.split_my_ms ERROR:   outputvis =                      
scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_1583662427_sdp_l0.1024ch-J
33230_280757-corr_self.ms.ms

How can I call the number of scans from the MS without creating a list of it in the recipe?

Kincaidr commented 1 year ago

Problem is here: scan: '{recipe.myscan}' The scan in the casa split task should be str. Here you are calling scan ='{recipe.myscan}' which is a for-loop over ms:

 for_loop:
      var: myscan
      over: ms

You should set it to ' ' for all scans:

   scan -- Scan number range  
        default: ’’ = all

Glad you brought this up because splitting scan by all is not working. for me as expected. If I set to ' ' I am only given 1 MS with all the scan numbers:

scans = list(numpy.unique(t.getcol('SCAN_NUMBER')))=[3, 5, 7, 9, 14, 16]

I recall this working previously when I tried it a while back in the pipeline, so not sure whats gone wrong.

Victoria-Samboco commented 1 year ago

I'm also having the same problem when set the default to all, when It works give me only one MS with all scans, meaning that it's not really splitting. I think the other option is to use the python script we used before.

On Tue, 22 Nov 2022, 00:08 Kincaidr, @.***> wrote:

Problem is here: scan: '{recipe.myscan}' The scan in the casa split task should be str. Here you are calling scan ='{recipe.myscan}' which is a for-loop over ms:

for_loop:
var: myscan

over: ms
You should set it to all.

Glad you brought this up because splitting scan by all is not working for me as expected. If I set it to all I get this error:

CASA <1>: split(vis='MIGHTEE_CDFS_raw-J0333_2741-corr.ms',scan='all',outputvis='splitted_ns')

2022-11-21 21:49:24 SEVERE split:::: Scan Expression: Parse error at or near 'a'

2022-11-21 21:49:24 SEVERE split::::+ (near char. 1 in string "all")

Out[1]: False

If I set it to `` which should give me the same according to the docs:

scan -- Scan number range
    default: ’’ = all
It works; however, I am only given 1 MS with all the scan numbers:

scans = list(numpy.unique(t.getcol('SCAN_NUMBER')))=[3, 5, 7, 9, 14, 16]

I recall this working previously when I tried it a while back in the pipeline, so not sure whats gone wrong.

— Reply to this email directly, view it on GitHub https://github.com/ratt-ru/solarkat/issues/18#issuecomment-1322714339, or unsubscribe https://github.com/notifications/unsubscribe-auth/AWYM2BOX6NTYOJLRCWK2CE3WJPXHRANCNFSM6AAAAAASG6PCE4 . You are receiving this because you authored the thread.Message ID: @.***>

Victoria-Samboco commented 1 year ago

This one.

def split_ms(ms): #Simple split
    tb.open(ms)
    scans=sorted(set(tb.getcol('SCAN_NUMBER')))
    tb.done()
    for scan in scans:
        scan = str(scan)
        SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
        #split(vis=ms,outputvis='/home/samboco/solarkat/1GC_UHF/msdir/scans/'+SPLITED_ms,scan=scan, datacolumn='all')
        split(vis=ms,outputvis='scansuhf/perscan_ms/'+SPLITED_ms,scan=scan, datacolumn='all')
        print(SPLITED_ms + 'DONE')

Kincaidr commented 1 year ago

yes we can do it this way, but I still want to know why casa cannot do it by setting scan to all

IanHeywood commented 1 year ago

scan = 'all' doesn't work with CASA's split task, you just need to set scan = '' to split all scans.

Kincaidr commented 1 year ago

scan = 'all' doesn't work with CASA's split task, you just need to set scan = '' to split all scans.

Yes I have done this, but the output I get is only 1 MS.

IanHeywood commented 1 year ago

Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.

scans = [33, 6, 10, 14, 18, 29]
for scan in scans:
    scan = str(scan)
    opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms")
    split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")

Victoria-Samboco commented 1 year ago

Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.
scans = [33, 6, 10, 14, 18, 29]
for scan in scans:
    scan = str(scan)
    opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms")
    split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")
Yes, that's what I did here:

splitms_by_scan: #Its working well
      info: " A loop to split/extract all scans from the Measurement Set and determine the sun coordinates (RA/DEC)"
      recipe:
        inputs:
          myscans:
            dtype: List[str]
            info: "splitms by scans"
            default: ["3","5","7","9","11","13","15","17","19","21","25","27","29","31","33","35","37","39","41","43","47","49"]
        for_loop:
          var: myscan
          over: myscans
        steps:
          split_my_ms: 
            cab: splitms
            params:
              ms: '1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms' 
              scan: '{recipe.myscan}'
              datacolumn: 'all'
              outputvis: "scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan{recipe.myscan}.ms"

Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...

Kincaidr commented 1 year ago

Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.

scans = [33, 6, 10, 14, 18, 29]
for scan in scans:
    scan = str(scan)
    opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms")
    split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")

Yes this is what we did this before. I thought that if we do scan = ' ' then casa would do this.

Kincaidr commented 1 year ago

Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...

No it doesn't have to be manually. We can just do scans=sorted(set(tb.getcol('SCAN_NUMBER'))) like you showed before. Then in the casa split cab we set scans = ' '.

Victoria-Samboco commented 1 year ago

But there's an issue with tb

Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...

No it doesn't have to be manually. We can just do scans=sorted(set(tb.getcol('SCAN_NUMBER'))) like you showed before. Then in the casa split cab we set scans = ' '.

But there's an issue with tb in python, because it is a casa command. Do you know which module we can import to use tb from python?

IanHeywood commented 1 year ago

You can use

from casacore.tables import table
tt = table(ms)
scans = sorted(set(tt.getcol('SCAN_NUMBER')))
tt.done()

Kincaidr commented 1 year ago

You can use

from casacore.tables import table
tt = table(ms)
scans = sorted(set(tt.getcol('SCAN_NUMBER')))
tt.done()

But when using the split command, it wont be recognized in python.

Oleg has added the flavour functionality. So something like this might work:

command: |
      tb.open(ms)
      scans=sorted(set(tb.getcol('SCAN_NUMBER')))
      tb.done()
      for scan in scans:
          scan = str(scan)
          SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
          split(vis=ms,outputvis='scansuhf/perscan_ms/'+SPLITED_ms,scan=scan, datacolumn='all')
          print(SPLITED_ms + 'DONE')
flavour: casa-task
inputs:
  vis:
   dtype: MS
  scan:
    dtype: str
 outputs:
  outputvis:
    dtype: MS
 params:

Victoria-Samboco commented 1 year ago

yes, it says that split is not defined

2022-11-22 13:00:02 STIMELA.boom.split_by_scan ERROR: NameError: name 'split' is not defined
2022-11-22 13:00:02 STIMELA.boom.split_by_scan ERROR: find_sun_stimela.split_ms1() threw        
exception: name 'split' is not defined'
2022-11-22 13:00:02 STIMELA.boom INFO: saving full profiling stats to
./obsuhf/logs/log-/stimela.stats

IanHeywood commented 1 year ago

I'm a bit lost here with the intracies of stimela. If you are in a situation where you can call the CASA split task then why can't you use the tb tool to loop over the scan numbers?

IanHeywood commented 1 year ago

For my own pipeline processing the first step is to run a script that pulls all the relevant metadata out of the MS and writes it to files. Subsequent processing steps can then just consult the files when they need to know things like relevant scan numbers. Maybe something like that would be easier than trying to repeatedly consult the MS as the processing progresses...

Kincaidr commented 1 year ago

Ok I see the problem. If we using flavour: casa-task for splitwe have to say command: split as such:

 command: split
      flavour: casa-task
      inputs:
        vis: 
          dtype: MS
          required: true
          default: '{recipe.ms}'
        scan:
          dtype: str
        datacolumn:
          dtype: str 
          default: 'corrected_data'
        outputvis:
          dtype: str
          default: '{recipe.ms}-output'
          required: true

So before we call this cab we have to do the table stuff in python and then call the cab. So have 2 steps, one for each:

steps:
  scan_numbers: %step1
    cab:
      command: |
          tb.open(ms)
          scans=sorted(set(tb.getcol('SCAN_NUMBER')))
          tb.done()
          for scan in scans:
              scan = str(scan)
              SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
      flavour: python-code
      inputs:
        ms
      outputs:
        scan
      splitted_ms_name:
        SPLITED_ms:
   splitms_scan: %step2
      command: split
      flavour: casa-task
      inputs:
        vis: 
          dtype: MS
          required: true
          default: '{recipe.ms}'
        scan:
          dtype: str
        datacolumn:
          dtype: str 
          default: 'corrected_data'
        outputvis:
          dtype: str
          default: '{recipe.ms}-output'
          required: true 
       params:
         vis: '{recipe.ms}'
         scan: =steps.scan_numbers.scan

Kincaidr commented 1 year ago

I'm a bit lost here with the intracies of stimela. If you are in a situation where you can call the CASA split task then why can't you use the tb tool to loop over the scan numbers?

Because as for as I know the casa commands and python ones are separate. Currently, in stimela, we can do purely only casa or python but not both.

IanHeywood commented 1 year ago

Then shouldn't tb work if it's in CASA mode?

landmanbester commented 1 year ago

Just FYI. It may be easier to do this with dask-ms convert (latest release) since it now supports a taql-where argument eg.

$ dask-ms convert path_to_data.ms -g "FIELD_ID,DATA_DESC_ID,SCAN_NUMBER" -o output.ms --format casa --force --taql-where "SCAN_NUMBER==1"

will split out scan 1 into an MS called output.ms

Kincaidr commented 1 year ago

Then shouldn't tb work if it's in CASA mode?

Im trying this now, but getting the same as @Victoria-Samboco . Its not reading tb

ratt-ru / solarkat

How to call the number of scans from the MS to the outputvis? #18