Open Victoria-Samboco opened 1 year ago
Problem is here: scan: '{recipe.myscan}'
The scan
in the casa split task should be str
. Here you are calling scan ='{recipe.myscan}'
which is a for-loop over ms:
for_loop:
var: myscan
over: ms
You should set it to ' '
for all scans:
scan -- Scan number range
default: ’’ = all
Glad you brought this up because splitting scan by all
is not working. for me as expected. If I set to ' '
I am only given 1 MS with all the scan numbers:
scans = list(numpy.unique(t.getcol('SCAN_NUMBER')))=[3, 5, 7, 9, 14, 16]
I recall this working previously when I tried it a while back in the pipeline, so not sure whats gone wrong.
I'm also having the same problem when set the default to all, when It works give me only one MS with all scans, meaning that it's not really splitting. I think the other option is to use the python script we used before.
On Tue, 22 Nov 2022, 00:08 Kincaidr, @.***> wrote:
Problem is here: scan: '{recipe.myscan}' The scan in the casa split task should be str. Here you are calling scan ='{recipe.myscan}' which is a for-loop over ms:
for_loop:
var: myscan over: ms
You should set it to all.
Glad you brought this up because splitting scan by all is not working for me as expected. If I set it to all I get this error:
CASA <1>: split(vis='MIGHTEE_CDFS_raw-J0333_2741-corr.ms',scan='all',outputvis='splitted_ns')
2022-11-21 21:49:24 SEVERE split:::: Scan Expression: Parse error at or near 'a'
2022-11-21 21:49:24 SEVERE split::::+ (near char. 1 in string "all")
Out[1]: False
If I set it to `` which should give me the same according to the docs:
scan -- Scan number range
default: ’’ = all
It works; however, I am only given 1 MS with all the scan numbers:
scans = list(numpy.unique(t.getcol('SCAN_NUMBER')))=[3, 5, 7, 9, 14, 16]
I recall this working previously when I tried it a while back in the pipeline, so not sure whats gone wrong.
— Reply to this email directly, view it on GitHub https://github.com/ratt-ru/solarkat/issues/18#issuecomment-1322714339, or unsubscribe https://github.com/notifications/unsubscribe-auth/AWYM2BOX6NTYOJLRCWK2CE3WJPXHRANCNFSM6AAAAAASG6PCE4 . You are receiving this because you authored the thread.Message ID: @.***>
This one.
def split_ms(ms): #Simple split
tb.open(ms)
scans=sorted(set(tb.getcol('SCAN_NUMBER')))
tb.done()
for scan in scans:
scan = str(scan)
SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
#split(vis=ms,outputvis='/home/samboco/solarkat/1GC_UHF/msdir/scans/'+SPLITED_ms,scan=scan, datacolumn='all')
split(vis=ms,outputvis='scansuhf/perscan_ms/'+SPLITED_ms,scan=scan, datacolumn='all')
print(SPLITED_ms + 'DONE')
yes we can do it this way, but I still want to know why casa cannot do it by setting scan to all
scan = 'all'
doesn't work with CASA's split task, you just need to set scan = ''
to split all scans.
scan = 'all'
doesn't work with CASA's split task, you just need to setscan = ''
to split all scans.
Yes I have done this, but the output I get is only 1 MS.
Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.
scans = [33, 6, 10, 14, 18, 29]
for scan in scans:
scan = str(scan)
opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms")
split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")
Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.
scans = [33, 6, 10, 14, 18, 29] for scan in scans: scan = str(scan) opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms") split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")
Yes, that's what I did here:
splitms_by_scan: #Its working well
info: " A loop to split/extract all scans from the Measurement Set and determine the sun coordinates (RA/DEC)"
recipe:
inputs:
myscans:
dtype: List[str]
info: "splitms by scans"
default: ["3","5","7","9","11","13","15","17","19","21","25","27","29","31","33","35","37","39","41","43","47","49"]
for_loop:
var: myscan
over: myscans
steps:
split_my_ms:
cab: splitms
params:
ms: '1583662427_sdp_l0.1024ch-J033230_280757-corr_self.ms'
scan: '{recipe.myscan}'
datacolumn: 'all'
outputvis: "scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan{recipe.myscan}.ms"
Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...
Oh I see! If you want a per-scan MS then you have to run split once per scan, e.g.
scans = [33, 6, 10, 14, 18, 29] for scan in scans: scan = str(scan) opms = "1658342033_sdp_l0_GPM1839-10_polcal.ms".replace(".ms","_scan"+scan+".ms") split(vis="1658342033_sdp_l0_GPM1839-10_polcal.ms",outputvis=opms,scan=scan,datacolumn="all")
Yes this is what we did this before. I thought that if we do scan = ' '
then casa would do this.
Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...
No it doesn't have to be manually. We can just do scans=sorted(set(tb.getcol('SCAN_NUMBER')))
like you showed before. Then in the casa split cab we set scans = ' '
.
But there's an issue with tb
Looks like the only way it can work is giving a list of scans to the for loop...But that's mean that we will have to do it manualy...
No it doesn't have to be manually. We can just do
scans=sorted(set(tb.getcol('SCAN_NUMBER')))
like you showed before. Then in the casa split cab we setscans = ' '
.
But there's an issue with tb in python, because it is a casa command. Do you know which module we can import to use tb from python?
You can use
from casacore.tables import table
tt = table(ms)
scans = sorted(set(tt.getcol('SCAN_NUMBER')))
tt.done()
You can use
from casacore.tables import table tt = table(ms) scans = sorted(set(tt.getcol('SCAN_NUMBER'))) tt.done()
But when using the split
command, it wont be recognized in python.
Oleg has added the flavour
functionality. So something like this might work:
command: |
tb.open(ms)
scans=sorted(set(tb.getcol('SCAN_NUMBER')))
tb.done()
for scan in scans:
scan = str(scan)
SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
split(vis=ms,outputvis='scansuhf/perscan_ms/'+SPLITED_ms,scan=scan, datacolumn='all')
print(SPLITED_ms + 'DONE')
flavour: casa-task
inputs:
vis:
dtype: MS
scan:
dtype: str
outputs:
outputvis:
dtype: MS
params:
yes, it says that split
is not defined
2022-11-22 13:00:02 STIMELA.boom.split_by_scan ERROR: NameError: name 'split' is not defined
2022-11-22 13:00:02 STIMELA.boom.split_by_scan ERROR: find_sun_stimela.split_ms1() threw
exception: name 'split' is not defined'
2022-11-22 13:00:02 STIMELA.boom INFO: saving full profiling stats to
./obsuhf/logs/log-/stimela.stats
I'm a bit lost here with the intracies of stimela. If you are in a situation where you can call the CASA split
task then why can't you use the tb
tool to loop over the scan numbers?
For my own pipeline processing the first step is to run a script that pulls all the relevant metadata out of the MS and writes it to files. Subsequent processing steps can then just consult the files when they need to know things like relevant scan numbers. Maybe something like that would be easier than trying to repeatedly consult the MS as the processing progresses...
Ok I see the problem. If we using flavour: casa-task
for split
we have to say command: split
as such:
command: split
flavour: casa-task
inputs:
vis:
dtype: MS
required: true
default: '{recipe.ms}'
scan:
dtype: str
datacolumn:
dtype: str
default: 'corrected_data'
outputvis:
dtype: str
default: '{recipe.ms}-output'
required: true
So before we call this cab we have to do the table stuff in python and then call the cab. So have 2 steps, one for each:
steps:
scan_numbers: %step1
cab:
command: |
tb.open(ms)
scans=sorted(set(tb.getcol('SCAN_NUMBER')))
tb.done()
for scan in scans:
scan = str(scan)
SPLITED_ms=ms.replace('.ms','_scan'+scan+'.ms')
flavour: python-code
inputs:
ms
outputs:
scan
splitted_ms_name:
SPLITED_ms:
splitms_scan: %step2
command: split
flavour: casa-task
inputs:
vis:
dtype: MS
required: true
default: '{recipe.ms}'
scan:
dtype: str
datacolumn:
dtype: str
default: 'corrected_data'
outputvis:
dtype: str
default: '{recipe.ms}-output'
required: true
params:
vis: '{recipe.ms}'
scan: =steps.scan_numbers.scan
I'm a bit lost here with the intracies of stimela. If you are in a situation where you can call the CASA
split
task then why can't you use thetb
tool to loop over the scan numbers?
Because as for as I know the casa commands and python ones are separate. Currently, in stimela, we can do purely only casa or python but not both.
Then shouldn't tb
work if it's in CASA mode?
Just FYI. It may be easier to do this with dask-ms convert (latest release) since it now supports a taql-where argument eg.
$ dask-ms convert path_to_data.ms -g "FIELD_ID,DATA_DESC_ID,SCAN_NUMBER" -o output.ms --format casa --force --taql-where "SCAN_NUMBER==1"
will split out scan 1 into an MS called output.ms
Then shouldn't
tb
work if it's in CASA mode?
Im trying this now, but getting the same as @Victoria-Samboco . Its not reading tb
I'm trying to split the MS using the MS and not a list of scans as I did before, but I'm having an error with the outputvis.
I want it to return as outputvis something like
outputvis = scansuhf/perscan_ms/1583662427_sdp_l0.1024ch-J033230_280757-corr_self-scan_3.ms
, ... , but it is returningHow can I call the number of scans from the MS without creating a list of it in the recipe?