ratt-ru / tricolour

Holds an offline, MS direct version of the SDP online flagger.

SPW selection (or relaxation of assumptions) required #77

Open IanHeywood opened 3 years ago

IanHeywood commented 3 years ago

Trying to use Tricolour to flag a VLA P-band observation. The original MS has 16 SPWs. I've split the SPWs (https://github.com/ska-sa/owlcat/blob/master/Owlcat/bin/split-ms-spw.py) out into their own MS to process (and probably image) them independently. This preserves the SPECTRAL_WINDOW subtable, so that has 16 rows but the main table now references only one of them. Tricolour crashes when processing anything other than SPW0:

tricolour - 2020-12-18 20:51:20,039 INFO - Flagging based on quadrature polarized power
tricolour - 2020-12-18 20:51:22,305 INFO - Only considering scans '2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24' as per user selection criterion
tricolour - 2020-12-18 20:51:22,306 INFO - Adding field 'SGRA' scan 2 to compute graph for processing
Unexpected error. Dropping you into pdb for a post-mortem.
Traceback (most recent call last):
  File "/usr/local/bin/tricolour", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tricolour/apps/tricolour/app.py", line 258, in main
    _main(args)
  File "/usr/local/lib/python3.6/dist-packages/tricolour/apps/tricolour/app.py", line 370, in _main
    ddid = ddid_ds[ds.attrs['DATA_DESC_ID']]
IndexError: list index out of range
> /usr/local/lib/python3.6/dist-packages/tricolour/apps/tricolour/app.py(370)_main()
-> ddid = ddid_ds[ds.attrs['DATA_DESC_ID']]

This is with the version bundled with the stimela 1.5.0 container.

Cheers.

bennahugo commented 3 years ago

Thanks for highlighting this Ian. At the moment tricolour only supports MeerKAT single-SPW (SPW 0) cases. However, Simon and I have thought of making it more generic to support things like the VLA. We will look at it in the new year, as I'm currently going on leave to try and wrap up my PhD articles.

sjperkins commented 3 years ago

-> ddid = ddid_ds[ds.attrs['DATA_DESC_ID']]

Just a note, this is failing a lookup in the DATA_DESCRIPTION subtable using DATA_DESC_ID values in the main table. It would be worth posting the unique DATA_DESC_ID values in the main table, as well as the contents of the DATA_DESCRIPTION table so that we can investigate in more detail post-holiday season.
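For reference, the check being asked for can be sketched in plain Python. This is only an illustration of the lookup described above, not tricolour's actual code: `unmatched_ddids` is a hypothetical helper, and the ID values are made up. It assumes DATA_DESC_ID values are zero-based row indices into DATA_DESCRIPTION.

```python
def unmatched_ddids(main_ddids, n_dd_rows):
    """Return the unique DATA_DESC_ID values that have no DATA_DESCRIPTION row.

    main_ddids: DATA_DESC_ID values read from the main table.
    n_dd_rows:  number of rows in the DATA_DESCRIPTION subtable.
    """
    unique_ids = sorted(set(int(d) for d in main_ddids))
    # A valid ID is a zero-based row index into DATA_DESCRIPTION.
    return [d for d in unique_ids if not 0 <= d < n_dd_rows]


# Hypothetical values: a main table that only references row 12.
print(unmatched_ddids([12, 12, 12], 16))  # 16-row subtable: every ID resolves
print(unmatched_ddids([12, 12, 12], 1))   # 1-row subtable: 12 cannot resolve
```

An empty result means every ID in the main table points at an existing subtable row, so an IndexError on that lookup would have to come from how the subtable is loaded rather than from the MS itself.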

o-smirnov commented 3 years ago

That does suggest a malformed MS, doesn't it?..

bennahugo commented 3 years ago

Indeed, if the SPW is misindexed after being split, it indicates a CASA bug. Maybe worth pinging the CASA helpdesk.

o-smirnov commented 3 years ago

Well, let's confirm it first. @IanHeywood could you please check your DATA_DESCRIPTION and SPECTRAL_WINDOW subtables: how many rows does each have, and what are the values in your DATA_DESC_ID column?

IanHeywood commented 3 years ago

All looks reasonable to me...

In [1]: from pyrap.tables import table

In [2]: spwtab = table('15A-310.sb30701840.eb31080533.57264.008705092594_SPW16-31_hanning_spw12.ms/SPECTRAL_WINDOW/')
Successful readonly open of default-locked table 15A-310.sb30701840.eb31080533.57264.008705092594_SPW16-31_hanning_spw12.ms/SPECTRAL_WINDOW/: 17 columns, 16 rows

In [3]: nn = spwtab.getcol("NAME")

In [4]: ff = spwtab.getcol("REF_FREQUENCY")

In [5]: for i in range(0,len(nn)):
   ...:     print i,nn[i],ff[i]
   ...:
0 EVLA_P#A0C0#16 224000000.0
1 EVLA_P#A0C0#17 240000000.0
2 EVLA_P#A0C0#18 256000000.0
3 EVLA_P#A0C0#19 272000000.0
4 EVLA_P#A0C0#20 288000000.0
5 EVLA_P#A0C0#21 304000000.0
6 EVLA_P#A0C0#22 320000000.0
7 EVLA_P#A0C0#23 336000000.0
8 EVLA_P#A0C0#24 352000000.0
9 EVLA_P#A0C0#25 368000000.0
10 EVLA_P#A0C0#26 384000000.0
11 EVLA_P#A0C0#27 400000000.0
12 EVLA_P#A0C0#28 416000000.0
13 EVLA_P#A0C0#29 432000000.0
14 EVLA_P#A0C0#30 448000000.0
15 EVLA_P#A0C0#31 464000000.0

In [6]: ddtab = table('15A-310.sb30701840.eb31080533.57264.008705092594_SPW16-31_hanning_spw12.ms/DATA_DESCRIPTION/')
Successful readonly open of default-locked table 15A-310.sb30701840.eb31080533.57264.008705092594_SPW16-31_hanning_spw12.ms/DATA_DESCRIPTION/: 3 columns, 16 rows

In [7]: ddtab.colnames()
Out[7]: ['FLAG_ROW', 'POLARIZATION_ID', 'SPECTRAL_WINDOW_ID']

In [8]: ddtab.getcol('SPECTRAL_WINDOW_ID')
Out[8]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
      dtype=int32)

In [9]: import numpy

In [10]: maintab = table('15A-310.sb30701840.eb31080533.57264.008705092594_SPW16-31_hanning_spw12.ms')
Successful readonly open of default-locked table 15A-310.sb30701840.eb31080533.57264.008705092594_SPW16-31_hanning_spw12.ms: 24 columns, 1777113 rows

In [11]: print(numpy.unique(maintab.getcol("DATA_DESC_ID")))
[12]

I've carried on with the processing using flagdata, so there's no rush. This and all other CASA tasks seem fine with it, I just wanted to try tricolour out. The RFI situation at P-band is pretty horrid.

Thanks.

EDIT: Fixed the pasted part.

bennahugo commented 3 years ago

Hey Ian. Cool, can you also do an np.unique on the DATA_DESC_ID column of the main table?

IanHeywood commented 3 years ago

I thought that's what lines [10] and [11] were above, but please let me know if I'm missing something.

My original paste ended up a bit mangled, so if you read it off the original email it might have been missing...

bennahugo commented 3 years ago

Silly me, I missed the output of that. OK, looks conformant to me. The bug must be in tricolour.

sjperkins commented 3 years ago

Thanks for the detailed info @ianheywood. If possible, could you place the MS on a Rhodes server?

sjperkins commented 3 years ago

@IanHeywood were you able to place the MS on the Rhodes server?

@smasoka Could you please do some investigation here to find out why this is breaking? You can message me if you become blocked on this issue.

IanHeywood commented 3 years ago

Not yet. Was going to "officially" return to work today but then I decided that would be a bad move. I'll transfer something this week, sorry for the delay.

sjperkins commented 3 years ago

> Not yet. Was going to "officially" return to work today but then I decided that would be a bad move. I'll transfer something this week, sorry for the delay.

Holidays are good. Can confirm.

smasoka commented 3 years ago

@IanHeywood can I get the location of the MS on the Rhodes server? Is the MS also at CHPC lustre?

Can I also get you to look at #76 for some feedback?

IanHeywood commented 3 years ago

Yes, sorry, these things are slowly floating to the top of the to-do list.

There is an MS here on nash:

/home/ianh/tricolour_SPW/15A-310.sb30704857.eb31073319.57261.07368590278_spw24.ms

that triggers this bug. Note that this MS is the result of using Owlcat's SPW-splitter, which preserves the original SPW table in the resulting Measurement Sets. Looping over SPWs and splitting them out with CASA's mstransform task produces a SPECTRAL_WINDOW table that just has a single row, corrects the main table accordingly, and avoids the problem.

Thanks.
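To make the difference between the two splitting approaches concrete, here is a rough sketch of the end state described for the mstransform route: only the referenced subtable row survives and the main table IDs are renumbered to match. It says nothing about how CASA does this internally; `compact_ddids` is a hypothetical helper and the values are illustrative.

```python
def compact_ddids(main_ddids):
    """Renumber ID values to 0..k-1 and report which subtable rows are kept.

    Returns (new_ids, kept_rows): new_ids replaces the main table column,
    kept_rows lists the original subtable row indices that survive.
    """
    kept_rows = sorted(set(int(d) for d in main_ddids))
    # Map each surviving original row index to its new, compact index.
    new_id = {old: new for new, old in enumerate(kept_rows)}
    return [new_id[int(d)] for d in main_ddids], kept_rows


# Hypothetical single-SPW split: every main table row references row 24.
new_ids, kept_rows = compact_ddids([24, 24, 24])
print(new_ids)    # all rows now reference index 0
print(kept_rows)  # only original row 24 is retained
```

With the Owlcat-style split, by contrast, the main table keeps its original ID and the subtables keep all of their rows, which is the layout that triggers the crash here.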