i4Ds / Karabo-Pipeline

The Karabo Pipeline can be used as Digital Twin for SKA
https://i4ds.github.io/Karabo-Pipeline/
MIT License
11 stars 4 forks source link

Improve handling of different visibility formats #595

Closed mpluess closed 4 weeks ago

mpluess commented 3 months ago

Currently, when generating visibilities with OSKAR, they are written to disk in both measurement set (MS) and the .vis binary format, essentially doubling the amount of disk space used. This needs to be fixed, especially for larger runs.
When generating visibilities with RASCIL, data is only saved to disk in .vis format.
Internally, Karabo currently works with two different types of visibilities: OSKAR returns karabo.simulation.visibility.Visibility objects whereas RASCIL returns ska_sdp_datamodels.visibility.Visibility objects. It would make sense to create a generalized visibility class and only return objects of this new class.

For the subsequent dirty imaging, we have three imagers.
OSKAR can only handle karabo.simulation.visibility.Visibility objects. Seems like it can work off MS as well as .vis files, preferring .vis, but this needs to be verified.
RASCIL can handle karabo.simulation.visibility.Visibility as well as ska_sdp_datamodels.visibility.Visibility objects and supports MS as well as .vis files.
WSClean can only handle karabo.simulation.visibility.Visibility objects and requires the MS format.

Image cleaning currently requires the MS format (to do dirty imaging + cleaning in one go) or an existing dirty image (only supported by WSClean, not RASCIL, see https://github.com/i4Ds/Karabo-Pipeline/issues/572).

There is existing functionality to convert between .vis and MS:

Proposed solution:

Lukas113 commented 3 months ago

Agree to have as default the MS format since it has more support :+1:

I think it's worth to not drop the .vis support wherever possible, if the overhead is not large. E.g. karabo.data.obscore.ObsCoreMeta.from_visibility could support both easily.

About the interface choice to only accept Karabo objects instead of e.g. MS fpath, I'm not 100% sure about. The most user-friendly way I can think of is to accept both. And if it's not an overhead or leads to a mess, I suggest to support both.

sfiruch commented 2 months ago

As discussed in 2024-08-22 meeting, the OSKAR default should be changed to MS.

Later, we should have automatic data type conversion when needed, ideally streaming - without touching the disk. Even later, these choices could be optimized by generating & optimizing the full data flow graph (i.e. lazy evaluation). But that's 2030 territory :)

sfiruch commented 1 month ago

We should keep design/APIs flexible enough to add support for other file formats in the future. MSv3 or DASK-MS are formats that are being discussed, so we'll probably need to support one/two of these in the future.

We're fine if .vis support is not as good, or not well-supported at the moment. But we should design our code with the understanding that we'll need support for more formats in the future.