ivoa-std / ObsCoreExtensionForRadioData

ObsCore model extension for radio data
Creative Commons Attribution Share Alike 4.0 International

Comments by Andreas Wicenec (on the IVOA page) #26

Open Bonnarel opened 1 year ago

Bonnarel commented 1 year ago

From Andreas Wicenec (July 26th 2023): I agree with the comments JohnTobin made above, in particular with respect to uv stats. In addition I would like to raise the following points:

  1. Just a very general comment on use cases: I'm not quite sure whether this was discussed, but what are the use cases for searching for a visibility data set using an antenna diameter or the minimum and maximum uv distance? In many interferometric arrays there are multiple different antennas, and specifying or searching for a single diameter is thus not really useful. The uv distances (as well as eccentricity) are highly variable even within a single, long observation. It seems straightforward to calculate them, but in fact it is not if a dataset is of the order of many TB or even PB. This raises the question of whether providing these values is useful in the first place. Who would use these values for queries and why? The explanations in the text are kind of fine for UV snapshots, but for observations spanning many hours, many channels and an inhomogeneous distribution of baseline lengths, they become far less useful, to the degree that they might be misleading or plainly not comparable between datasets from different arrays.

  2. Related to this, the description of the antenna diameter in the table should also mention that this might be the maximum diameter in case of multiple different ones, as it is in the text (2.2).

  3. Another, high-level question is more about wording: When reducing a visibility data set it is possible to change the resolution and sacrifice frequency for spatial resolution to a certain degree. It is also possible to change the phase centre as well as the FOV, i.e. the pointing of the final data product. Thus quite a number of the values are more or less just boundaries, and in some cases not even very strict ones. I guess one example would be the FOV, since that depends a lot on how far out you are performing the imaging (1st, 2nd or 3rd null), and that is pretty much up to the user to decide. In that sense I think we need to have more constrained descriptions, else these values will not be consistent and comparable, even if we allow for min and max.

  4. In the second sentence of section 2 the case for wavelength vs. frequency is correctly made, but then, in particular for the FOV and resolution descriptions, the document refers to wavelength. I think all of this should be described in terms of frequency.

  5. Measurement sets allow for an almost insane flexibility and this extension will never be able to account for all of the possible variations, else it would need to replicate the MS data model. Thus I would opt for an extension which is as lightweight as possible and fully driven by actual real-life use cases of queries people would be performing on visibility data.
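
To illustrate the concern in point 3, the following is a minimal sketch of how much a quoted FOV changes with the choice of null. It assumes an ideal, uniformly illuminated circular aperture (an Airy pattern), which real dishes with tapered illumination only approximate; the function name `fov_deg` and the example values (1.4 GHz, 25 m dish) are purely illustrative and not taken from the draft.

```python
import math

C = 299_792_458.0  # speed of light in m/s

# Angular radius of the 1st, 2nd and 3rd null of an Airy pattern in units of
# lambda / D (the first three zeros of the Bessel function J1, divided by pi).
AIRY_NULLS = (1.2197, 2.2331, 3.2383)


def fov_deg(freq_hz, dish_diameter_m, null=1):
    """Full width in degrees out to the n-th null of an ideal circular aperture.

    Assumption: uniform illumination; this is an illustration, not the
    definition used by the ObsCore extension.
    """
    wavelength_m = C / freq_hz
    radius_rad = AIRY_NULLS[null - 1] * wavelength_m / dish_diameter_m
    return math.degrees(2.0 * radius_rad)


# The same observation yields quite different "FOV" values depending on the null.
for n in (1, 2, 3):
    print(f"null {n}: {fov_deg(1.4e9, 25.0, n):.2f} deg")
```

Without an agreed convention for the null (or some other beam-width definition), values stored by different archives would differ by factors of two to three for otherwise identical observations.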

kettenis commented 1 year ago

Point 1 covers quite a lot of territory. The original idea was to characterize the UV coverage of an interferometric observation with a few numbers that can be used in an ADQL query, with the goal of selecting observations that are likely to meet requirements in terms of resolution, largest angular scale and image fidelity. I think the current proposal probably has too many parameters now, and that some of the proposed parameters are essentially trying to describe the same properties in slightly different ways.

I don't think the size of the dataset is all that relevant; reconstructing the UV coverage from the UVW values that are part of the dataset is one possibility, but there are other ways to do this. The original code developed by @matmanc for LOFAR, for example, calculated this from scratch based on a description of the observation. And even if it is reconstructed from the dataset, the UVW metadata will be several orders of magnitude smaller than the visibility data itself, and for a well-designed data format (e.g. the MeasurementSet) it will be possible to read this data without the need to look at the actual visibilities.
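
As a rough sketch of the argument above, the minimum and maximum uv distance can be accumulated from the UVW coordinates alone, one chunk at a time, without touching the visibilities. The function name `uv_distance_range` and the chunked-iterable interface are assumptions made for illustration; how the chunks are read from a given data format is left out, and this is not the LOFAR code mentioned above.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def uv_distance_range(uvw_chunks, freq_min_hz, freq_max_hz):
    """Return (min, max) projected baseline length in wavelengths.

    uvw_chunks: iterable of (N, 3) arrays holding u, v, w in metres
    (hypothetical interface; e.g. slices of a UVW column).
    """
    d_min, d_max = np.inf, 0.0
    for uvw in uvw_chunks:
        uvw = np.asarray(uvw, dtype=float)
        dist = np.hypot(uvw[:, 0], uvw[:, 1])  # projected length in metres
        dist = dist[dist > 0]  # drop zero-length rows (autocorrelations)
        if dist.size:
            d_min = min(d_min, float(dist.min()))
            d_max = max(d_max, float(dist.max()))
    # Shortest baseline at the lowest frequency, longest at the highest.
    return d_min * freq_min_hz / C, d_max * freq_max_hz / C


# Synthetic example: two small chunks, observed at a wavelength of 1 m.
chunks = [
    np.array([[0.0, 0.0, 0.0], [30.0, 40.0, 5.0]]),
    np.array([[300.0, 400.0, 0.0]]),
]
print(uv_distance_range(chunks, C, C))
```

Only three floating-point numbers per row are needed, so memory use is bounded by the chunk size rather than by the size of the dataset.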

Unless you're looking at sources that are variable on the timescale of the observation, the difference between snapshots and longer observations doesn't really matter, at least as long as the time-dependence of the UV coverage doesn't affect the calibratability of the data too much.

I agree that antenna diameters are not really meaningful for (inhomogeneous) interferometric arrays. But it probably is something users would want to know for single dish observations?

Points 2-5 are all very sensible, especially point 5. Trying to capture all the details is simply not possible.