ivoa-std / ObsCoreExtensionForRadioData

ObsCore model extension for radio data
Creative Commons Attribution Share Alike 4.0 International

f_min / f_max / f_resolution #48

Open Bonnarel opened 3 months ago

Bonnarel commented 3 months ago

Discussion started after 2023-11-98 release

Bonnarel commented 3 months ago

You'll not be surprised that I still think sect. 4.2, f_min and f_max, is a bad idea and you should rather require an ivo_specconv function as discussed before; it'll also immediately placate folks who want Hz, MHz, GHz, or THz instead of the kHz you went for.

-- MarkusDemleitner -2023-11-09

Bonnarel commented 3 months ago

(1) The longer I think about them, the less I like f_min and f_max. If you look at use case 1.3: "range inside the 1 to 1.5 GHz band" -- and then people have to write f_min > 1000 AND f_max < 1500 and thus do some conversion anyway, and to the relatively random unit MHz on top. Please let's reconsider this; I have sympathies for not wanting to write the λ-ν conversions manually, but if

1 = ivo_interval_overlaps(em_min, em_max, ivo_specconv(1.5, 'GHz', 'm'), ivo_specconv(1, 'GHz', 'm'))

doesn't work for you, let's think again and figure out something that's less verbose. But let's not define something parallel to em_min and em_max with an even more random unit than m.

-- MarkusDemleitner -2023-12-11
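To make the query in the comment above concrete, here is a minimal Python sketch of the arithmetic it asks the service to do. The function names `freq_to_wavelength_m`, `interval_overlaps` and `in_band` are hypothetical stand-ins invented for this illustration; they mimic the intent of `ivo_specconv` and `ivo_interval_overlaps` and are not any service's actual implementation. The sample wavelengths are made up.

```python
# Hypothetical Python stand-ins for the ADQL UDFs named in the comment;
# they mimic the intent, not any service's actual implementation.
C_M_PER_S = 299_792_458.0  # speed of light in vacuum

HZ_PER_UNIT = {"Hz": 1.0, "kHz": 1e3, "MHz": 1e6, "GHz": 1e9, "THz": 1e12}


def freq_to_wavelength_m(value, unit):
    """Convert a frequency in the given unit to a vacuum wavelength in metres."""
    return C_M_PER_S / (value * HZ_PER_UNIT[unit])


def interval_overlaps(l1, h1, l2, h2):
    """Return 1 if [l1, h1] and [l2, h2] share at least one point, else 0."""
    return 1 if (l1 <= h2 and l2 <= h1) else 0


def in_band(em_min, em_max, f_lo, f_hi, unit):
    """Does a dataset's wavelength coverage overlap the band [f_lo, f_hi]?"""
    # The higher frequency maps to the shorter wavelength, hence the swap.
    wl_lo = freq_to_wavelength_m(f_hi, unit)
    wl_hi = freq_to_wavelength_m(f_lo, unit)
    return interval_overlaps(em_min, em_max, wl_lo, wl_hi)


# A dataset covering 21 cm to 22 cm lies inside the 1 to 1.5 GHz band;
# one covering 50 cm to 60 cm does not.
print(in_band(0.21, 0.22, 1, 1.5, "GHz"))
print(in_band(0.50, 0.60, 1, 1.5, "GHz"))
```

Note the swap of the bounds: the upper frequency limit becomes the lower wavelength limit, which is why 1.5 GHz comes first in the ADQL expression.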

Bonnarel commented 3 months ago

That is quite a mouthful... but it does bother me as well to provide what is essentially the same information twice. And I agree that the arbitrary units are problematic (the current draft specifies f_min/max as using "MHz" but f_resolution as using "kHz"). We do need to retain f_resolution though, as em_res_power simply varies too much for low-frequency observations that span a large fractional bandwidth. Radio observations typically have a fixed frequency resolution (and therefore varying resolving power) across the band.

-- MarkKettenis -2023-12-13
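A small numeric sketch of the resolving-power point in the comment above. The band edges (30 to 80 MHz) and the 1 kHz channel width are illustrative assumptions, not values taken from the draft or from any instrument.

```python
def resolving_power(freq_hz, channel_width_hz):
    """Resolving power R = f / delta_f for a fixed channel width."""
    return freq_hz / channel_width_hz


# Assumed example: a low-frequency band observed with one fixed channel width.
band_hz = (30e6, 80e6)   # 30 MHz to 80 MHz (illustrative)
channel_hz = 1e3         # 1 kHz channels (illustrative)

r_low = resolving_power(band_hz[0], channel_hz)
r_high = resolving_power(band_hz[1], channel_hz)

print(r_low, r_high, r_high / r_low)
```

With these numbers the resolving power changes by a factor of 80/30 across the band, while the frequency resolution stays at a single value; a single em_res_power per dataset cannot express that.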

Bonnarel commented 3 months ago

If that's a concern in practice, we can have a more specific function for matching radio intervals in obscore tables; but to design these, having a few clearer use cases would be useful. Perhaps

1 = ivo_has_radio_interval(1, 1.5, 'GHz')

(that has built-in knowledge about em_min and em_max) is justifiable?

-- MarkusDemleitner -2023-12-13

Bonnarel commented 3 months ago

-- MarjoleinVerkouter -2023-12-18

Bonnarel commented 3 months ago

Yes I think it's reasonable to use the same unit, Hz for all the frequency fields. We will change that in the draft.

That's much simpler than to find the right multiple for any quantity and sub domain. You are right!

As for the frequency characterization, I would advocate :

Why are we building an ObsCore extension for radio data ?

We are indeed speaking of data discovery.

"core ObsCore" metadata help to discover any kind of datasets in any spectral domain. But in the radio domain some specificities are not well enough taken into account by the standard.

And this is specifically the case for raw data (visibilities in the radio case)

The consequence is that the result of a query is only roughly matching some of the discovery tasks.

The basic idea with the ObsCore extensions is the same as the one we followed when creating some Optional fields in the original ObsCore. Not forcing anything but ALLOWING to add details in order to better tackle some specific needs.

When the CSP identified the rather low uptake of VO services in the radio domain and made fixing that an IVOA priority, and when in parallel the Euro-VO Asterics project (+ESCAPE) held several sessions around this, the first thing we heard from many, many radio astronomers and potential users was:

"Hahh, gosh .... Wavelengths !!!"

After explaining to these colleagues why we definitely needed a common language for everybody and why wavelength is the minimal lingua franca, we also considered providing them with a couple of additional fields useful for them.

In practice I imagine many radio archives start by storing their metadata in frequencies and transform them into wavelengths using functions, views or whatever to be consistent with ObsCore.

So conversion UDFs probably exist anyway. But this is an implementation matter.

But if we provide an extension, then this extension should be easy to use for queries and for metadata visualisation for the users. And also for client developers.

Nobody is forced to use a radio extension. But if people are in the spirit of using it then this has to be easy and readable for them.

Last thing: if we have a parameter-based interface to ObsCore (and extensions) in the near future (see https://github.com/ivoa-std/DAP), frequency characterization will be provided with an optional parameter and with standard column names anyway.

-- FrancoisBonnarel 2023-12-22

Bonnarel commented 3 months ago

"ALLOWING to add details"

Uh... let's be very careful with language in the vicinity of "allowing" and "optional".

If the obscore extensions are supposed to work for global discovery (i.e., one query is executable on all compliant services), then all fields people may write constraints against (and that means: by default all of them) need to be mandatory for the ivoa.obs_radio table.

Without such a requirement, an all-VO query would first have to work out where some column is available and then decide whether to re-write the query and drop some constraint or whether to skip the service in question. That would be painful for client writers and mystifying for their users.
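As an illustration of the bookkeeping described in the paragraph above, here is a minimal sketch of what a client would have to do per service if extension columns were optional. The function `plan_query`, its return values and the sample constraints are all hypothetical and invented for this example.

```python
def plan_query(service_columns, constraints):
    """Decide how to run a set of constraints against one service.

    service_columns: set of column names the service exposes
    constraints: dict mapping column name -> ADQL condition text
    Returns ("run", where), ("degraded", where) or ("skip", None).
    """
    # Keep only the constraints the service can evaluate at all.
    usable = {col: cond for col, cond in constraints.items()
              if col in service_columns}
    if not usable:
        return ("skip", None)
    where = " AND ".join(usable.values())
    # "degraded" means some constraints were silently dropped.
    status = "run" if len(usable) == len(constraints) else "degraded"
    return (status, where)


wanted = {"f_min": "f_min > 1e9", "f_max": "f_max < 1.5e9"}

print(plan_query({"em_min", "em_max", "f_min", "f_max"}, wanted))
print(plan_query({"em_min", "em_max", "f_max"}, wanted))
print(plan_query({"em_min", "em_max"}, wanted))
```

In the "degraded" case the user gets results that do not satisfy the query as written, and in the "skip" case a service with matching data is never asked. Both outcomes disappear when the columns are mandatory.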

Ceterum censeo Optional Features Are A Bane.

So: either we have f_min/_max or we don't.

"Hahh, gosh .... Wavelengths !!!"

And right they were: We should have used energies, which not only (like frequencies) are independent of the medium but also work for massive messengers.

But we didn't, and we can't sensibly fix that by adding extra columns. I, for one, would totally be in favour of planning a transition to energies over a few versions of obscore, but that's nothing an extension could do.

"In practice I imagine many radio archives start by storing their metadata in frequencies and transform them into wavelengths using functions, views or whatever to be consistent with ObsCore."

For ingestion, I claim it really doesn't matter; the ingestion rules are written once, and very quickly on top.

No, the question is: Can we make writing obscore queries more pleasant across the electromagnetic spectrum? I can see how f_min/_max help a bit there, but it's just a bit (because everyone still has their non-Hz native units, including wavelengths ("21 cm", "submillimeter")). And to me the massive denormalisation is too high a price for what little it buys.

Anyway, if you really can't find it in yourself to simply drop the two columns, at least say something like: "Non-NULL f_min MUST be equal to c/em_max and f_max MUST be equal to c/em_min, with c=299792458 m/s; implementations are advised to ensure this by using, for instance, views."

But don't you agree that, written like this, this definitely looks like a bad idea? I notice in passing that in this way, it might be that

em_min between 1e-2 and 2e-2

is fast but the -- according to the above stipulation -- equivalent

f_max between 14989622900.0 and 29979245800.0

is not. That's because the query planner may very well not be smart enough to see it could use an index on em_min when the query it sees after expanding the view statement is against 299792458.0/em_min[1]
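The view approach and the two equivalent constraints can be tried out with a small self-contained sketch. It uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in for a real TAP backend; the table name `obscore`, the view name `obs_radio`, the index name and the three sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obscore (id INTEGER PRIMARY KEY, em_min REAL, em_max REAL);
    CREATE INDEX obscore_em_min ON obscore (em_min);
    -- Frequencies are derived from the wavelengths, so they cannot drift apart.
    CREATE VIEW obs_radio AS
        SELECT id, em_min, em_max,
               299792458.0 / em_max AS f_min,
               299792458.0 / em_min AS f_max
        FROM obscore;
    INSERT INTO obscore (id, em_min, em_max) VALUES
        (1, 0.005, 0.006), (2, 0.015, 0.030), (3, 0.030, 0.210);
""")

by_wavelength = "SELECT id FROM obs_radio WHERE em_min BETWEEN 1e-2 AND 2e-2"
by_frequency = ("SELECT id FROM obs_radio "
                "WHERE f_max BETWEEN 14989622900.0 AND 29979245800.0")


def ids(query):
    """Run a query and return the matching ids."""
    return [row[0] for row in conn.execute(query)]


def plan(query):
    """Return the engine's plan text; the wording is engine-specific."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]


print(ids(by_wavelength), ids(by_frequency))
print(plan(by_wavelength))
print(plan(by_frequency))
```

Both queries select the same rows, which is what the proposed wording demands. Whether the second one can use the index on em_min depends on the database engine and version, so the plan output is meant for inspection rather than as a guaranteed result.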

"Nobody is forced to use a radio extension...."

Well, but we certainly would like them to use it if they have radio data, right?

-- MarkusDemleitner 2024-01-03