ivoa-std / ObsCoreExtensionForRadioData

ObsCore model extension for radio data
Creative Commons Attribution Share Alike 4.0 International
0 stars 6 forks source link

table and standardID management #49

Open Bonnarel opened 4 months ago

Bonnarel commented 4 months ago

Discussion started after 2023-11-98 release

Bonnarel commented 4 months ago

[The ObsCore extension for radio (including or not visibility data) described above SHOULD be added to the main ObsCore table.

This is painful for data centres that have only a few radio items and millions of non-radio items (such as me). In effect, you'd be forcing me to do a massive denormalisation of my data. And yes, people shouldn't do SELECT *, but they do, and then they have all these extra columns with NULLs in them for no benefit at all, in particular because you acknowledge it'd be a joined view anyway.

On the other hand, if there were some overriding reason to have the columns in the main obscore table, then don't make it SHOULD. If this is supposed to work at all, it must be a MUST -- software doing radio obscore queries has to be able to rely on it if it's necessary for some of your usecases.

But as I said: I don't believe it's necessary, and then we shouldn't do it, neither as SHOULD or MUST (I take the liberty of citing my own https://blog.g-vo.org/requirements-and-validators.html in support of this argument).

Doc : In practice a table containing only the extension attributes MAY be added to the same schema.

That's where I believe the MUST needs to sit. That way, your sample queries will have a JOIN (my advice based on RegTAP experience: make it a NATURAL JOIN and leave it to the implementors what the join columns actually are). Side benefit discovery rules are a lot simpler.

You'd just be saying:

Register the obscore extension as a VODataService CatalogService with a tableset only containing the ivoa.obsradio [or whatever; feel free to suggest a different name. I'd also be open to loosening naming schemes for ivoa.obscore, but I don't think the radio extension is the place to start that discussion] table. Assign a table utype of ivo://ivoa.net/std/obscore#radioext-1.0 to that table. For later extensibility, discover radio-extended obscore tables using this utype (rather than the table name). In RegTAP, you would find TAP services having radio extensions like this:

SELECT access_url, table_name FROM rr.capability NATURAL JOIN rr.interface NATURAL JOIN rr.res_table WHERE standard_id='ivo://ivoa.net/std/tap' AND table_utype like 'ivo://ivoa.net/std/obscore#radioext-1.%'

I'd do a PR for that if you don't flame me too hard.

Also, if someone gives me radio data needing these extra columns, I'm happy to do a reference implementation.

-- MarkusDemleitner 2023-11-09

BaptisteCecconi commented 3 months ago

I understand your concern as a data centre provider, and also as the developer of DaCHS. But on the point of view of the user, I'm not sure having to do a join is really a simple and appealing solution, at least at the CatalogService level. At server level (with ivoa.obscore and ivoa.obsradio tables), that may be reasonable.

I'm thinking of how we do it in EPNcore, where we don't have the global ivoa.obscore table at server level. In our case, EPNcore tables include extra columns (some from extensions, some are specific to each table).

In VESPA, we have not worked out (yet) the multiple typing of the table as in EPNcore + extensions. In VODataService/CatalogService, it is not an issue since the cardinality of the utype child of vs:Table is minOccurs="0", so there can be several utype's per table. However, I'm not sure how this will behave in RegTAP...

So one other option could be to have all columns in a table of a CatalogService (including as many extensions as necessary), and building the ivoa.obscore selecting only the relevant columns. The ivoa.obsradio, ivoa.obstimeseries, etc, could contain a subset of the columns too, relevant of that extension, and requiring the JOIN operation.

msdemlei commented 3 months ago

It's hard to speculate what the eventual consumers will or will not like, but I'm rather convinced that "write 'NATURAL JOIN ivoa.obs_radio' if you want to get and/or constrain radio columns" is about the simplest thing we can give to people.

And I'm sure that hiding the join is something that will lead to no end of confusion; sure, the nice thing about relational algebra is that it's an algebra, but in practice, abstractions do leak.

On the other hand, establishing "write NATURAL JOIN x to get more columns" sounds like something that can scale very well (though I will give you that when there's 1:n relationships coming into the game, that abstraction will leak, too).

Bonnarel commented 3 months ago

[The ObsCore extension for radio (including or not visibility data) described above SHOULD be added to the main ObsCore table.

This is painful for data centres that have only a few radio items and millions of non-radio items (such as me). In effect, you'd be forcing me to do a massive denormalisation of my data. And yes, people shouldn't do SELECT *, but they do, and then they have all these extra columns with NULLs in them for no benefit at all, in particular because you acknowledge it'd be a joined view anyway.

On the other hand, if there were some overriding reason to have the columns in the main obscore table, then don't make it SHOULD. If this is supposed to work at all, it must be a MUST -- software doing radio obscore queries has to be able to rely on it if it's necessary for some of your usecases.

But as I said: I don't believe it's necessary, and then we shouldn't do it, neither as SHOULD or MUST (I take the liberty of citing my own https://blog.g-vo.org/requirements-and-validators.html in support of this argument).

I think there was a misunderstanding and I shoud rephrase the document (see your PR later today). I'm not proposing to have the radio extension fields in the main obscore table but to have two different tables : obscore only with appropriate standardID and ObsCore+"extension fields" as a secondtable with a different standardID. Al other things (exposing or not the extension fields only in a specific view. Working with views, etc...) are implementation details.

Bonnarel commented 3 months ago

I agree with all you wrote Baptiste

So one other option could be to have all columns in a table of a CatalogService (including as many extensions as necessary), and building the ivoa.obscore selecting only the relevant columns. The ivoa.obsradio, ivoa.obstimeseries, etc, could contain a subset of the columns too, relevant of that extension, and requiring the JOIN operation.

So there will be the full table with all extensions, another one with the basic Obscore only, and then if needed the variousn extenion tables ? But do we really have multi utypes on the full table in RegTAP ? or should we define a single combined standardID for a single table utype ?

msdemlei commented 3 months ago

Hi François,

On Mon, Mar 25, 2024 at 11:11:46AM -0700, François Bonnarel wrote:

So there will be the full table with all extensions, another one with the basic Obscore only, and then if needed the variousn extenion tables ? But do we really have multi utypes on the full table in RegTAP ? or should we define a single combined standardID for a single table utype ?

It's questions like these that make me convinced that we don't want any pre-configured views; just let the extensions set next to each other and give a few example joins in the documentation and ideally in service-defined examples.

When we let users write joins (and they'll quickly learn that; it's really not rocket science) themselves, every table has just one utype, and all is nice and tidy. And we also escape the combinatorial explosion of having n! pre-configured views for n extensions.

  -- Markus