ivoa / dm-usecases

This repo gathers all the material to be used in the DM workshop 2020.
The Unlicense

Adding a time series/1d data use case #2

Closed: msdemlei closed this issue 3 years ago

msdemlei commented 3 years ago

This has two things I'd like to mention. For one, there are a few actual use cases (in the sense of: what should be done with this particular annotation). For another, the annotation scheme assumes largely independent DMs and, I think, gets away with that quite nicely.

mcdittmar commented 3 years ago

Markus.. I'm looking over the annotated serialization.
Some questions.. 1) It doesn't use any of the current models-in-progress (well.. Dataset looks ok). One of the primary goals of this workshop is to exercise the models in various scenarios to see how they serve. So, while this gives a good look at the annotation scheme you have in mind, it isn't applied to the same models as the other implementations will be.. so a bit apples vs oranges.

2) Magnitude: the table has magnitudes, the annotation does not. It is not in Coords, presumably because it is not space or time domain. It is not a Measurement, presumably because it has no errors?

3) Position of the source: the table has 2 Params giving the RA,DEC, with a description saying it is the "Position of source object". There is a SphericalCoordinate mapping to those values. This is in the Coords annotation at Coords.space, so it can be found as a Coordinate, but there is no other annotation which gives it context as a source position (Target.position?) or relates it to the TimeSeries. I don't see when/how a client would know when to use the SphericalCoordinate. This caught my eye because we've had extensive conversation about source/target.position.

4) PhotCal reference to phot/flux? If I recall, the usage of photDM:PhotCal is to be referenced by the Flux/Magnitudes as part of their "Frame" (SpectralDM). The PhotDM PhotCal object does not have any association to the Flux/Magnitude values. The table flux and phot FIELD elements have this reference to PhotCal, but there are also backward-pointing references within the PhotCal objects to the FIELDs. Is that part of your annotation scheme? What if multiple columns share the same PhotCal?

msdemlei commented 3 years ago

On Wed, Feb 24, 2021 at 03:35:48PM -0800, Mark Cresitello-Dittmar wrote:

Some questions.. 1) It doesn't use any of the current models-in-progress (well.. Dataset looks ok). One of the primary goals of this workshop is to exercise the models in various scenarios to see how they serve. So, while this gives a good look at the annotation scheme you have in mind, it isn't applied to the same models as the other implementations will be.. so a bit apples vs oranges.

True, but that's kind of the purpose of the exercise; part of my point is that the models as they stand have serious drawbacks that can relatively easily be fixed; the implicit (if I may say so) models here are intended to show that.

2) Magnitude: the table has magnitudes, the annotation does not. It is not in Coords, presumably because it is not space or time domain. It is not a Measurement, presumably because it has no errors?

There are two PhotCal annotations, one for the flux, the other for the magnitude. Only the flux has an error (which, incidentally, is because the error transforms in a way that it becomes badly asymmetrical in mag when it's large), and hence only the flux is annotated as a measurement. One could add a measurement annotation without an error reference in the mag column as well, but I don't think that would be useful -- would you want to do that? And if so, why?
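The asymmetry mentioned here can be made concrete in a few lines. This is an illustrative sketch, not anything from the example file: the zero point and the flux values are arbitrary, and `mag_with_errors` is a hypothetical helper name.

```python
import math

def mag_with_errors(flux, flux_err, zeropoint=0.0):
    """Map a symmetric flux error onto the magnitude scale.

    Returns (mag, err_bright, err_faint): the magnitude plus the distances
    (in mag) to the flux+sigma and flux-sigma points.  Requires
    flux_err < flux; at flux_err >= flux the faint side diverges.
    """
    mag = zeropoint - 2.5 * math.log10(flux)
    err_faint = (zeropoint - 2.5 * math.log10(flux - flux_err)) - mag
    err_bright = mag - (zeropoint - 2.5 * math.log10(flux + flux_err))
    return mag, err_bright, err_faint

# 1% relative flux error: the two sides agree to ~1e-4 mag
m_small, b_small, f_small = mag_with_errors(100.0, 1.0)
# 50% relative flux error: the faint-side bar is far larger than the bright-side one
m_big, b_big, f_big = mag_with_errors(100.0, 50.0)
```

For small relative errors the magnitude error is nearly symmetric; as the relative error grows, the faint side blows up, which is why quoting a single symmetric error on the mag column would be misleading.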

Somewhat more fundamentally, since PhotCal and Measurement are independent annotations in my scheme, you can't really say "mag is (or is not) a Measurement", only that "mag is (or is not) annotated as Measurement version 1". I don't think our annotation scheme (or rather, meta-model) should do more than that.

3) Position of the source: the table has 2 Params giving the RA,DEC, with a description saying it is the "Position of source object". There is a SphericalCoordinate mapping to those values. This is in the Coords annotation at Coords.space, so it can be found as a Coordinate, but there is no other annotation which gives it context as a source position (Target.position?) or relates it to the TimeSeries. I don't see when/how a client would know when to use the SphericalCoordinate.

Right, that's missing so far -- it would, I think, be part of ds:Dataset. If I got to say how it should do that, it would be a sequence of column references, in order to keep Dataset separate from STC.

4) PhotCal reference to phot/flux? If I recall, the usage of photDM:PhotCal is to be referenced by the Flux/Magnitudes as part of their "Frame" (SpectralDM). The PhotDM PhotCal object does not have any association to the Flux/Magnitude values.

...which is one of the things I claim we need to fix, which is why it's done as it is in the example.

  The table flux and phot FIELD elements have this reference to PhotCal, but there are also backward-pointing references within the PhotCal objects to the FIELDs.
  Is that part of your annotation scheme?  What if multiple columns share the same PhotCal?

No, the backward references are due to the custom GROUP from Ada's time series proposal, which the example contains as well (I've just pulled it out of my live system, which has them). I've maintained that these backward references are a bad idea (indeed, they're one of two reasons why I started on the STC annotation thing all these years ago), and I've been struggling against them, in particular in COOSYS and TIMESYS. Consider them legacy.

mcdittmar commented 3 years ago

Help me see how this approach works well/better for clients.. Your implicit model for Cube is:

Which is essentially "Table + knowledge of dependent/independent axes"

There are 2 Cubes defined

  1. has independent_axes = Field('obs_time'); dependent_axes = Field('phot')   
  2. has independent_axes = Field('obs_time'); dependent_axes = Field('flux'), Field('phot')

I, as a client, find Cube instance no. 1 and want to know 'what is this a cube of?' To find this, I need to extract other annotations which contain the same Field reference
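The extraction step described here (find every annotation that shares a FIELD reference) amounts to a reverse lookup. The tuples below are a made-up, flattened rendering of the example file's annotations, not a real API:

```python
# Hypothetical flattened annotations: (annotation type, role, referenced FIELD id).
annotations = [
    ("stc2:TimeCoordinate", "location",         "obs_time"),
    ("phot:PhotCal",        "value",            "phot"),
    ("phot:PhotCal",        "value",            "flux"),
    ("ndcube:Cube",         "independent_axes", "obs_time"),
    ("ndcube:Cube",         "dependent_axes",   "phot"),
    ("ndcube:Cube",         "dependent_axes",   "flux"),
]

def annotations_for(field_id):
    """Every (annotation type, role) pair that references the given FIELD."""
    return [(atype, role) for atype, role, ref in annotations if ref == field_id]

annotations_for("phot")
# -> [('phot:PhotCal', 'value'), ('ndcube:Cube', 'dependent_axes')]
```

The lookup itself is trivial; the open question in this thread is what a client should conclude from the set of annotations it gets back.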

Which leads me to conclude that this is either:

  1. Cube( Time, PhotCal )   
  2. Cube ( Time, Cube )

What I would expect to find is an unambiguous

  1. Cube( Time, Magnitude )

The only way I see to identify column 'phot' as a Magnitude from the annotation is from the PhotCal.magnitudeSystem attribute having a value. That makes it a VERY important attribute! There is the UCD on the Field, but unless your annotation scheme requires it to exist, one can't rely on it being there to help out.

msdemlei commented 3 years ago

On Thu, Feb 25, 2021 at 08:53:22AM -0800, Mark Cresitello-Dittmar wrote:

Help me see how this approach works well/better for clients.. Your implicit model for Cube is:

  • ndcube:Cube
      o independent_axes: RealQuantity[*][*] <-- [naxes][nrows]
      o dependent_axes: RealQuantity[*][*] <-- [naxes][nrows]

Since the things with the square brackets unnerve me a bit, let me make a brief statement: The annotation proposed is for VOTable (which for STC is the most pressing use case).

If we want to annotate FITS arrays or even CDF files or anything else, we'll have to come up with some syntax on how to reference the various things happening in these container formats, including header cards or other native metadata items.

However, the mess is bad enough as is, so I'd suggest we keep in mind that we'll have to do that at some point and develop ideas for how it could be done, but focus on VOTable so we get something to work properly. Big problems need to be solved in reasonable steps.

Which is essentially "Table + knowledge of dependent/independent axes"

There are 2 Cubes defined

  1. has independent_axes = Field('obs_time'); dependent_axes = Field('phot')   
  2. has independent_axes = Field('obs_time'); dependent_axes = Field('flux'), Field('phot')

I'd not put it this way -- as the annotation says, there's one Cube with two observables.

I, as a client, find Cube instance no. 1 and want to know 'what is this a cube of?'

This is an example of why I think it's paramount to have concrete use cases as scenarios for what clients are supposed to do with the annotation.

You see, I don't think a client will generally enumerate cubes in this way. Instead, I expect it will see: "Ah, there's a cube in there; offer the user the option to plot this as a cube" (or, in a library: "I'll expose: you have two observables; before proceeding further, choose one").

This cube-plotting option will then tell the user: you can plot either flux or phot, and only when the user selects one of these fields does it become relevant what that actually is.

The client will then inspect the additional annotations of that column and pick one or more that help it configure the plot in the most useful way, for instance:

(a) Ah, I have a measurement annotation -- I'm adding error bars.

(b) Ah, I have a Photcal annotation and a zero point -- offer to convert to mag (or so).

(c) Ah, I have a time annotation -- offer to convert to other time systems.

-- note that the cool part of this scheme is that a client that perhaps doesn't know how to do (b) still can do (a) and (c) without a problem.
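That degrade-gracefully dispatch can be sketched in a few lines. The dict shape and the type strings below are illustrative placeholders, not the real mapping syntax:

```python
def plot_options(column_annotations):
    """Collect the features a client can offer for one column, based only on
    the annotations it recognizes; anything unknown is silently skipped."""
    options = []
    for ann in column_annotations:
        kind = ann.get("type")
        if kind == "meas:Measurement":          # (a) has an error: offer error bars
            options.append("error bars")
        elif kind == "phot:PhotCal":            # (b) photometric calibration
            options.append("convert to mag")
        elif kind == "stc2:TimeCoordinate":     # (c) time metadata
            options.append("convert time system")
        # unknown annotation types cost nothing: (a) and (c) still work
    return options

plot_options([{"type": "meas:Measurement"},
              {"type": "vendor:SomethingNew"},   # not understood: ignored
              {"type": "stc2:TimeCoordinate"}])
# -> ['error bars', 'convert time system']
```

The point of the independent-annotations design shows up in the final line: the unrecognized annotation is skipped without disabling anything else.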

To find this, I need to extract other annotations which contain the same Field reference

  • 'obs_time' is in "stc2:Coords" (indirectly) and "stc2:TimeCoordinate.location" (directly)
  • 'phot' is in "phot:PhotCal.value" (directly) and cube no. 2 "ndcube:dependent_axes" (directly)

Which leads me to conclude that this is either:

  1. Cube( Time, PhotCal )   
  2. Cube ( Time, Cube )

Ummm -- I don't think I could follow here, perhaps because I've not quite worked out why you see two Cubes in here. Could you perhaps elaborate a bit why a client would want to do this kind of analysis?

The only way I see to identify column 'phot' as a Magnitude from the annotation is from the PhotCal.magnitudeSystem attribute having a value. That makes it a VERY important attribute!

Yes, of course -- value is all-important, because it says what the column you are annotating means. Without it, the whole annotation is pointless in this scheme. True, this means you may have to repeat items like filterIdentifier if you have multiple columns using the same photometric system -- but that seems a small price to pay for saving on referencing.

There is the UCD on the Field, but unless your annotation scheme requires it to exist, one can't rely on it being there to help out.

Well, a client knows it's some kind of photometric thing because of the PhotCal annotation -- and I'm quite convinced a client can conveniently learn this as soon as it needs to: when it knows which column it is operating on.

But yes, I'm very sure we should not encode the information contained in the ucd and unit attributes in the annotation again wherever we can help it. Repeating things will lead to conflicting information on both ends (with my GloTS operator hat on, don't get me started on the endless pain of conflicting information between VOSI tables and the TAP schema in TAP services) and hence make our clients' lives hard.

Container formats that don't have UCD or even unit will need some special handling in their annotation schemes, but that's certainly solvable relatively straightforwardly once we have the container format-specific referencing worked out, and we shouldn't encumber the modelling and mapping question with such considerations at this point -- this kind of thing has held up the whole effort for far too long already.

mcdittmar commented 3 years ago

On Fri, Feb 26, 2021 at 3:36 AM msdemlei notifications@github.com wrote:

On Thu, Feb 25, 2021 at 08:53:22AM -0800, Mark Cresitello-Dittmar wrote:

Help me see how this approach works well/better for clients.. Your implicit model for Cube is:

  • ndcube:Cube
      o independent_axes: RealQuantity[*][*] <-- [naxes][nrows]
      o dependent_axes: RealQuantity[*][*] <-- [naxes][nrows]

Since the things with the square brackets unnerve me a bit, let me make a brief statement: The annotation proposed is for VOTable (which for STC is the most pressing use case).

If we want to annotate FITS arrays or even CDF files or anything else, we'll have to come up with some syntax on how to reference the various things happening in these container formats, including header cards or other native metadata items.

However, the mess is bad enough as is, so I'd suggest we keep in mind that we'll have to do that at some point and develop ideas for how it could be done, but focus on VOTable so we get something to work properly. Big problems need to be solved in reasonable steps.

I can't tell if you are saying that we don't need anything more complex than RealQuantity[naxes][nrows], or if you're suggesting this representation is in some way targeting the more complicated FITS and CDF cases. Annotation:

"independent_axes" points to the obs_time column, i.e. a list of RealQuantity-s of length nrows. There can be more than one independent axis (the attribute is plural), so that is another dimension for the number of axes. Hence, independent_axes == RealQuantity[naxes][nrows].
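The shape argument can be made concrete in plain Python; the three-row table below is made up for illustration:

```python
# Hypothetical three-row table: column name -> per-row values.
table = {"obs_time": [55197.0, 55198.0, 55199.0],
         "flux":     [10.2, 11.7, 9.8]}

# The annotation stores independent_axes as a list of column references;
# materialising those references yields the [naxes][nrows] layout.
independent_axes = ["obs_time"]
values = [table[name] for name in independent_axes]
shape = (len(values), len(values[0]))
# shape == (1, 3), i.e. [naxes][nrows]
```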

Which is essentially "Table + knowledge of dependent/independent axes"

There are 2 Cubes defined

  1. has independent_axes = Field('obs_time'); dependent_axes = Field('phot')
  2. has independent_axes = Field('obs_time'); dependent_axes = Field('flux'), Field('phot')

I'd not put it this way -- as the annotation says, there's one Cube with two observables.

Yes, but with very little more information than you get from the TABLE of FIELDs. Literally, the only information added is the dependency flag.

I, as a client, find Cube instance no. 1 and want to know 'what is this a cube of?'

This is an example of why I think it's paramount to have concrete use cases as scenarios for what clients are supposed to do with the annotation.

You see, I don't think a client will generally enumerate cubes in this way. Instead, I expect it will see: "Ah, there's a cube in there; offer the user the option to plot this as a cube" (or, in a library: "I'll expose: you have two observables; before proceeding further, choose one").

Write a script which scans through data for target "X" and finds Cubes which have TimeSeries with Magnitudes.

To find this, I need to extract other annotations which contain the same Field reference

  • 'obs_time' is in "stc2:Coords" (indirectly) and "stc2:TimeCoordinate.location" (directly)
  • 'phot' is in "phot:PhotCal.value" (directly) and cube no. 2 "ndcube:dependent_axes" (directly)

Which leads me to conclude that this is either:

  1. Cube( Time, PhotCal )
  2. Cube ( Time, Cube )

Ummm -- I don't think I could follow here, perhaps because I've not quite worked out why you see two Cubes in here. Could you perhaps elaborate a bit why a client would want to do this kind of analysis?

There is 1 cube at line 86, and another at line 155. I assumed that was intentional, but did wonder why you'd want to do that.

The only way I see to identify column 'phot' as a Magnitude from the annotation is from the PhotCal.magnitudeSystem attribute having a value. That makes it a VERY important attribute!

Yes, of course -- value is all-important, because it says what the column you are annotating means. Without it, the whole annotation is pointless in this scheme. True, this means you may have to repeat items like filterIdentifier if you have multiple columns using the same photometric system -- but that seems a small price to pay for saving on referencing.

"saving on referencing".. This isn't something I'm terribly savvy about, but it sounds like it may be an important criterion for the requirements on the annotation. You're saying that resolving the ref=* references is more costly than the added bulk and downstream vulnerability to inconsistency? In my experience, once the knowledge that there is one shared instance is lost, the various occurrences tend to be treated independently.

msdemlei commented 3 years ago

On Fri, Feb 26, 2021 at 06:54:13AM -0800, Mark Cresitello-Dittmar wrote:

On Fri, Feb 26, 2021 at 3:36 AM msdemlei notifications@github.com wrote:

On Thu, Feb 25, 2021 at 08:53:22AM -0800, Mark Cresitello-Dittmar wrote:

Help me see how this approach works well/better for clients.. Your implicit model for Cube is:

  • ndcube:Cube
      o independent_axes: RealQuantity[*][*] <-- [naxes][nrows]
      o dependent_axes: RealQuantity[*][*] <-- [naxes][nrows]

Since the things with the square brackets unnerve me a bit, let me make a brief statement: The annotation proposed is for VOTable (which for STC is the most pressing use case).

I can't tell if you are saying that we don't need anything more complex than RealQuantity[naxes][nrows], or if you're suggesting this representation is in some way targeting more complicated FITS, CDF cases..

I was trying to say that at this point -- when we don't talk about arrays in VOTable cells -- for referencing we don't need anything but columns and params, i.e. XML ids. Which I now think had nothing to do with what you were saying: you were trying to give independent_axes and dependent_axes types other than "set of column references", right?

I don't think we'd be doing anyone, including the clients, a favour if we said anything about the types of the columns we reference here; the types are given by the container format. All the annotation says is, if you will, "consider this column as something you ought to plot on the abscissa".

I can perfectly well see people having categorical variables as axes, and I see not much reason to keep them from doing that. Of course a given analysis might not be able to deal with strings instead of numbers, but it's better to let it jump and fail than to try to ward these things off in advance. After all, this isn't much different from having to deal with, say, negative numbers, which, to mention an example, will kill any analyses involving logarithms.

Trying to catch things like that in advance involves a lot of complexity and rigidity for no gain -- the result in either case would be an error message, and there's no guarantee anything up-front would be more understandable to users than what the programmes produce when they've jumped and failed.

You see, I don't think a client will generally enumerate cubes in this way. Instead, I expect it will see: "Ah, there's a cube in there; offer the user the option to plot this as a cube" (or, in a library: "I'll expose: you have two observables; before proceeding further, choose one").

Write a script which scans through data for target "X" and finds Cubes which have TimeSeries with Magnitudes.

Well, I'd frankly expect such discovery operations to be run on Obscore-like tables, but let me see how I'd do that if I had to do that exercise on a wild bunch of annotated data files:

(1) find the ds:Dataset annotation
(2) dereference the dataProductType attribute, and if it's a literal, compare it against TIMESERIES
(3) find ds:Dataset's target location attribute (not in the current annotation; I don't remember why I didn't put it in. I'll do some AstroTarget-like thing as I find a bit of time)
(4) that probably will never be a literal, so you'll get a bunch of params or columns. If it's columns, give up; if it's params, look for a spatial annotation on one of the params that you understand. That sounds more complicated than it is -- it just makes sure that we can evolve our spatial annotation without breaking everything, and most of the time you'll have it on the first attempt. If you don't find a spatial annotation you understand, give up
(5) compare the target position you find against target X's
(6) see if there's a column with a UCD of phot.*. I'd say the likelihood for a false positive here is negligible
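The six steps might look roughly like this in a client. Everything here is a sketch: the file is reduced to a plain dict, and the key names ("ds:Dataset", "target_location", "field_ucds") are hypothetical stand-ins for whatever the real annotation access layer provides:

```python
import math

def angular_sep(p, q):
    """Rough small-angle separation in degrees; fine for target matching."""
    return math.hypot((p[0] - q[0]) * math.cos(math.radians(p[1])), p[1] - q[1])

def is_timeseries_of(dataset, target_pos, max_sep=0.01):
    ds = dataset.get("ds:Dataset")                        # (1) find the annotation
    if ds is None:
        return False
    if ds.get("dataProductType") != "TIMESERIES":         # (2) literal comparison
        return False
    loc = ds.get("target_location")                       # (3) target location
    if loc is None or loc.get("kind") == "columns":       # (4) columns: give up
        return False
    pos = loc.get("position")   # from a spatial annotation we understand
    if pos is None:
        return False
    if angular_sep(pos, target_pos) > max_sep:            # (5) compare positions
        return False
    return any(ucd.startswith("phot.")                    # (6) some phot.* column
               for ucd in dataset.get("field_ucds", []))

crab_ts = {"ds:Dataset": {"dataProductType": "TIMESERIES",
                          "target_location": {"kind": "params",
                                              "position": (83.63, 22.01)}},
           "field_ucds": ["time.epoch", "phot.mag;em.opt.V"]}
is_timeseries_of(crab_ts, (83.63, 22.01))   # -> True
```

Each "give up" branch simply returns False, which matches the conservative fall-through behaviour described in the steps.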

But again: Iterating over a bunch of files is probably not how we'll do dataset discovery in any desirable future -- I suppose obscore, caom or similar tech will remain the norm for that.

There is 1 cube at line 86, and another at line 155. I assumed that was intentional, but did wonder why you'd want to do that.

Oops, no, that's a bug. Sorry about that. It is rather experimental code that's been hacked on mostly in bad moods a couple of times since the utype days... I'll fix it.

"saving on referencing"..

This isn't something I'm terribly savvy about, but it sounds like it may be an important criterion for the requirements on the annotation. You're saying that resolving the ref=* references is more costly than the added bulk and downstream vulnerability to inconsistency? In my experience, once the knowledge that there is one shared instance is lost, the various occurrences tend to be treated independently.

This is exactly analogous to the question of de-normalising database schemas. There are very good reasons to keep them normalised in general, but there are equally good reasons to de-normalise them in individual cases.

For the photometry, my instinct would be that having the metadata in an immediate annotation is helping much more in the wide majority of the use cases than it might harm in the relatively few cases where having explicit filter objects referenced from photometry instances might help a bit.

But of course that's based on data I've dealt with. There might be cases when explicit filter objects actually help a lot, and that could change my considerations.

mcdittmar commented 3 years ago

The nesting is getting deep, so I'll re-base these comments

1) implicit model

you were trying to give types to independent_axes and dependent_axes types other than "set of column references", right?

Yes.. I'm looking to work out the mapping from the underlying model to the annotation... you don't have a document to describe that, do you? The annotation is that the ATTRIBUTE independent_axes == list of COLUMN elements which reference a VOTable column. So what would the underlying Data Model which this annotates be?

2) Write a script which scans through data for target "X" and finds Cubes which have TimeSeries with Magnitudes.

(1) find the ds:Dataset annotation
(2) dereference the dataProductType attribute, and if it's a literal, compare it against TIMESERIES
(3) find ds:Dataset's target location attribute (not in the current annotation; I don't remember why I didn't put it in. I'll do some AstroTarget-like thing as I find a bit of time)
(4) that probably will never be a literal, so you'll get a bunch of params or columns. If it's columns, give up; if it's params, look for a spatial annotation on one of the params that you understand. That sounds more complicated than it is -- it just makes sure that we can evolve our spatial annotation without breaking everything, and most of the time you'll have it on the first attempt. If you don't find a spatial annotation you understand, give up
(5) compare the target position you find against target X's
(6) see if there's a column with a UCD of phot.*. I'd say the likelihood for a false positive here is negligible

This speaks to the Annotation/Model requirements. To fully execute the thread, you must rely directly on the VOTable content (UCDs) to identify the 'type' of the data (Magnitude). The main question here is: "Is the Magnitude-ness part of the model?" I believe it is, in which case I should be able to identify it via Annotation content.

I think this question may be better explored in the "Standard Properties" case.


msdemlei commented 3 years ago

On Mon, Mar 01, 2021 at 07:48:59AM -0800, Mark Cresitello-Dittmar wrote:

The nesting is getting deep, so I'll re-base these comments

1) implicit model

you were trying to give types to independent_axes and dependent_axes types other than "set of column references", right?

Yes.. I'm looking to work out the mapping from the underlying model to the annotation... you don't have a document to describe that do you?

Sorry, no -- I could write something up, but the way I expect this to work, it really is just "set of objects", so this particular write-up would be really short.

The annotation is that the ATTRIBUTE independent_axes == list of COLUMN elements which reference a VOTable column. So what would the underlying Data Model which this annotates be?

  • these columns essentially describe Quantity-s (numeric values with units)

That's exactly the sort of cross-model references I'd like to avoid whenever it doesn't hurt much.

Hence, it's really just "set of objects", which in a VOTable annotation would usually translate into "set of columns and params". Whether it makes any sense to reference params in the axes attributes is another question -- probably not, but I suspect we'll be grateful one day if we don't rule it out (PARAMs can be array-valued, after all).

When you say "I can perfectly well see people having categorical variables as axes"

  • this translates, in my head, to there being either different kinds of 'axes', or different kinds of values (stored in columns)
    • which starts adding structure to the model

I don't think it is viable/useful for the model to be:

  • independent_axes: void[][]

which is what I believe you are proposing.

No, it's less than that, because it's really just a set (where I'd not worry about implicit order too much, so a list would do, too).

I'm really sure we ought to leave typing to the container format (or the target object, if we can't avoid referencing complex things). Without that, bad flag days are really hard to avoid -- plus I don't believe static typing is going to help us anyway in this particular metamodel.

Let's learn from python (where this kind of thing is known as "dynamic typing").

(6) see if there's a column with a UCD of phot.*. I'd say the likelihood for a false positive here is negligible.

This speaks to the Annotation/Model requirements. To fully execute the thread, you must rely directly on the VOTable content (UCDs) to identify the 'type' of the data (Magnitude). The main question here is: "Is the Magnitude-ness part of the model?" I believe it is, in which case I should be able to identify it via Annotation content.

You're probably expecting this, but anyway: What functionality would be enabled if you have "physical meaning of scalar" in the model?

You see, there's a rather high price tag on that (you'll have to replicate the UCD semantics, and you're severely limiting what people can annotate; we had indications in the meeting the other day of the trouble of pulling this kind of semantics into the models), and hence we should reap a proportional benefit from it.

Also consider that the Registry already uses UCDs for data discovery (if folks do these advanced sorts of data discovery at all). Building something that will eventually require a parallel structure with similar functionality is painful, just as with the current situation in image search, where you have to run ObsCore, SIAP1, and SIAP2 queries for completeness. Let's try hard to avoid that in resource discovery (which is hard enough as-is).

I think this question may be better explored in the "Standard Properties" case

If you open a discussion there, would you mention ("@msdemlei") me there so github pings me?