ivoa / dm-usecases

This repo gathers all the material to be used in the DM workshop 2020
The Unlicense

What is that dependant axes #12

Open lmichel opened 3 years ago

lmichel commented 3 years ago

If I understand your serialisation correctly, you map a list of NDPoint, each one being composed with

I do not see how a client can tell that the first dependent value is a magnitude and the second a flux.

This question is related to the discussion we have been having here

mcdittmar commented 3 years ago

This is part of the "Unmodeled" Measure type discussion. Since there is no formal model containing Flux or Magnitude as a Measure (or any other type), then it MUST be handled by GenericMeasure.

From that perspective:

  1. there is no way to determine that one is Flux and one is Mag (other than the units)
    I'd say this may indicate a weakness in the Measurement model: should every Measurement instance be able to identify what physical entity it represents? either by class type, or semantic?
  2. there is no way of conveying that there "should be a corresponding PhotCal instance associated with this Measure"
    so, even if the Measurement provided its identity, it could not convey the dependency on other info.
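
To illustrate point 1, here is a minimal Python sketch; the `GenericMeasure` class below is a hypothetical stand-in, not the actual Meas model class. Two instances end up distinguishable only by their units:

```python
from dataclasses import dataclass

# Hypothetical stand-in for meas:GenericMeasure: a bare value with units,
# carrying no class-level physical identity.
@dataclass
class GenericMeasure:
    value: float
    unit: str

mag = GenericMeasure(value=14.2, unit="mag")
flux = GenericMeasure(value=3.1e-14, unit="erg.s**-1.cm**-2")

# Same class for both: nothing but the unit hints that one is a magnitude
# and the other a flux.
assert type(mag) is type(flux)
```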

With a formal model (what we did in Spectral)

  1. would create a Measure type for it
  2. define a Frame which includes a reference to the appropriate PhotCal instance.

At the Property Level in Mango

  1. you add a 'ucd' to the mix, which at least lets you identify it as a Flux or Magnitude
  2. but still have no way of conveying that there "should be a PhotCal" and which one.
  3. one COULD use the Property.associatedProperty mechanism to make that connection, but it would be an abuse of the link.
lmichel commented 3 years ago

there is no way to determine that one is Flux and one is Mag (other than the units) I'd say this may indicate a weakness in the Measurement model: should every Measurement instance be able to identify what physical entity it represents? either by class type, or semantic?

This is what Mango does, actually. I believe that meas should at least support some sort of photometric data (as well as LonLat position) with a filter definition (PhotCal) somewhere in coords

At the Property Level in Mango

PhotCal is a component of PhotometricCoordSystem named PhotFilter

mcdittmar commented 3 years ago

there is no way to determine that one is Flux and one is Mag (other than the units) I'd say this may indicate a weakness in the Measurement model: should every Measurement instance be able to identify what physical entity it represents? either by class type, or semantic?

This is what Mango does, actually.

Right, the question is should Mango do it? or is it the responsibility of the Measure? ie: should users be able to determine the specific 'kind' of Measure from the Measure itself, whether in Mango or Cube or TimeSeries or etc.

My feeling at the moment, is that the user should be able to poll the Measure and identify what it is so that decisions can be made. That 'poll' may be a check on the class type (easy) or something else for GenericMeasure
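A sketch of what such a 'poll' could look like on the client side; all class names here are hypothetical stand-ins, not the actual Meas classes:

```python
# Hypothetical measure classes standing in for meas:Position and
# meas:GenericMeasure.
class Measure: ...
class Position(Measure): ...
class GenericMeasure(Measure):
    def __init__(self, ucd=None):
        self.ucd = ucd  # extra identity info for unmodeled measures

def identify(measure):
    """Poll a measure for the physical quantity it represents."""
    if isinstance(measure, Position):        # easy: the class type carries identity
        return "pos"
    if isinstance(measure, GenericMeasure):  # fall back to some other tag
        return measure.ucd or "unknown"
    return "unknown"

assert identify(Position()) == "pos"
assert identify(GenericMeasure(ucd="phot.flux")) == "phot.flux"
```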

I believe that meas should at least support some sort of photometric data (as well as LonLat position) with a filter definition (PhotCal) somewhere in coords

At the Property Level in Mango

PhotCal is a component of PhotometricCoordSystem named PhotFilter

That becomes tricky.. Meas having some Photometry type is not a problem. Coords having a PhotometryFrame which relates the photDM:PhotCal object would add an awkward dependency since "Coords" is more core than "PhotoDM". This is a case where Markus' modeling plan would be handy (not that I'm advocating it)

Generally the idea has been to define things in the model which covers the domain. So Photometry measure would be defined in the Spectral model, or perhaps photDM itself.

lmichel commented 3 years ago

Right, the question is should Mango do it? or is it the responsibility of the Measure?

In any case, it is not the responsibility of Measure, which models measure classes but not their roles in a given context. This is the responsibility of CUBE in this case: it has to assign a role to the components it uses.

My feeling at the moment, is that the user should be able to poll the Measure and identify what it is so that decisions can be made. That 'poll' may be a check on the class type (easy) or something else for GenericMeasure

François tried to work with a model derived from CUBE, but using mango:Parameter instead of meas:Measure. This was an elegant way to get around the issue.

lmichel commented 3 years ago

Coords having a PhotometryFrame which relates the photDM:PhotCal object would add an awkward dependency since "Coords" is more core than "PhotoDM".

This question is related to the appropriate level of dependencies in a system

Now a comment about the model import.

This has 2 consequences

  1. I do not follow @msdemlei when he says that the evolution of a component model will break the stack. If model1V1 imports model2V1 and model2V1 is updated to model2V2, then model1V1 remains unchanged until it is upgraded to support model2V2.
  2. As the imported model is frozen, why should we continue to work with dynamic links? As models are well defined and versioned, I'm wondering whether we could consider working with class copies. A class of my model model1V1 would be a copy of its sibling in model2V1. This wouldn't break the consistency I mentioned above, while making our job easier.

In the case of Coords, it should be easy to import the PhotDM components that way. If this proposal looks too odd, using both Coords and PhotDM in measures will require working with another model aggregating them.

mcdittmar commented 3 years ago

Right, the question is should Mango do it? or is it the responsibility of the Measure?

In any case, it is not the responsibility of Measure, which models measure classes but not their roles in a given context.

I agree with this. In Mango, the 'role' is provided by Parameter.semantic.. right?

But Parameter.ucd identifies the Type of the contained measure (as a UCD) ("pos", "time", "phot.flux", "phys.mass"). And THIS is probably the responsibility of Measure: to provide this info either by the class name, or by some other means in the case of GenericMeasure.

François tried to work with a model derived from CUBE, but using mango:Parameter instead of meas:Measure. This was an elegant way to get around the issue.

The structure "Source -> Parameter -> Measure" is very similar to the cube "Cube -> Observable -> Measure" structure. This same issue will affect Cube, so it's good to hash that out here and decide where that solution belongs.

mcdittmar commented 3 years ago
  • This is a nice feature of VODML but difficult to use in practice.

    • the proxy class trick you proposed for Modelio works fine
    • But it is not very safe because you have to cut/paste VODML ids and class names from the imported models into your Modelio: big risk of mistakes (believe my experience)!

I agree that better VODML modeling tools would be very useful! I'd love to see a UML utility that could generate the diagrams, XML and PDF; which, I think, Paul Harrison had started at one point.

  1. As the imported model is frozen, why should we continue to work with dynamic links? As models are well defined and versioned, I'm wondering whether we could consider working with class copies. A class of my model model1V1 would be a copy of its sibling in model2V1. This wouldn't break the consistency I mentioned above, while making our job easier.

In my experience from resolving/extracting the Dataset metadata content from Characterization, Spectrum, ObsCore models, this leads to a LOT of inconsistencies and maintenance issues. The 'copy' is rarely a true mirror.

I'm not sure it was your goal with this element, but even in Mango, the PhotFilter object is maybe compatible with, but not a copy of the photDM.PhotCal object. And, in Mango, it is an extension of coords:CoordFrame, which it is not in photDM.

lmichel commented 3 years ago

In Mango, the 'role' is provided by Parameter.semantic.. right?

Parameter.semantic comes in addition to Parameter.ucd. I would say that Measure is passive: it provides components for whoever requests them. It is not responsible for the usage of the provided elements. This is the responsibility of the host model. In the case of MANGO, there is no safety guard preventing misuse of measures.

lmichel commented 3 years ago

The structure "Source -> Parameter -> Measure" is very similar to the cube "Cube -> Observable -> Measure" structure. This same issue will affect Cube, so it's good to hash that out here and decide where that solution belongs.

The main difference is the UCD use.

lmichel commented 3 years ago

In my experience from resolving/extracting the Dataset metadata content from Characterization, Spectrum, ObsCore models, this leads to a LOT of inconsistencies and maintenance issues. The 'copy' is rarely a true mirror.

True while you are doing this by hand. If now you have a system that is able to copy a VODML class from one file to another, things would be more seamless.

Mango:PhotFilter is similar to PhotSys@VOTable. We did so until PhotDM is VODML-ized. @loumir already complained about this and proposed a more consistent PhotDM clone.

mcdittmar commented 3 years ago

On Wed, Mar 10, 2021 at 10:07 AM Laurent MICHEL notifications@github.com wrote:

This has 2 consequences

  1. I do not follow @msdemlei https://github.com/msdemlei when he says that the evolution of a component model will break the stack. If model1V1 imports model2V1 and model2V1 is updated to model2V2, then model1V1 remains unchanged until it is upgraded to support model2V2.

This is migrating off topic of the issue.. but.

Markus' point on this is quite valid though. Let's say that the Spectral model work, or something around PlanetaryScience (Orbits), requires a major version change to Meas/Coords.

Technically, I could probably rig the annotation:

Where I differ with Markus is that I think this is an annotation problem, not a model problem.

mcdittmar commented 3 years ago

On Wed, Mar 10, 2021 at 11:14 AM Laurent MICHEL notifications@github.com wrote:

In Mango, the 'role' is provided by Parameter.semantic.. right?

Parameter.semantic comes in addition to Parameter.ucd. I would say that Measure is passive: it provides components for whoever requests them. It is not responsible for the usage of the provided elements. This is the responsibility of the host model.

In the case of MANGO, there is no safety guard preventing misuse of measures.

Last statement on this for now..

IF the purpose of Parameter.ucd is to identify the Type of Measure the Parameter holds when that information is not available from the Measure class itself, then I'd say that this job should be pushed into the Measure, thereby removing the problem of Parameter.ucd being inconsistent with the actual Measure.

IF it serves another purpose, then there may be reason to keep it at the Mango level.

So far, I don't see another purpose, and the description in the Mango document says "UCD1+ giving the type of the physical measure"

lmichel commented 3 years ago

UCD tells more than the measure type. UCDs are two-word labels, e.g. pos;meta.main. Therefore you cannot put UCDs in measures as built-in parameters.

I've no trouble with the risk of UCD/Class mismatch. It looks reasonable to me because we have a model that must be applicable to a very broad set of use-cases, past, present or future. This implies introducing somewhere a very flexible feature (a flexible seal?) connecting real-life data with model elements.
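For illustration, splitting a composite UCD1+ string into its word atoms takes only simple string handling (this is a naive helper, not an official UCD parser):

```python
def ucd_atoms(ucd: str):
    """Split a UCD1+ string like 'pos;meta.main' into its word atoms."""
    return [word.strip() for word in ucd.split(";") if word.strip()]

def primary_atom(ucd: str) -> str:
    """The first word usually carries the physical type."""
    return ucd_atoms(ucd)[0]

assert ucd_atoms("pos;meta.main") == ["pos", "meta.main"]
assert primary_atom("phot.flux;em.opt.V") == "phot.flux"
```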

pahjbo commented 3 years ago

On 2021-03 -10, at 16:09, Mark Cresitello-Dittmar @.***> wrote:

I'd love to see a UML utility that could generate the diagrams, XML and PDF; which, I think, Paul Harrison had started at one point.

I did - it is here https://github.com/pahjbo/vodsl and I think that it is better for what you are trying to do with sharing and refactoring models, mainly because the “source code” that you are working with is simple text (easy to compare/version control etc.). However, I gave up maintaining it, for lack of any interest in using it (and it will not work in the latest eclipse).

Although you lose a lot of the cleverness by not using eclipse, it is possible to just edit the files in your favourite text editor and then use the stand-alone parser (https://github.com/pahjbo/vodsl#using-the-stand-alone-parser) to convert back to VODML at the end (someone has actually done this).

If there was some interest, then there is a route towards making it work in modern javascript IDEs such as visual studio code - I am not sure that I would have time to do that, but I could point someone in the right direction.

Cheers, Paul.

lmichel commented 3 years ago

Markus' point on this is quite valid though.

Not really because the main issue is not the propagation of the meas/coord upgrades, it is the nature of the changes.

If the new meas/coord keeps ascending compatibility, datasets annotated with different versions remain interoperable; otherwise they don't. That is the issue. In the first case, updating models using meas/coord is straightforward. We could even imagine a sort of errata process on VODML files. In the second case, we can get great damage, entangled models or not.

If you limit the annotation to meas/coord, you lose the possibility to connect elements to each other.

msdemlei commented 3 years ago

On Wed, Mar 10, 2021 at 09:08:07AM -0800, Laurent MICHEL wrote:

UCD tells more than the measure type. UCDs are two-word labels, e.g. pos;meta.main. Therefore you cannot put UCDs in measures as a built-in parameter.

I don't think I follow this "therefore" -- it's still a string, no? Of course, I'd still not actually build UCDs into the models, as it's already in VOTable, so we don't need data models for this kind of thing.

What we do need models for is defining frames, linking values and error, linking times and places, etc.

I've no trouble with the risk of UCD/Class mismatch. It looks reasonable to me because we have a model that must be applicable for a very broad set of use-cases, past, present or future. This implies to introduce somewhere a very flexible feature (flexible seal?) connecting real life data with model elements.

Well, the mismatch isn't the only worrying thing; for me, it's more that we build something for which we already have a solution, or at least very nearly so. I'd still like to see what exactly you can do when you have your per-physics classes on top that you cannot do when you just have the UCD.

This whole thing would be different if you proposed to get rid of the UCDs (and there would be arguments in favour of that, though far less than against) once we have your DMs.

But as long as we keep the UCDs I'd be very reluctant to build something that's this closely related to them.

mcdittmar commented 3 years ago

On Wed, Mar 10, 2021 at 12:08 PM Laurent MICHEL notifications@github.com wrote:

UCD tells more than the measure type. UCDs are two-word labels, e.g. pos;meta.main. Therefore you cannot put UCDs in measures as a built-in parameter.

I know it sounded like it, but I wasn't necessarily advocating that UCD should move into Measure, but that all Measure classes should (maybe) be responsible for identifying what physical quantity it represents. This may not involve UCD.

Tying in with Markus' comments as well: The UCD was developed early on to tag VOTable elements with some sort of physical meaning. The words serve multiple purposes, overlapping with both the model class ("pos", "phot.flux", "phys.mass") and the role ("phys.angSize.smajAxis", "obs.exposure"). They are very useful and used in VOTable serializations.

But I don't think they should be used in the Models (or at least not without qualifiers so that it ONLY identifies the type).

My impression is that using UCD here is taking a model requirement

mcdittmar commented 3 years ago

Markus,

I'd still not actually build UCDs into the models, as it's already in VOTable

I'd still like to see what exactly you can do when you have your per-physics classes on top that you cannot do when you just have the UCD.

I feel like these statements answer your own question:

Additionally: the Position is complex, and the Annotation allows you to identify which 'roles' are filled by which VOTable elements. Which FIELD is the 'latitude', which 'longitude', which define the error ellipse. Again, regardless of whether or not the VOTable groups these elements or populates the ucd tag on the PARAM|FIELD.
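The role-to-element mapping described above could be sketched as a plain lookup table; the FIELD ids and role names below are invented for illustration:

```python
# Hypothetical annotation: model roles mapped to VOTable FIELD ids.
annotation = {
    "position.longitude": "FIELD_ra",
    "position.latitude": "FIELD_dec",
    "position.error.semiMajor": "FIELD_err_maj",
    "position.error.semiMinor": "FIELD_err_min",
}

def field_for(role: str) -> str:
    """Resolve which VOTable element fills a given model role."""
    return annotation[role]

assert field_for("position.latitude") == "FIELD_dec"
```

This works regardless of whether the VOTable itself groups the columns or populates their ucd attributes, which is the point being made.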

If you want to use UCDs in the Annotation, that is a different discussion, but you are still mapping the per-physics classes to particular UCDs .

If you're thinking we don't need to model Position, we just need to model Measure and use UCDs for the physics; (which I think is exactly what you've said), I assert you have the same problem

msdemlei commented 3 years ago

On Wed, Mar 10, 2021 at 09:35:58AM -0800, Laurent MICHEL wrote:

In the first case, updating models using meas/coord is straighforward . We could even imagine a sort of errata process on VODML files. In the seconda case, we can get great damages, entangled models or not.

No, that is my point: if you don't entangle models, the "damage" is limited to the model you're updating (i.e., "not great damage"). With entangled models, you're taking down the entire annotation when one model changes (i.e., great damage).

So, if you will: avoid entangled models to limit the damage radius of incompatible updates.

If you limit the annotation to meas/coord, you lose the possibility to connect elements to each other.

Again, I'd contradict here: The connection(s) are what the model should do and what goes beyond conventional VOTable annotation.

But making this point in abstract perhaps is not terribly convincing, so: What kind of connections are you thinking of in the use cases we have?

mcdittmar commented 3 years ago

Markus,

On Thu, Mar 11, 2021 at 3:50 AM msdemlei @.***> wrote:

If you limit the annotation to meas/coord, you lose the possibility to connect elements to each other.

Again, I'd contradict here: The connection(s) are what the model should do and what goes beyond conventional VOTable annotation.

But making this point in abstract perhaps is not terribly convincing, so: What kind of connections are you thinking of in the use cases we have?

I'm not sure what you mean by "what kind of connections".. basically the philosophy that the models are 'building blocks' so that content is not duplicated (reusable).

When considering that approach, I always get the impression of:

A very significant obstacle to this approach is that there is no 'black box' object in the base types (and IMO there shouldn't be.. but that isn't the point). Each attribute of the model MUST have a Type, and so, to disentangle the models, you need an AnyType sort of thing in place of meas:Measure

This just doesn't make sense to me. The values cannot be AnyType and still facilitate interoperability.

Laurent has an implementation of each of the cases (those with data anyway). I have done several (working Standard Properties now). For mine, I try to 'do something' with the data which illustrates the usage, generally pulled from the case description.

I've been looking forward to seeing your implementations on these to compare and see how you envision this working in the larger scale.

Mark

msdemlei commented 3 years ago

On Thu, Mar 11, 2021 at 07:10:04AM -0800, Mark Cresitello-Dittmar wrote:

When considering that approach, I always get the impression of:

  • toss out a bag of Lego-s and call it a "Death Star".. 'you just have to put the pieces in the right order.'

This metaphor I think is very useful -- it has made me feel like I understand this discussion better, at least. You know, I think it is how we should present the question to the wider (VO) public.

This is easy for me to say because I'm convinced that if you asked a bunch of programmers if they'd rather have a bunch of Legos or a pre-assembled Death Star, nine out of ten would go for the Legos (well, perhaps except if it was a real, working Death Star, but let's rather not consider this possibility).

And there's a good reason for that: In actual implementation, "Do Time Series" isn't a use case. "Plot error bars" or "transform coordinates" is. Having large, pre-assembled structures makes for clumsy programmes, and our attempt to produce these large structures perhaps is part of the reason why our DM efforts so far have made very little inroads to any sort of running code.

Each attribute of the model MUST have a Type, and so, to disentangle the

The values of course have types -- usually, the container format provides them, and the INSTANCE-s have their dmtype, too.

For attributes, on the other hand, having types is a lot less important, as evinced by the success of Python (which is strictly typed on the value side but untyped for attributes by default). Now, there are cases for providing guarantees on the properties of attributes as well; within models, giving such guarantees by default probably helps implementors while not damaging much.
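The Python analogy can be made concrete: values carry strict types, while an attribute accepts anything duck-compatible (a toy example, not VODML machinery):

```python
# Values carry their own types; the attribute itself is untyped.
class Point:
    pass

p = Point()
p.measure = 1.5                              # a bare float value...
assert isinstance(p.measure, float)

p.measure = {"value": 1.5, "unit": "mag"}    # ...or a richer structure
# Client code that only reads what it needs keeps working either way.
assert p.measure["unit"] == "mag"
```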

Across models or into VOTables, however, type annotation on attributes should be limited to where there's a strong operational reason to guarantee types.

You see, you will want to change the types of the target objects, and most of the time the clients would still do the right thing after the change, for instance, because a VOTable library abstracts away the modification, or because time has passed and they can just deal with things.

If you blindly fix the expected types of Attributes ("static typing"), you'll have a lot of breakage in model evolution where nothing bad would have happened without the static typing.

Cf. this with SCS's regulation that the VOTables returned MUST be version 1.1. This has been a sea of pain in implementation without buying anything at all; actually, plenty of SCS services just ignore the regulation and work fine with all existing SCS clients.

Of course, you'll need to find a balance there; it certainly was right for SCS to require that a VOTable be returned, and probably even that it's to be a VOTable 1. Finding this balance is only possible based on actual use cases -- which are not pieces of annotation but actual tasks like the ones I've mentioned in http://mail.ivoa.net/pipermail/dm/2020-September/006123.html

model meas:Position then "such as meas:Measure with ucd containing the primary atom "pos". cube:Observable

  • measure: ivoa:AnyType << "an physical quantity with associated errors, coordinate systems/frame/space.. such as meas:Measure"

This just doesn't make sense to me. The values cannot be AnyType and still facilitate interoperability.

Actually, having dynamic typing here is the only way we can have interoperability in the long term, because both clients and servers can support a significant number of incompatible measure models without having to repeat all other annotations that perhaps will never evolve again.

I've been looking forward to seeing your implementations on these to compare and see how you envision this working in the larger scale.

I'm happy to (provisionally) annotate other kinds of data, but frankly, this, I think, isn't going to tell us anything we don't already know until we get the client/library authors on board, perhaps starting with a simple use case like "automatically plot error bars" -- or the (to me) central one "transform this catalogue to a different epoch".

lmichel commented 3 years ago

I'm happy to (provisionally)

My proposal comes with a Python client that provides model instances as Python dictionaries.

mcdittmar commented 3 years ago

Markus,

This is one of my favorite exchanges on this subject! I can't say I agree with you, but I think I am understanding your point-of-view better. Instead of:

On this part below.. 1) Sorry, I thought I heard you volunteer to implement the use case at the preview meeting. I'm not sure how I could resolve my concerns about your approach without seeing it in action. 2) In the Time Series case https://github.com/ivoa/dm-usecases/tree/main/usecases/time-series/mcd-implementation: I pull out the points and plot them.. with error bars. In the Native Frames case https://github.com/ivoa/dm-usecases/tree/main/usecases/native_frames/mcd-implementation: I take the input Positions (in ICRS and GALACTIC), transform them to (FK5 J2015.5) and plot them.

On Fri, Mar 12, 2021 at 2:29 AM msdemlei @.***> wrote:

I've been looking forward to seeing your implementations on these to compare and see how you envision this working in the larger scale.

I'm happy to (provisionally) annotate other kinds of data, but frankly, this, I think, isn't going to tell us anything we don't already know until we get the client/library authors on board, perhaps starting with a simple use case like "automatically plot error bars" -- or the (to me) central one "transform this catalogue to a different epoch".

lmichel commented 3 years ago

There are (at least) two topics here that are getting entangled.

lmichel commented 3 years ago

This thread comes in response to this post

Well, the mismatch isn't the only worrying thing; for me, it's more that we build something for which we already have a solution, or at least very nearly so. I'd still like to see what exactly you can do when you have your per-physics classes on top that you cannot do when you just have the UCD.

In conclusion, I'll say that an annotation scheme limited to simple cases is not really interesting. If we want to get all the benefits of the data annotation (a painful process for the data providers), we have to build a full-featured system.

msdemlei commented 3 years ago

On Fri, Mar 12, 2021 at 08:47:50AM -0800, Laurent MICHEL wrote:

This thread comes in response to this post

Well, the mismatch isn't the only worrying thing; for me, it's more that we build something for which we already have a solution, or at least very nearly so. I'd still like to see what exactly you can do when you have your per-physics classes on top that you cannot do when you just have the UCD.

  • The model does not do anything. It is just a piece of structured

Ok, let's say "the model enables certain things" -- if it didn't I frankly wouldn't see much point in going to the trouble of defining machine-readable models. So, I stand by my basic point: Everything we do here should be grounded in some actual use case, i.e., something a client can do with the model annotation that it couldn't do without it.

documentation that allows people to understand each other when they talk about data content. In this context, having per-physics

Ummm... does "people" refer to actual humans? If so, I'd say no. Humans don't need machine-readable models. They're much better served by plain text and straight math.

  • If I understand well, your question relates more to the data annotation. The data annotation consists in inserting in data sets

In a way, yes, but again I'd say the only reason we're doing models is that clients can rely on a certain structure of the annotations. And while I'm usually all for divide-and-conquer when trying to solve complex problems, in this particular case I think the attempt to define models somehow detached from how they will be used hasn't served us well in the past 10 years.

  • If you have a very simple VOTable, the model mapping does not help at all, you are right. Note that no one forces you to annotate your data.

I'd never say that annotation doesn't help in one place or another. On the contrary, even the simplest VOTable (id, ra, dec, say) is in dire need of annotation, because you need to define the frame, the epoch, and so on. The reason I'm here is that we still can't do that properly (though, admittedly, in this very simple case, COOSYS helps a bit).

  • If you have something a bit more tricky such as complex errors, the annotation makes them understandable by any client. I hear you saying, with good reason, that clients can already do a very good job without model annotation. But this is not a reason for not helping them (tools and libs) with clean data interfaces.

No, not at all: Clients can't do without annotation even for the simplest errors, and that's why I still have to scroll a lot through combo boxes just to make TOPCAT plot the right error bars.

I'm just saying that we should tackle the simple things first, making those work, and then tackle more complex things as clients want to consume them. Let's not waste time on quarreling about complex error representations when clients can't even do the simple "plot an error bar". We will get it wrong if we do this without guidance by client authors.

  • At a higher level, you may want to add structured data (e.g. Provenance) to your VOTable. This can only be done with an advanced annotation system.

Right. But Provenance is an excellent example: Do you really want to mingle provenance annotation with, say, dataset, or doesn't it make a lot more sense to have provenance next to (and independently of) all the other annotations we can have in a dataset?

In conclusion, I'll say that an annotation scheme limited to simple cases is not really interesting. If we want to get all the benefits of the data annotation (a painful process for the data providers), we have to build a full-featured system.

So, to make this concrete, I've created a fork of astropy that illustrates how I think you can work with arbitrarily complex data: https://github.com/msdemlei/astropy.

The README explains, I hope, the basic outlook, and the code that's given there should already work.

I'm happy to demonstrate complex use cases based on this if you throw them at me.

msdemlei commented 3 years ago

On Fri, Mar 12, 2021 at 06:41:19AM -0800, Mark Cresitello-Dittmar wrote:

This is one of my favorite exchanges on this subject! I can't say I agree with you, but I think I am understanding your point-of-view better. Instead of:

  • each model being a building block to be used/imported by other complex models You advocate:
  • model the building blocks to be used by clients to construct complex instances

Right -- and that's because I believe that will make our annotations work with how the programmes are already written.

I claim very few programmers will want to, say, generate code from our DMs and then organise their programme around that. I'm rather sure they'd much prefer to have an easy go at pulling out the two or three pieces of information they need for the task at hand and then work with these in whatever way they like.

This is what my proposal over at https://github.com/msdemlei/astropy tries to achieve (with relatively little code, I'd claim). I'd hope the readme illustrates that (although I'm well aware that we'll want to evolve the annotation -- see "overly minimal" -- and even if not, the code would need some robustness improvements).

On this part below.. 1) Sorry, I thought I heard you volunteer to implement the use case at the preview meeting. I'm not sure how I could resolve my concerns about your approach without seeing it in action.

Does the astropy prototype work for the seeing-in-action thing?

If you give me data and use cases (i.e., "do this or that with the data), I'd try to cover those as well.

2) In the Time Series case

https://github.com/ivoa/dm-usecases/tree/main/usecases/time-series/mcd-implementation: I pull out the points and plot them.. with error bars. In the Native Frames case https://github.com/ivoa/dm-usecases/tree/main/usecases/native_frames/mcd-implementation: I take the input Positions (in ICRS and GALACTIC), transform them to (FK5 J2015.5) and plot them.

Yeah... I'm not claiming these things won't work at all. I'm just claiming we're making it unnecessarily hard to do them when entangling data models, that we're making it unnecessarily hard for us if we don't just re-use what already works (UCDs, xtypes, ...), and we'll be regretting each entanglement of DMs the moment we need to evolve the DMs.

So, the question I'm trying to raise can perhaps be succinctly put as: Is what we've come up with the simplest thing (in concept and implementation) we can come up with that still satisfies actual use cases?

msdemlei commented 3 years ago

On Fri, Mar 12, 2021 at 06:40:16AM -0800, Laurent MICHEL wrote:

  • The point is that the public API does not refer to any native data element but only to model elements.
  • This is the key point for interoperability.

Ummm... Can you explain why you think that? You see, I've tried to make the exact opposite point a couple of times, and perhaps I can do a better job on that if I understand why you'd like to avoid talking about the things you annotate.

lmichel commented 3 years ago

The scope of the annotations must go beyond simple column annotations, which must nevertheless remain supported. I detailed it here, section 2.

My point is that, since we have a self-consistent model made of a hierarchy of elements identified with dmtype, dmrole and other things, the annotation must be something matching that structure.

Once you have it, you can use accessors based on those identifiers. That is what I call a public API that does not refer to any native data element but only to model elements.

In the examples I showed in these use cases, I transform annotation blocks into Python dictionaries that are easily serializable in JSON (a good point for data exchange).

In pseudo code, this would look like this:

annotation_reader = AnnotationReader(my_votable)
if not annotation_reader.support("mango"):
  sys.exit(1)

mango_instance = annotation_reader.get_first_row()
print(mango_instance.get_measures())
['pos', 'magField']
print("Magnetic field is: " + mango_instance.get_measure("magField"))
Magnetic field is: 1.23e-6 T +/- 2.e-7

This wouldn't require Python classes implementing the model (a fundamental point).

I claim that the annotation must be designed in a way that allows this in addition to basic usages.
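As a sketch of what "no Python classes implementing the model" could mean in practice (the dict layout and the `get_measure` helper below are hypothetical illustrations, not actual MANGO syntax): the instance is held as plain nested dictionaries keyed by model identifiers, so it is directly serializable to JSON and needs no generated classes.

```python
import json

# Hypothetical dict-based instance: model identifiers (dmtype, roles) are
# plain keys, so no generated model classes are needed to read it.
mango_instance = {
    "dmtype": "mango:Source",
    "measures": {
        "magField": {
            "dmtype": "meas:GenericMeasure",
            "value": 1.23e-6,
            "error": 2.0e-7,
            "unit": "T",
        },
    },
}

def get_measure(instance, role):
    """Accessor that refers only to model elements, never to native data."""
    m = instance["measures"][role]
    return "{} +/- {} {}".format(m["value"], m["error"], m["unit"])

print(get_measure(mango_instance, "magField"))  # 1.23e-06 +/- 2e-07 T
print(json.dumps(mango_instance))               # ready for exchange over the wire
```

Because the instance is just nested dicts, round-tripping it through JSON (e.g. for SAMP or a Web endpoint) loses nothing.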

Let's consider that all VizieR tables come with such annotations; the same API code could then get many things:

msdemlei commented 3 years ago

On Fri, Mar 19, 2021 at 07:23:56AM -0700, Laurent MICHEL wrote:

The scope of the annotations must go beyond simple column annotations, which must nevertheless remain supported. I detailed it here, section 2.

I'm starting to be unsure whether we are actually disagreeing on much here -- and I've not found anything in that section 2 that I'd need to contradict.

So, perhaps a clarification: is my time series use case "single column annotation", and if so, why? What actual usage would go beyond what's possible there?

My point is that, since we have a self-consistent model made of a hierarchy of elements identified with dmtype, dmrole and other things, the annotation must be something matching that structure.

Well, the thing with dmrole and dmtype to me is the annotation, but I think what you're saying here is that the annotation should be directly derived from the model. That I wholeheartedly agree with, and that's why I'm so concerned about the current MCT proposal -- if it were some abstract musing, I'd be totally ok with it. But when the model defines the annotation structure, whatever we do in the model has concrete operational consequences. Which, mind you, is fine -- we'll have to deal with them somewhere and the DM is the right place for that.

Once you have it, you can use accessors based on those identifiers. That is what I call a public API that does not refer to any native data element but only to model elements.

...and I still cannot figure out why you want this -- after all, the point of the whole exercise IMNSHO is to add information to VOTables (and later perhaps other container formats) that was not previously in there.

What would the use case for your free-floating annotation be, if this is what you are proposing?

In the examples I showed in these use cases, I transform the annotation into Python dictionaries that are easily serializable in JSON (a good point for data exchange).

In pseudo code, this would look like this:

annotation_reader = AnnotationReader(my_votable)
if not annotation_reader.support("mango"):
  sys.exit(1)

mango_instance = annotation_reader.get_first_row()
print(mango_instance.get_measures())
['pos', 'magField']
print("Magnetic field is: " + mango_instance.get_measure("magField"))
Magnetic field is: 1.23e-6 T +/- 2.e-7

This wouldn't require Python classes implementing the model (fundamental point)

I claim that the annotation must be designed in a way that allows this in addition to basic usages.

-- but why would you want to do this JSON serialisation? Wouldn't it be much better overall to just put that value into a VOTable and transmit that rather than fiddle around with custom JSON dictionaries? In particular when there are quite tangible benefits if you make it explicit in the model what exactly it is that you're annotating?

By the way, if by "wouldn't require Python classes" you mean "You don't have to map model classes into python classes" then yes, I agree, that is a very desirable part of anything we come up with. Let's avoid code generators and similar horrors as much as we can. Nobody likes those.

Let's consider that all VizieR tables come with such annotations; the same API code could then get many things:

  • Basic quantities (no significant gain I admit)
  • Complex quantities (e.g. complex errors)
  • Columns grouping
  • Status values
  • Associated data or services

I agree to all these use cases (except, as I said, even for basic quantities the gain is enormous because we can finally express frames, photometric systems, and the like in non-hackish ways).

But: which of these use cases would you miss with the non-entangled, explicit-reference models?

lmichel commented 3 years ago

discussion forked on #18

msdemlei commented 3 years ago

On Wed, Mar 10, 2021 at 12:21:24PM -0800, Mark Cresitello-Dittmar wrote:

  • the models do not have UCDs, so you define a Class for the concept (Position, Time)
    • The per-physics class tells you what to expect: the SphericalPosition should have a 'longitude' and 'latitude' and 'error' among other things. (illustrative, not exact)

Yeah, that's structural, and sure, you'll need classes for "scalar" vs. "polar coordinate" vs. "cartesian coordinate" (where for now I'd hope that's only necessary in coordinates for the time being).

But structurally, the scalar quantities all work the same way (there's a single float). There's nothing to be gained by introducing extra classes for "redshift scalar" versus "photometric scalar" for all I can see; all these scalars essentially work the same way.

Of course, a photometric scalar has different additional metadata (information on the photometric system) than a redshift scalar (which might also be part of some spatial annotation). But again I cannot see how entangling this additional metadata into a particular class that essentially only does this entanglement will help: a client looking for this will plausibly look directly for photometric system annotation rather than look for instances of "photometric scalar" and then hope it has photometric system annotation.

  • the VOTable serialization has UCDs:

    • so if you are evaluating the VOTable content and find a PARAM with ucd="pos" or "time" you can infer (by interpreting the semantic word), that the PARAM represents a Position or Time concept, but no specific content expectation can be formed.
  • The VOTable serialization, with Utype and ucd, was deemed insufficient for mapping content to models, so an Annotation scheme was requested and developed.

    • the Annotation relates the model class meas:Position to a VOTable PARAM.
      • NOTE: I know there is not a 1-1 match from Position to a VOTable PARAM, but this serves for illustration.
    • this identifies the PARAM as a Position regardless of whether or not the PARAM includes a ucd="pos"
    • my understanding is that the Annotation should not depend on the underlying VOTable ucd or Utype
      • if a VOTable has no ucd or Utype assignments, you can fully identify the content from the Annotation.
        • I can distinguish Flux from Time without use of ucd or Utype.

Positions, being vectors usually, aren't a terribly good example to investigate for the question of whether we ought to have per-physics scalar classes. Let's keep it at scalars, so flux and time are good examples.

And there I'm convinced that just providing the annotations of, say, a time system or a photometric system as appropriate will be what clients want by the above reasoning.

What kind of usage do you have in mind where a client will stumble into a column and will want to tell whether it's a time or a flux and where it wouldn't be equally well served with basing that judgement on the UCD?

Conversely, saying "Ah, if people have been sloppy and haven't defined a UCD, clients can fall back on the DM annotation" is I think not very convincing: DM annotation is a lot more complex than just slapping on a UCD. I don't see any chance that data providers that don't manage to assign UCDs will get DM annotation remotely right.

Again, I'd perhaps be less concerned about this if we said: "Let's scrap the UCDs, we'll do it all with models now" (though from my Registry perspective I'd wail and cry if someone proposed that). But as long as we don't do that, let's not try to address UCDs' use cases in models unless we're very sure UCDs aren't enough for what a client might want to do -- and I've not seen an indication for that yet.

Let me quote the Zen of python here:

$ python -c "import this" | grep obvious
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.

Additionally: the Position is complex, and the Annotation allows you to identify which 'roles' are filled by which VOTable elements.

Of course, totally agreed. But that doesn't require per-physics scalar classes.

If you want to use UCDs in the Annotation, that is a different discussion, but you are still mapping the per-physics classes to particular UCDs.

No, we definitely should avoid having UCDs in DM annotations (excepting perhaps a few oddities, perhaps in provenance or so). UCDs are in use already, and they are part of the container formats.

If you're thinking we don't need to model Position, we just need to model Measure and use UCDs for the physics; (which I think is

No, of course I'm not thinking that. My point is: models for the structure, UCDs for the physics.

  • I don't think this gets you out of the model dependencies, since you'd still want to model that Measure has a 'coord' attribute which needs a Type which would be a "coords:Coordinate".

No -- whenever halfway feasible, the value of a Measure should be a reference to the annotated thing (FIELD, PARAM, TABLE, RESOURCE), where perhaps we may need to allow references to array elements. That is the crux of the matter that decides whether our scheme will blow up the first time we need an incompatible change to one of our DMs.

lmichel commented 3 years ago

Of course, a photometric scalar has different additional metadata

If you have multiple filters in your dataset, it is easier to have each magnitude instance referencing its proper filter than having a set of filters and letting the client do the filter/measure matching.
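A minimal sketch of the difference (the dict structures and the `filter_of` helper are hypothetical, not MANGO syntax): with a per-measure reference the client resolves the filter directly, whereas with a bare set of filters it would have to reconstruct the association from surrounding metadata, which is exactly the fragile step the reference avoids.

```python
# Hypothetical PhotCal-like records, keyed by an identifier.
filters = {
    "G":  {"band": "G",  "zero_point": 25.69},
    "BP": {"band": "BP", "zero_point": 25.35},
}

# Each magnitude instance references its proper filter...
measures = [
    {"value": 14.2, "filter_ref": "G"},
    {"value": 14.9, "filter_ref": "BP"},
]

def filter_of(measure):
    # ...so resolution is a direct lookup, with no client-side matching.
    return filters[measure["filter_ref"]]

print(filter_of(measures[0])["band"])  # G
```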

My point is: models for the structure, UCDs for the physics.

This is right, but nothing prevents a model from embedding attributes carrying the physics of the modeled quantities. I would say that is even necessary if you want model instances to be self-consistent. I admit however that the way MANGO is doing this has to be improved, but it has to do it.

msdemlei commented 3 years ago

On Mon, Mar 22, 2021 at 09:25:31AM -0700, Laurent MICHEL wrote:

Of course, a photometric scalar has different additional metadata

If you have multiple filters in your dataset, it is easier to have each magnitude instance referencing its proper filter than having a set of filters and letting the client do the filter/measure matching.

I think I agree here, but perhaps you could point at examples for the two approaches you envision here?

My point is: models for the structure, UCDs for the physics.

This is right, but nothing prevents a model to embed attributes carrying the physics of the modeled quantities. I would say that is

Ah-hm... sorry, but "nothing prevents" is a weak reason to do something in a standard. I'll keep pleading that we do the minimum required to fulfill our use cases, as long VO experience shows that whatever extra bells and whistles we put into our standards later turn into problems (see caproles).

Now, perhaps there are strong use cases that require per-physics scalar classes, but I cannot see one yet, which may be because...

even necessary if you want model instances to be self-consistent. I admit however that the way MANGO is doing this has to be improved, but it has to do it.

...I still don't understand what you mean by self-consistent. Could you perhaps try again to explain what you mean by that (is it "we can serialise instances outside of container formats"?) and what use cases you'd like to enable by this self-consistency?

mcdittmar commented 3 years ago

On Wed, Mar 10, 2021 at 12:21:24PM -0800, Mark Cresitello-Dittmar wrote:

  • the models do not have UCDs, so you define a Class for the concept (Position, Time)
  • The per-physics class tells you what to expect: the SphericalPosition should have a 'longitude' and 'latitude' and 'error' among other things. (illustrative, not exact)

Yeah, that's structural, and sure, you'll need classes for "scalar" vs. "polar coordinate" vs. "cartesian coordinate" (where for now I'd hope that's only necessary in coordinates for the time being). But structurally, the scalar quantities all work the same way (there's a single float). There's nothing to be gained by introducing extra classes for "redshift scalar" versus "photometric scalar" for all I can see; all these scalars essentially work the same way. Of course, a photometric scalar has different additional metadata (information on the photometric system) than a redshift scalar (which might also be part of some spatial annotation). But again I cannot see how entangling this additional metadata into a particular class that essentially only does this entanglement will help: a client looking for this will plausibly look directly for photometric system annotation rather than look for instances of "photometric scalar" and then hope it has photometric system annotation.

Catching up a bit..

my proposal over at https://github.com/msdemlei/astropy

Nice to see this.

msdemlei commented 3 years ago

On Tue, Mar 23, 2021 at 12:24:23PM -0700, Mark Cresitello-Dittmar wrote:

essentially only does this entanglement will help: A client looking for this will plausibly look directly for photometric system annotation rather than look for instances of "photometric scalar" and then hope it has photometric system annotation.

  • I'll note that we had "scalar", "polar coordinate", "cartesian coordinate" in the coords model, and were asked to remove them in favor of a single multi-dimensional "Point", and scalar "PhysicalCoordinate". I do think that one outcome of this effort is an interest in restoring the space-centric types (cartesian, spherical).

As an aside: I believe given our track record we should probably just do 2d and 3d polar coordinates in the first round (and rejoice if we do that properly), but that may just be my natural pessimism.

  • When you say: "Of course, a photometric scalar has different additional metadata (information on the photometric system) than a redshift scalar"
    • to me, this calls for a model element which tells the client that "if you have come across a photometric scalar, look 'here' for the additional photometric system metadata". We need to define the association in the model

But what does this extra intermediary buy vs. looking for the photometric system metadata directly?

  • A client looking for this will plausibly look directly for photometric system annotation rather than look for instances of "photometric scalar" and than hope it has photometric system annotation
    • I think a client processing a cube will note it has magnitudes, and then ask which bands are they in?

Well, there are certainly many ways a client may be prompted to look for photometry metadata, units being one, UCDs another, but user action ("plot this as a photometric time series") is IMHO the most likely one. But whatever the reason, I don't see how "is there an annotation as a photometric scalar?" will make a client's life simpler than asking "is there photometric system annotation?".

Part of this is of course the outlook: In my metamodel you can't say a column "mag" is a phot:PhotometricPoint because it can be part of a large number of annotations (among them potentially also and importantly phot2:PhotometricPoint). That clients look for annotations they understand (or prefer) is normal in this system and the reason for its robustness over evolution.

my proposal over at https://github.com/msdemlei/astropy

Nice to see this.

  • the interface looks very similar to the rama interface which I'm using in my implementations... looks like your 'get_annotations()' is similar to Rama's 'find_instances()'.

I'd not be surprised -- I think it's a rather natural API to this kind of thing.

  • a quick question about the target position example.
    • for ann in target.position: # this iterates over the fields/params containing the target position

Right.

  >   pos_anns = ann.get_annotations("stc2:Coords")

  • can you explain the path from looping over the ITEMs under the position ATTRIBUTE, to an stc2:Coords instance?
    • I don't see how iteration resolves to an stc2:Coords

In case of doubt, you can use iter_annotations() on a column to see how it works out. The basic scheme, however, is that whenever an item (param, field, table, resource) is referenced from an annotation ("instance"), the software will add this annotation to the list of annotations of that item. Hence, in this situation, where ra is the longitude of the space attribute (type stc2:SphericalCoordinate) of an stc2:Coords instance, ra will have annotations for both stc2:SphericalCoordinate and stc2:Coords.
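That propagation rule can be sketched as follows (the dict layout and the `attach_annotations` function are hypothetical illustrations, not the actual astropy-prototype internals): walking an annotation instance, every item reference records the whole chain of enclosing instances on that item.

```python
def attach_annotations(instance, items, enclosing=()):
    """Register this instance (and its enclosing ones) on referenced items."""
    chain = enclosing + (instance,)
    for value in instance.get("attrs", {}).values():
        if isinstance(value, dict):      # a nested INSTANCE
            attach_annotations(value, items, chain)
        elif isinstance(value, str):     # an ITEM reference by ID
            items.setdefault(value, []).extend(chain)

coords = {
    "dmtype": "stc2:Coords",
    "attrs": {"space": {
        "dmtype": "stc2:SphericalCoordinate",
        "attrs": {"longitude": "ra", "latitude": "dec"},
    }},
}

items = {}
attach_annotations(coords, items)
# ra carries annotations for both the enclosing Coords and the coordinate:
print([a["dmtype"] for a in items["ra"]])
# ['stc2:Coords', 'stc2:SphericalCoordinate']
```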

mcdittmar commented 3 years ago

pos_anns = ann.get_annotations("stc2:Coords") can you explain the path from looping over the ITEMs under the position ATTRIBUTE, to an stc2:Coords instance? I don't see how iteration resolves to a stc2:Coord

In case of doubt, you can use iter_annotations() on a column to see how it works out. The basic scheme, however, is that whenever an item (param, field, table, resource) is referenced from an annotation ("instance"), the software will add this annotation to the list of annotations of that item. Hence, in this situation, where ra is the longitude of the space attribute (type stc2:SphericalCoordinate) of an stc2:Coords instance, ra will have annotations for both stc2:SphericalCoordinate and stc2:Coords.

  <ATTRIBUTE dmrole="position">
    <COLLECTION>
      <ITEM ref="ra"/>
      <ITEM ref="dec"/>
      <ITEM ref="ssa_location"/>
    </COLLECTION>
  </ATTRIBUTE>

OK.. so, if we're iterating through the ITEMs, it should find:

So, you would find the Target position if you put ANY leaf from the stc2:Coords content into the Target.position collection.

Q: how does this play out if the "stc2:Coords" is made entirely of LITERALs? There will be no 'ref' content to match. Q: I've mentioned this before, but ... since the annotation reflects the model structure. Using the 2 annotations of "stc2:SphericalCoordinate", the underlying model would be:

msdemlei commented 3 years ago

On Wed, Mar 24, 2021 at 07:26:57AM -0700, Mark Cresitello-Dittmar wrote:

      <ATTRIBUTE dmrole="position">
        <COLLECTION>
          <ITEM ref="ra"/>
          <ITEM ref="dec"/>
          <ITEM ref="ssa_location"/>
        </COLLECTION>
      </ATTRIBUTE>

OK.. so, if we're iterating through the ITEMs, it should find:

  • "ra" - included in "ds:AstroTarget" which is in "ds:Dataset", "stc2:SphericalCoordinate" which is in "stc2:Coords"
    • returns pos_anns[0] = the "stc2:Coords" instance
  • "dec"- included in "ds:AstroTarget" which is in "ds:Dataset", "stc2:SphericalCoodrinate" which is in "stc2:coords"
    • returns pos_anns[1] = the "stc2:Coords" instance (the same one)
  • "ssa_location" - included in "ds:AstroTarget" which is in "ds:Dataset", "stc2:SphericalCoordinate" which is in a different "stc2:coords"
    • returns pos_anns[2] = the other "stc2:Coords" instance

So, you would find the Target position if you put ANY leaf from the stc2:Coords content into the Target.position collection.

Right. The client gets to choose whatever it understands, or, in the advanced cases, whatever it prefers (think: simple position vs. a simple MOC vs. a spatial distribution of MOCs).

Q: how does this play out if the "stc2:Coords" is made entirely of LITERALs? There will be no 'ref' content to match.

First, I'd really like to discourage the use of literals in favour of properly making PARAM-s whenever that's not too inconvenient; this is also because non-DM-enabled clients will still find the information, and users can still play with it based on human understanding of the stuff.

Using PARAMs, such quantities will also have types, units, xtypes, clear serialisation rules and all the other VOTable luxuries (you may remember me having argued against LITERAL-s in the VO-DML discussions). But then I grant you that just writing <ATTRIBUTE dmrole="orientation" value="ICRS"/> is too convenient to miss.

But then, if you really want immediates in COLLECTION-s, my current proposal lets you have INSTANCE-s in them (though that's untested and might yet break on some little mistake). So, you'd write:

<ATTRIBUTE dmrole="position">
  <COLLECTION>
    <ITEM ref="ra"/>
    <ITEM ref="dec"/>
    <ITEM ref="ssa_location"/>
    <INSTANCE dmtype="moc:WithLikelihood">
      <ATTRIBUTE name="likelihood" dmtype="real" value="0.95"/>
      <ATTRIBUTE name="value" dmtype="???" 
        value="3/23-27 5/290,332,560"/>
    </INSTANCE>
  </COLLECTION>
</ATTRIBUTE>

(but as the "???" indicates: in all but the most trivial cases I think that's a bad idea as explained above).

Q: I've mentioned this before, but ... since the annotation reflects the model structure. Using the 2 annotations of "stc2:SphericalCoordinate", the underlying model would be:

  • SphericalCoordinate
    • frame
    • longitude
    • latitude
    • value - ssa_location (which includes longitude, latitude and some frame info) is assigned to this attribute, which really should not be an attribute of SphericalCoordinate.

No, value/ssa_location doesn't really include the frame info any more than longitude/ra and latitude/dec does. Yes, they're referencing a COOSYS, but that we want to get rid of in the long run.

But yes, such a model would be possible, and I think our models should acknowledge the existence of the DALI types and make them annotatable in some way. Whether or not the ad-hoc thing I quickly invented here is a good way, I'm happy to discuss (and I suspect it's not).

Also note that against the original annotation I've changed the annotation of ssa_location to a hypothetical stc3:Coords model to better make the intended point, which is: if better/newer ways to describe, in this case, the target position come up over time, they can be accommodated without having to touch the ds:Dataset model or breaking legacy clients.

Apologies for having come up with a bad example initially.

lmichel commented 3 years ago

Markus.

I think I agree here, but perhaps you could point at examples for the two approaches you envision here?

GAIA TS added in raw_data

Ah-hm... sorry, but "nothing prevents" is a weak reason

The strong reason is that my model needs an attribute carrying the meaning of the physical measure, and there is no modeling rule preventing us from adding it to the model. Such an attribute is valid.

...I still don't understand what you mean by self-consistent. Could you perhaps try again to explain what you mean by that (is it "we can serialise instances outside of container formats"?) and what use cases you'd like to enable by this self-consistency?

self-consistent: The model must contain all attributes and relations required to describe the domain data. Instances of that model, whatever the serialization is, must have all of these attributes and relations properly set. The use case is interoperability in general and, to be more specific, the capacity to exchange model instances, e.g. by SAMP, DataLink or any other Web endpoint. I'm also aware that many people are looking at media other than VOTable. I'm thinking of JSON/YAML serializations, which are mid-term use cases.

msdemlei commented 3 years ago

On Thu, Mar 25, 2021 at 09:52:47AM -0700, Laurent MICHEL wrote:

I think I agree here, but perhaps you could point at examples for the two approaches you envision here?

GAIA TS added in raw_data

Ah... I think at some point we have to say "well, structure your tables differently". Associating metadata to table cells (as here, where the values G, BP, and RP in the rows would need to come with photometric system annotation) is a recipe for disaster in so many ways.

We've just almost gotten rid of the terrible Frame-and-whatnot strings in STC-S geometries by DALI geometries. Let's not bring inhomogeneous-metadata columns back again.

If I'm adamant on one thing, it's that metadata needs to be associated to columns and params, but not to individual table cells. Violate this principle, and the tables you get are basically unhandlable. A very simple example: with different metadata per row, you normally cannot meaningfully compare two values in the same column any more. Which is the most basic thing you want in a table ("sorting").

And hence the Gaia folks should have written this table with three photometry columns, one each for G, BP, and RP. I'm sure they'll do this when we explain them the reasoning.

So, on this I'll solemnly declare "not being able to annotate tables that aren't actually tables is a feature rather than a bug".

serialise instances outside of container formats"?) and what use cases you'd like to enable by this self-consistency?

self-consistent: The model must contain all attributes and relations required to describe the domain data. Instances of that model, whatever the serialization is, must have all of these attributes and relations properly set. The use case is interoperability in general and, to be more specific, the capacity to exchange model instances, e.g. by SAMP or DataLink. I'm aware

I'll note in passing that Datalink is of course VOTable, and that VOTables are regularly exchanged through SAMP.

that many people are looking at media other than VOTable. I'm thinking of JSON/YAML serializations, which are mid-term use cases.

I'm not saying that you can't re-invent VOTable in JSON or YAML or anywhere else; that actually wouldn't need too many conventions for the more capable of the container formats (among them of course where to put UCDs, units, xtypes and how to represent PARAMs and COLUMNs).

But that doesn't mean we need to encumber our models with things that VOTable has already solved (it won't stop with UCDs; as soon as the first clients consume your JSON, you'll see the discussion on date formats flaming up again, and you'll have lots of fun at ADASS sitting in JSON-for-Models BoFs).

No, let's concentrate the limited capacities we have on things that VOTable cannot do. Teaching other container formats things VOTable can do that they can't is a problem that can be solved entirely independently when we actually have it.

lmichel commented 3 years ago

And hence the Gaia folks should have written this table with three photometry columns, one each for G, BP, and RP. I'm sure they'll do this when we explain them the reasoning.

I'm not the curator of the TABLE, which was provided 2 years ago by ESAC. AFAIR the rationale for this structure was that the time stamps are not the same for each band, and thus this avoids a Swiss-cheese table.

lmichel commented 3 years ago

No, let's concentrate the limited capacities we have on things that VOTable cannot do.

But MANGO and CUBE mapping do resolve what VOTable cannot do.

msdemlei commented 3 years ago

On Wed, Apr 07, 2021 at 12:04:26AM -0700, Laurent MICHEL wrote:

And hence the Gaia folks should have written this table with three photometry columns, one each for G, BP, and RP. I'm sure they'll do this when we explain them the reasoning.

I'm not the curator of the TABLE, which was provided 2 years ago by ESAC. AFAIR the rationale for this structure was that the time stamps are not the same for each band, and thus this avoids a Swiss-cheese table.

Yes, I trust they had good reasons for doing what they did, but the result still is inhomogeneous metadata on the magnitude, flux, and error columns, and hence this denormalisation results in a severely irregular table. The most obvious irregularity: a sort by magnitude has no physical interpretation.

If we try to bend our design so it works with broken data structures like this, we will make it work a lot worse on regular data -- and perhaps entirely break it. And I trust DPAC won't mind having to go for per-band time series (or the "swiss cheese") if they adopt our annotation; that will help their users, too, even the ones that ignore our annotation.
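To make "per-band time series" concrete, here is a small sketch (made-up numbers, plain Python) of regrouping an ESAC-style long table into one homogeneous series per band, which avoids both the mixed-metadata magnitude column and the "Swiss cheese" wide table:

```python
# Long table: one row per (time, band) pair, as in the Gaia example;
# the "mag" column mixes bands, so sorting it is physically meaningless.
long_table = [
    {"time": 100.0, "band": "G",  "mag": 14.2},
    {"time": 100.1, "band": "BP", "mag": 14.9},
    {"time": 101.0, "band": "G",  "mag": 14.3},
]

# Regroup into one (time, mag) series per band; each series is now
# homogeneous, so per-band sorting and plotting are meaningful again,
# and bands with different time stamps need no padding.
per_band = {}
for row in long_table:
    per_band.setdefault(row["band"], []).append((row["time"], row["mag"]))

print(sorted(per_band))   # ['BP', 'G']
print(per_band["G"])      # [(100.0, 14.2), (101.0, 14.3)]
```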

msdemlei commented 3 years ago

On Wed, Apr 07, 2021 at 12:11:13AM -0700, Laurent MICHEL wrote:

No, let's concentrate the limited capacities we have on things that VOTable cannot do.

But MANGO and CUBE mapping do resolve what VOTAble cannot do.

Sure, but in managing UCDs, units and possibly serialisation (as in xtypes), it also repeats things that VOTable can already do. And this duplication of efforts is something we should only do if we are very sure it is justified.

Until we are (and I still am not), it would seem wiser to me to postpone this "VOTable model" until we have the very basic things (STC, photometry) covered.

lmichel commented 3 years ago

This time we are in agreement. My mapping should be able to refer to FIELD metadata instead of duplicating it. This has already been discussed with @mcdittmar; see here and here.

Mango makes extensive use of MCT and PhotDM.

mcdittmar commented 3 years ago

Until we are (and I still am not), it would seem wiser to me to postpone this "VOTable model" until we have the very basic things (STC, photometry) covered.

Which is proving impossible to do unless we conduct this sort of workshop demonstrating that they are usable within the context of "real" usage in Source-s, Cube-s, TimeSeries-s.

mcdittmar commented 3 years ago

Yes, I trust they had good reasons for doing what they did, but the result still is inhomogeneous metadata on the magnitude, flux, and error columns, and hence this denormalisation results in a severely irregular table. The most obvious irregularity: a sort by magnitude has no physical interpretation. If we try to bend our design so it works with broken data structures like this, we will make it work a lot worse on regular data -- and perhaps entirely break it. And I trust DPAC won't mind having to go for per-band time series (or the "swiss cheese") if they adopt our annotation; that will help their users, too, even the ones that ignore our annotation.

Hmm.. I'll maybe take a look at the GAIA multi-band example next.

My initial reaction here is that if "reorganize your data" was an option, there wouldn't be a need for the work we are doing.

It may not make sense to 'sort' on the "magnitude" columns, but it does make sense to 'screen Sources with an associated G-band filter to magnitude >= X'. That is the benefit of the Models: to turn the 'broken data structures' into meaningful entities.

lmichel commented 3 years ago

[@msdemlei] If we try to bend our design so it works with broken data structures ...

We are not trying to bend our design.

Nothing allows us to assert that such broken data structures will never be released.

Zarquan commented 3 years ago

The Gaia multi-band example dates back to when we started looking at how to represent time series in the IVOA. We asked data providers to send us their use cases, including examples of the kind of data that they wanted us to handle.

If I remember correctly, the structure of the multi-band time series reflects the way that the data is collected on the spacecraft, how it is processed in their data processing pipelines, and how the project scientists are used to working with it.

We asked them for examples, and they specifically requested that the IVOA time series should be flexible enough to be able to represent this use case.

I don't think that telling them they are doing it wrong is an option.

msdemlei commented 3 years ago

On Wed, Apr 07, 2021 at 07:44:50AM -0700, Laurent MICHEL wrote:

@.*** If we try to bend our design so it works with broken data structures ...

We are not trying to bend our design.

  • ModelInstanceInVot has been designed on the basis of data sets we found around (TDIG work).
  • gaia_multiband is a show case for using FILTERs.

I'd conjecture you wouldn't have introduced FILTER without this particular example -- and that counts as "bending" to me.

More abstractly, VOTable right now has no per-row metadata. There's one FIELD per table column. That's a very sane design, and when we tried to break it with the STC-S strings we regretted that (and are still mopping up the resulting mess).

Nothing allows us to assert that such broken data structures will never be released.

Clearly people have released data like that, so such an assertion would be silly, and I'm of course not making it.

But since it breaks a very sane metamodel (tables with per-column metadata), it is something we should try hard to discourage, and it is totally possible to say "if you want interoperability, then don't do it like that."

  • This allows more compact VOTables, which is something that many people want.

Doing it properly increases the size of the VOTable by, what, 30%? After gzip by perhaps 10%? In my book that's nowhere near a good deal for complicating the metamodel by a large factor.

  • They can be consumed by specific clients, or pre-processed by associated data publishers (e.g. as you did, I guess) to make them compliant with their infrastructure.

But we're not writing our specs for "specific clients" -- they don't need the annotation, as they know what to expect anyway.

We're writing our spec so clients can do interesting things without a prior contract. In that scenario, embedding a major part of relational algebra in our metamodel (you already have Aggregation and Selection, and I'm sure you'll end up having all kinds of joins as well if you follow this path) is a very high price to pay, even more so since we already have ADQL to write relational expressions.

If we really want to enable "canned" relational operations on our tables (for which I personally don't see a credible use case yet), we could think about embedding ADQL into VOTable, and given the wide availability of SQLite, I think it could even be implemented with a reasonable effort.
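The SQLite remark can be made concrete with a small sketch (my own illustration of the mechanism, not a proposal; the table, column names, and query are invented, and real ADQL would still need translation to SQLite's dialect): a canned relational operation expressed as plain SQL over an in-memory table, instead of a bespoke aggregation/selection vocabulary in the annotation model.

```python
# Sketch: a 'canned' relational operation run via stdlib sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phot (source TEXT, band TEXT, mag REAL)")
conn.executemany("INSERT INTO phot VALUES (?, ?, ?)",
                 [("s1", "G", 14.2), ("s1", "BP", 14.9), ("s2", "G", 16.1)])

# The selection is ordinary SQL, not a new XML vocabulary.
result = conn.execute(
    "SELECT source, mag FROM phot WHERE band = 'G' AND mag >= 15.0").fetchall()
print(result)  # [('s2', 16.1)]
```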

But whatever we do, let's not re-invent a SQL-in-XML.

Proposing an annotation scheme that is able to map them is meaningful in this context

Hm -- I'd say a sensible restriction as to what structures can be annotated and what is just too irregular makes for a good standard.

"Do one thing and do it well" is what made the original Unix great. I think that's a good precedent to follow.