pds-data-dictionaries / PDS4-LDD-Issue-Repo

Issue repository for tracking all PDS4 Discipline Dictionary-related issues, new feature requests, and releases.
Apache License 2.0
[ldd-sb] Consider utilizing other Discipline LDD classes/attribute and/or extracting to separate LDDs to enable reusability #278

Closed jordanpadams closed 7 months ago

jordanpadams commented 7 months ago

Describe the issue identified (if applicable): I poked through the LDD briefly and noticed some things that could be, or already are, described in other LDDs. Here are a few examples:

From IMG:

It also seems a few classes could be reusable outside of the Small Bodies world, e.g.:

Describe the solution you'd like: Consolidate and reuse where possible.

Describe alternatives you've considered: Not reusing, which makes it harder for users to build generic tools to search and access data across the PDS.

PDS4 IM Version: 1N00

Need-by Date: N/A

Additional context: This is just a thought to open a discussion.

acraugh commented 7 months ago

Yeah, maybe, they could have, at the time, if I could have gotten the appropriate changes and additions made in a timely manner in a dictionary that was DOCUMENTED and accommodating to ground-based/small bodies data. Sadly, none of those conditions exist and we have data to archive and to migrate, and hard deadlines.

For example, the exposure_duration definition in the IMG dictionary doesn't provide the information needed to calibrate the usual ground-based case. For that we need the amount of time each pixel was exposed, regardless of how long the detector was exposed. I'm surprised that definition even accommodates the usual imaging data you get from things like pushbroom cameras. It's certainly not what SB users think of as "exposure time" for the RALPH instruments, for example, where the detector is only one row and is read out repeatedly to build the image over time as the spacecraft rotates.

The Image_Filter and Spatial_Filter concepts aren't relevant to anything SB people do, as far as I can tell from the schema descriptions. The Optical_Filter class is woefully simple compared to the complications we ultimately need to document in the SB case - things like standard filters, narrow and broadband filters, linearly variable filters and their slopes, polarizing filters (linear and circular) and their orientations, and so on. And then of course there are the prisms and gratings, and the filter curves in the documentation set that should be linked from the labels.

As with the filter class, flat fielding information is just one small part of the calibration information we want to get into the labels. And while the flat fielding class might have everything we'd need, the dark correction doesn't, and I don't see steps for things like PSF convolution, sky subtraction, extinction correction, star subtraction, cosmic ray subtraction, and various other things done to tease out the little fuzzy moving objects from the general hubbub of the universe. The SB namespace is going to contain an array of these calibration steps commonly used in small bodies analysis, most of which no other node seems to need, all in one place. The goal here is to tell precisely what processing has been performed and not performed, so the user can determine when and what additional processing is needed for their analysis. We will also include direct links to the calibration files for each step, to allow for the possibility of programmatic processing, either on our end or the user's.
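To make the "performed and not performed" idea concrete, here is a minimal Python sketch of how a tool might consume such per-step records. Everything in it is an assumption for illustration: the `CalibrationStep` record, the step names, and the `urn:example:...` identifiers are hypothetical and are not taken from the SB dictionary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CalibrationStep:
    """One processing step as it might be recorded in a label (hypothetical)."""
    name: str
    applied: bool
    calibration_file: Optional[str] = None  # link to the file used, if any


def steps_still_needed(recorded: List[CalibrationStep],
                       required: List[str]) -> List[str]:
    """Return the required steps that the label does not record as applied."""
    applied = {step.name for step in recorded if step.applied}
    return [name for name in required if name not in applied]


# Hypothetical record for one product; identifiers are placeholders.
RECORDED = [
    CalibrationStep("dark_correction", True, "urn:example:calibration:dark_001"),
    CalibrationStep("flat_fielding", True, "urn:example:calibration:flat_001"),
    CalibrationStep("sky_subtraction", False),
]

# What a particular analysis needs before it can start.
REQUIRED = ["flat_fielding", "sky_subtraction", "extinction_correction"]

pending = steps_still_needed(RECORDED, REQUIRED)
print(pending)
```

A step that is absent from the record is treated the same as one explicitly marked as not applied, which is the conservative choice for a user deciding what additional processing to run.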

The Quality_Map is designed as a test case that could be moved elsewhere once we've got the quirks worked out in some SBN data we're migrating, and assuming it actually is applicable to other cases. It's a promising start, but so far only one test case has been coded. Anyone else who finds the problem interesting is welcome to contribute or clone, but I wouldn't assume the current form is final. We can't be the only node that has ever encountered quality data, but apparently we're the only one that thinks it worth documenting in a programmatically actionable (we hope) way. I don't have the time or patience to try to get the rest of PDS interested in solving a problem they don't seem to think they have.

I also need to add geometry to either the SB or EBT dictionary, because there is no dictionary that now supports simple, 2-dimensional pointing information for image and non-image observations. It would take a fair amount of work to do that in the Geom namespace, which is totally geared toward mission data, 3D vectors, images, and SPICE output. The overhead would be disproportionate, and the Geom stewards didn't want to deal with it (can't blame them - I don't want that pointless mission overhead in the ground-based labels, either). I'm annoyed it's not already in the EBT - almost as much as I am annoyed by the lack of documentation for that namespace. I have dusted off my copy of the Riot Act for our next internal meetings.

But the real showstopper and absolute sticking point, as far as I am concerned, is the lack of documentation for these other namespaces. Having done a detailed analysis of more than a few PDS namespaces, I have no confidence in an undocumented namespace - no confidence that I can understand what is currently in there; no confidence that anything I add would be curated with ground-based/SB use in mind; and negative confidence that anything added to such an environment to support SB science is not going to either disappear or be reinterpreted into something that is no longer usable. Neither do I want to be directing SB users to undocumented namespaces. They expect and deserve better than that, as do all PDS users. The only namespaces where I can make sure that is not the case are the ones that fall under my tyranny.

If and when there is a well-documented namespace that contains something I can, or nearly can, use, then I'll propose a modification and recommend it to SB data preparers. Until then, it's whistling in the wind.

matthewtiscareno commented 7 months ago

@acraugh: I don't think there is anything that is fundamentally characteristic of Small Bodies observers that is not shared by (for example) Outer Planetary Systems observers.

That quibble aside, I fundamentally agree with you that many attributes need to be defined in ways that are parochial to the instrumentation being used, and that we should not expend effort in trying to make these searchable across data sets.

For example, any two sets of instrumentation will generally have differences in their filters and filter names, so nobody will ever want to do a cross-mission search on filter names. On the other hand, people definitely do want to do cross-mission search on the wavelengths that various filters correspond to, which is why we have spent time carefully curating the content of wavelength-related attributes in order to facilitate such searches on OPUS.
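As a rough illustration of why wavelength is the more portable search key, the following Python sketch runs a range query over observations whose filter names have nothing in common. The record layout, product identifiers, filter names, and wavelength values are all made up for the example; this is not how OPUS is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    """Hypothetical search record: filter names differ, wavelength units do not."""
    product_id: str
    filter_name: str
    min_wavelength_um: float
    max_wavelength_um: float


def overlaps(obs: Observation, lo_um: float, hi_um: float) -> bool:
    """True if the observation's passband intersects the query range."""
    return obs.min_wavelength_um <= hi_um and obs.max_wavelength_um >= lo_um


def search_by_wavelength(observations: List[Observation],
                         lo_um: float, hi_um: float) -> List[str]:
    """Return product IDs whose passband overlaps [lo_um, hi_um]."""
    return [o.product_id for o in observations if overlaps(o, lo_um, hi_um)]


# Placeholder data from three imaginary missions.
OBSERVATIONS = [
    Observation("mission_a:img_001", "FILT_A", 0.40, 0.70),
    Observation("mission_b:img_042", "V", 0.50, 0.60),
    Observation("mission_c:img_007", "FILT_C", 0.90, 1.00),
]

hits = search_by_wavelength(OBSERVATIONS, 0.55, 0.65)
print(hits)
```

The query never looks at `filter_name`, so it works across missions as long as the wavelength fields share a definition and a unit.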

acraugh commented 7 months ago

@matthewtiscareno: Actually, for ground-based observations users do want to search on filter names from time to time. Thus the remark about "standard filters" (think "Johnson", "Kron-Cousins", etc.). And comet people might want to search on the molecular species singled out by narrowband filters, even if the particulars of the ranges vary.

Part of the problem is that the "fundamentals" are easy. Everybody calibrates their data. You remove the instrument effects and the sky effects. That's fundamental. It's the details that make the difference - dark current removal depends on instrument design; flat-fielding depends on the observing conditions; sky subtraction involves various methods for determining the sky contribution; etc. Even with very similar instruments, the steps are going to vary for the observing conditions and, more significantly, what particular message you are trying to extract from the photons collected.

Which is not to say that it is impossible to have a single place where all those things could be collected, organized, and documented. But it takes concerted effort, very broad input, mutual respect for the disciplines requiring support, and careful, ongoing curation. We apparently can't even get the stewards to create documentation for the dictionaries created unilaterally. What does that say to the outside world about our level of curation?

The other part of the problem is the Reusability issue. Metadata that is good enough for mission purposes now is a nice start, if you're interested in mission data and don't want to do anything too far removed from the initial analysis. To meet the Reusability goals of FAIR, however, that information needs to be programmatically actionable, which means (at least until the Singularity) that it needs to be in well-defined, well-structured fields designed to enable programmatic action in future even if we (PDS) have no plans to write that code.

The local dictionaries are supposed to be an enabling technology, not a roadblock or a black box or a "Here Be Dragonnes" signpost.

jordanpadams commented 7 months ago

Roger that. Closing as invalid. The misunderstanding on my end and the lack of documentation are the core issues here.

matthewtiscareno commented 7 months ago

> Actually, for ground-based observations users do want to search on filter names from time to time. Thus the remark about "standard filters" (think "Johnson", "Kron-Cousins", etc.).

@acraugh: Looking at the SB dictionary, I see that you've created a standard_filter_identification attribute that allows people to enter free-form text with such information. I don't know that that will be very effectively searchable if it's not an enumerated list, right? Nevertheless, if these are standard ground-based attributes, then perhaps the SB dictionary should actually be called the Ground-Based Observing dictionary?

matthewtiscareno commented 7 months ago

@acraugh: I think @jordanpadams has a good point that the wavelength-related and filter-related attributes in the SB dictionary overlap with similar attributes in (perhaps among others) the Rings dictionary. For example, you have an attribute center_wavelength in the SB dictionary, and we have an attribute wavelength in the Rings dictionary. Aren't these the same quantity? Shouldn't these things be searchable across different types of observations?

I repeat that, unless you have a counter-example, there is nothing about ground-based observing that is unique to Small Bodies, or Rings, or whatever target. So it makes sense that we should try to unify these attributes if we can.

It seems that perhaps your answer above is that the rest of PDS has not taken the lead on this sufficiently for your needs and/or has not supported you in taking the lead to do this. Is that a fair interpretation?

acraugh commented 7 months ago

So why is there a "wavelength" in the Rings dictionary to begin with? What on Earth would prompt me to look there? Your "support" page link is this: https://pds-data-dictionaries.github.io/support/; your github.io documentation site is this: https://pds-data-dictionaries.github.io/ldd-rings/, and the only documentation in the "docs/" directory of your repo is this: https://github.com/pds-data-dictionaries/ldd-rings/blob/main/docs/examplenamespace.pdf .

Now, you might argue that I am an insider and a PDS4 specialist, so I should know how to look in other places. But my point is, why should I have to undertake an Easter Egg hunt for this putative "identical" information for even one discipline dictionary, let alone a dozen or more? Even if I do, and decide to make use of it, am I really doing the right thing for my users by referencing undocumented namespaces which were clearly designed for other discipline contexts? How would an end-user know to repeat the hunt I had to undertake to be sure they are correctly interpreting metadata from a "rings" dictionary, or any other undocumented dictionary, come to that?

I am not funded to be a technical editor for the entire PDS system, at least not at this point. The only node I have any hope of brow-beating into completing their documentation is my own, which I am in the process of doing right now. I have no more time to give to other stewards and other nodes reviewing namespaces for consistency, completeness, and logical coherence.

So I respectfully suggest that the rest of the stewards get their houses in order before attempting to blow mine down.

-Anne.

matthewtiscareno commented 7 months ago

@acraugh: I'm sorry if it seemed that you were being attacked in this thread. My purpose was only to ascertain whether we might agree about what would be best, diagnose our reasons for not doing it that way, and consider what our path forward might be.

Having a "wavelength" in the Small Bodies dictionary seems no more or less appropriate than having it in the Rings dictionary, doesn't it? In both cases, the dictionary is named for a science discipline, but it contains attributes that pertain to observing techniques that are also used in other science disciplines.

It's very possible that the current trend cannot be reversed, and that the best solution will be to create attributes that do not reside in product labels but in metadata bundles. These could be specifically conceived to facilitate cross-mission and cross-discipline search (which can be a hard thing to expect from data providers) and could be archived using Product_Metadata. Such a procedure would mirror what we have always done to power OPUS.

acraugh commented 7 months ago

If there's going to be a consolidation and reorganization of metadata from discipline and mission dictionaries to support better and more reusable metadata, one would normally look towards a major version upgrade for that. IM 2.0 is too close, at least nominally, to do that job properly. But if the documentation can be developed and the comparative analysis done, that might be a major feature of IM 3.0. If the entirety of PDS is committed to it, it could be completed in probably about a year. That's a big "if", however.

-Anne.

matthewtiscareno commented 7 months ago

I'm not necessarily suggesting that we consolidate and re-organize the metadata in product labels, although we could.

What I just suggested is adding a layer via Product_Metadata that would facilitate search without affecting product labels.

That would not require a major version, would it?

rgdeen commented 7 months ago

Sorry to come to the party a bit late; I was on vacation last week. I have several thoughts here. I'll post the three points I want to respond to in different messages for ease of replying.

First, we should strive to use attributes from the most appropriate LDD. That may mean you have 20 LDDs referenced in a given label, but there's nothing wrong with that. For example, wavelength does not make sense in either an SB or a Rings LDD. It does make sense in IMG or SPECTRAL. So if you have to describe a wavelength, use one of those rather than something discipline-specific. It does not preclude you from using discipline-specific LDDs for other items.

rgdeen commented 7 months ago

Second, I could not disagree more with @acraugh's statement: "I fundamentally agree with you that many attributes need to be defined in ways that are parochial to the instrumentation being used, and that we should not expend effort in trying to make these searchable across data sets." We should, and must, actually expend great effort to do exactly that - make attributes searchable across data sets as much as possible.

It is part of the IM design to allow mission-specific validations or specializations of a generic concept. So, for example, img:filter_name could be constrained by one mission LDD to have names "red" and "blue" and by another mission LDD to have names "Johnson" and "Kron-Cousins". How is that useful, you might ask? After all, the names don't match across missions. But the concept does. If you're building a UI to find and access data, you want to present to the user a search facet for Filter Name. It doesn't matter what the possible values are; those are easily gathered by the program to make a pick list. But using the same attribute for this concept makes UI design infinitely easier. Otherwise the UI has to go search for 25 different filter_name attributes and generic tools become impossible to write. That's kind of the whole point behind the IM design - reuse attributes whenever possible.
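A minimal Python sketch of the pick-list idea, assuming labels have already been flattened into simple records keyed by attribute name. The record layout and mission names are hypothetical; real PDS4 labels are XML and would need parsing first.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical flattened label records sharing one attribute name.
LABELS = [
    {"mission": "mission_a", "img:filter_name": "red"},
    {"mission": "mission_a", "img:filter_name": "blue"},
    {"mission": "mission_b", "img:filter_name": "Johnson"},
    {"mission": "mission_b", "img:filter_name": "Kron-Cousins"},
    {"mission": "mission_b", "img:filter_name": "Johnson"},
]


def pick_list(labels: List[dict], attribute: str) -> List[str]:
    """Gather every distinct value of one shared attribute for a search facet."""
    return sorted({label[attribute] for label in labels if attribute in label})


def pick_list_by_mission(labels: List[dict],
                         attribute: str) -> Dict[str, List[str]]:
    """Same facet, grouped by mission so the UI can show mission-specific values."""
    grouped = defaultdict(set)
    for label in labels:
        if attribute in label:
            grouped[label["mission"]].add(label[attribute])
    return {mission: sorted(values) for mission, values in grouped.items()}


facet = pick_list(LABELS, "img:filter_name")
facet_by_mission = pick_list_by_mission(LABELS, "img:filter_name")
print(facet)
print(facet_by_mission)
```

Because every mission uses the same attribute name, the facet code takes a single `attribute` argument; with 25 differently named filter attributes, the caller would need a hand-maintained mapping instead.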

There are commonalities that are useful to exploit. Spacecraft clock, for example. Even if the epoch is different, the idea that there's a monotonically increasing clock that can be used to order observations is useful to capture in a common attribute. Specializations of the meaning ("nuances") can be used to elucidate the specific definitions for a given mission. You may say "if the definition isn't precise what good is it". Well it is precise, if you include both the primary and nuance definition. You do not gain anything by making it a separate attribute, because you can do just about anything you need with the nuances and mission-specific constraints... but by making it separate you lose a lot in terms of usability. If you want to do calculations with the sclk, you have to do it mission-specific... but you have to anyway if it's a separate attribute, so again you haven't lost anything. Steve Hughes is working on formalizing the whole nuance idea, which we use extensively in the Mars world.

Please see these two documents (100% automatically generated) for a better idea of how this works in practice. These are the label tables for M20 and InSight. Note the "XXX Specific" notes in the definition column. An example is worth 1000 words. (There's a bug where the mission-specific valid lists are not showing up, but I'm working on that - pretend they're there.)

https://pds-geosciences.wustl.edu/m2020/urn-nasa-pds-mars2020_mission/document_camera/Mars2020_Camera_SIS_Labels_sort_pds.html

https://planetarydata.jpl.nasa.gov/img/data/nsyt/insight_cameras/document/pds4_attribute_definitions_sort_pds.html

Can you tell me how anything is gained by making, for example, download_priority a mission-specific item, despite the slight differences in definitions? Or exposure_duration_count?

rgdeen commented 7 months ago

Finally, @acraugh you complained about the documentation, and rightly so. It's hard to find resources for that as we all know. I did want to say though that the entire idea behind IMG (at least) is that it is expandable. We put in what we needed at the time, with the full expectation that we will add more items later. Hopefully we designed it in such a way as to make those additions fit in nicely (at least we tried to). So if you need to describe "PSF convolution, sky subtraction, extinction correction, star subtraction, cosmic ray subtraction, and various other things" ... please, by all means, propose some classes/attributes!! We'd be happy to add them to IMG. I'm also happy to help you define them in a way that's consistent with the IMG LDD design philosophy. Which should be documented, I fully agree on that. But lack of documentation should not preclude use of the LDD (and I'm disturbed that you think that's a blocker to using it at all). The attributes and classes themselves are for the most part described adequately I believe (please point out counterexamples and we'll fix them). It's the overall concept documentation that is (entirely!) lacking. But polluting the IM with duplications just for the sake of "it's not documented" is a disservice to our users. Let's work together and make it better for everyone ... providers, nodes, and users.