pds-data-dictionaries / PDS4-LDD-Issue-Repo

Issue repository for tracking all PDS4 Discipline Dictionary-related issues, new feature requests, and releases.

[docs] Add information about what it means to use *_type, *_name, and *_id attributes #1

Closed jordanpadams closed 2 years ago

jordanpadams commented 5 years ago

Per past discussions with @acraugh, there are some "known" best practices for what it means to use _type, _name, and _id in your LDD. For example, _id should only be used for something that is a unique identifier (no enumerated values), _type should have a defined set of enumerated values, and _name is more of an open-ended description and should not have an enumerated set of values.
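To make the convention concrete, here is a minimal Python sketch of how such a check might look. The `Attribute` class and `check_naming_convention` function are hypothetical names made up for illustration; this is not taken from any existing tool, and it assumes an attribute is "enumerated" exactly when it lists at least one permissible value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """Hypothetical stand-in for an LDD attribute definition."""
    name: str
    permissible_values: List[str] = field(default_factory=list)


def check_naming_convention(attr: Attribute) -> List[str]:
    """Return messages for an attribute that breaks the suffix convention."""
    # Assumption: "enumerated" means at least one permissible value is listed.
    enumerated = bool(attr.permissible_values)
    messages = []
    if attr.name.endswith("_type") and not enumerated:
        messages.append(
            f"{attr.name}: '_type' attributes should define enumerated values"
        )
    elif attr.name.endswith("_id") and enumerated:
        messages.append(
            f"{attr.name}: '_id' attributes should not define enumerated values"
        )
    elif attr.name.endswith("_name") and enumerated:
        messages.append(
            f"{attr.name}: '_name' attributes should not define enumerated values"
        )
    return messages
```

Under these assumptions, `check_naming_convention(Attribute("filter_type"))` returns one message, while `Attribute("filter_type", ["Red", "Blue"])` and `Attribute("target_name")` both pass cleanly.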

jordanpadams commented 2 years ago

This is now included in LDDTool warnings and errors.

rgdeen commented 2 years ago

Well, that ship has sailed I think. There are lots of examples out there that don't follow those rules. Also there are gray areas, e.g. instrument_id - is that an enumerated value, or a unique identifier for the instrument? It's kinda both. It's not a unique identifier for that particular data set, but it is a unique identifier for the instrument.

Really, any enumerated list could be considered a unique identifier for the thing being enumerated. So it's really a distinction without a difference...

wkiri commented 2 years ago

I suspect the distinction being made here is syntactic, i.e., whether you specify a list of valid options for a given attribute or not. If you enumerate them, then it's enumerated and would be a "type"; if you allow any string as a unique identifier ("id"), you aren't going to enumerate them. But I'm guessing here :)

rgdeen commented 2 years ago

Well, I'm not a big fan of enumerating anyway (it gets in the way of multimission reuse), so ID rules! ;-)

My concern, though, is enforcing it in validate. Are all the current DDs clean in this regard? I don't strongly object to the rule going forward, but I would have a problem if it forced changes in existing DDs. The rule isn't worth the trauma of trying to fix data already out there, or dealing with warts caused by renaming attributes.

matthewtiscareno commented 2 years ago

Multimission reuse should only be done for things that are truly the same. By definition, this means that enumerated lists are often appropriate, and indeed are essential for enabling accurate multimission search.

rgdeen commented 2 years ago

Kind of off-topic for the original posting, but I completely disagree, @matthewtiscareno. Multimission reuse should happen whenever it is even remotely possible, even when the definitions are not exactly the same. Often there are properties of the values that are common, and those properties can be taken advantage of even if the specific definitions vary. Differing SCLK epochs are one example. Downlink priorities are another... as long as lower number = higher priority, the fact that priorities sort within the mission is highly valuable even if the range of actual priority numbers differs. There are a lot of applications where those properties are important but the specific definitions are less so, and those definitions are in the mission documentation.
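As a rough illustration of the downlink priority point, here is a small Python sketch. The mission names, product names, the `downlink_priority` key, and the priority numbers are all made up for the example; the only assumption carried over from the discussion is that a lower number means a higher priority.

```python
# Hypothetical records from two missions that use different priority ranges.
products = [
    {"mission": "mission_a", "product": "a_img_1", "downlink_priority": 50},
    {"mission": "mission_a", "product": "a_img_2", "downlink_priority": 10},
    {"mission": "mission_b", "product": "b_img_1", "downlink_priority": 3},
    {"mission": "mission_b", "product": "b_img_2", "downlink_priority": 1},
]


def products_by_priority(records, mission):
    """Return product names for one mission, highest priority first.

    Relies only on the shared property (lower number = higher priority),
    not on the mission-specific range of priority values.
    """
    in_mission = [r for r in records if r["mission"] == mission]
    in_mission.sort(key=lambda r: r["downlink_priority"])
    return [r["product"] for r in in_mission]
```

The same function works for both missions even though one uses values like 10 and 50 and the other uses 1 and 3, which is the kind of reuse being argued for here.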

It has worked out extremely well across the entire set of Mars surface missions, and is being applied to a lunar mission. For Voyager, the set of multimission templates and keywords gave me probably 75% of my label with nearly zero effort; the remaining 25% is the Voyager-specific DD we need for some concepts that are not readily shareable or are too poorly documented to make sense in a multimission framework.

Please see the set of abstracts and the poster I presented on the topic of multimission reuse in PDS at the 2019 Flagstaff Planetary Data Workshop.

https://www.hou.usra.edu/meetings/planetdata2019/pdf/7050.pdf
https://www.hou.usra.edu/meetings/planetdata2019/pdf/7051.pdf
https://www.hou.usra.edu/meetings/planetdata2019/pdf/7052.pdf
https://www.hou.usra.edu/meetings/planetdata2019/eposter/7052.pdf

matthewtiscareno commented 2 years ago

Your examples seem reasonable at first glance. Be that as it may, my point is merely that convenience for data providers is an important but not inalienable principle. It is the job of PDS to also think about maintainability, findability, and usability, and those principles sometimes conflict.

rgdeen commented 2 years ago

Agreed, they sometimes do... but my assertion is that maximizing multimission-ness actually helps with all the above in most cases. Thanks.