rdawg-pidinst / schema

RDA WG PIDINST Metadata Schema

Include physical object identifier such as inventory number #15

Open huberrob opened 5 years ago

huberrob commented 5 years ago

To allow linking between the physical object (instrument) and its digital representation, it would be good to have a property such as 'physicalObjectIdentifier' for IDs such as an inventory number. The schema currently has relatedIdentifier as well as alternateIdentifier, and it is not clear which one should be used when the 'real' physical object is to be identified.

RKrahl commented 5 years ago

What do you mean by “link between the physical object (instrument) and its digital representation”? The instrument that the PIDINST pertains to is the physical object. I'm not sure if any “digital representation” of the instrument is within the scope of this group.

For inventory numbers, we already have alternateIdentifier. That is, by the way, the reason why alternateIdentifierType is free text rather than a controlled vocabulary: this property is intended to accommodate not only formal PIDs, but also less formal identifiers, such as inventory or serial numbers. The issuer of the PIDINST may thus specify in the alternateIdentifierType what kind of number the alternateIdentifier is supposed to be, e.g. X-institute-inventory-number or Y-manufacturer-serial-number.

For the difference between alternateIdentifier and relatedIdentifier: alternateIdentifier is for alternate identifiers, other than the PIDINST, pertaining to the same instrument instance. relatedIdentifier is for identifiers pertaining to other objects or entities that are related to this instrument instance, e.g. articles or extended metadata describing the instrument, other instruments related to this instrument, …
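To make the distinction concrete, here is a minimal Python sketch that builds a metadata fragment containing both kinds of identifier. The element and attribute names follow the ones used in this thread; the root element `instrument`, the helper `build_record`, and all values are assumptions made for illustration, not the schema's normative serialization.

```python
import xml.etree.ElementTree as ET


def build_record(pid, alternates, related):
    """Build a small metadata fragment for one instrument instance.

    alternates: list of (type, value) pairs naming the SAME instrument.
    related: list of identifiers naming OTHER objects (articles, other instruments).
    """
    root = ET.Element("instrument")  # hypothetical root element
    ET.SubElement(root, "identifier").text = pid
    for id_type, value in alternates:
        alt = ET.SubElement(root, "alternateIdentifier", alternateIdentifierType=id_type)
        alt.text = value
    for value in related:
        ET.SubElement(root, "relatedIdentifier").text = value
    return root


record = build_record(
    pid="PIDINST-1",  # placeholder, not a real PID
    alternates=[
        ("X-institute-inventory-number", "INV-0042"),  # invented value
        ("Y-manufacturer-serial-number", "SN-12345"),  # invented value
    ],
    related=["https://www.example.org/instrument-paper"],  # invented value
)
print(ET.tostring(record, encoding="unicode"))
```

Both alternate identifiers point at the same physical instrument as the PIDINST, while the related identifier points at a different object.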

huberrob commented 5 years ago

There is indeed a difference between identifiers used for digital objects and those connected to the physical object itself. If we keep the purist approach of subsuming all these IDs under alternateIdentifier, then we definitely should also provide a controlled list of identifier types. Otherwise there will soon be a chaos of self-defined types.

RKrahl commented 5 years ago

But our identifiers do not pertain to digital objects; they pertain to instruments, i.e. physical objects.

huberrob commented 5 years ago

Well, but our current definition states for Identifier: 'Unique string that identifies the instrument instance' and for LandingPage: 'A landing page that the identifier resolves to', so in practice the Identifier identifies the LandingPage, which is a possible digital representation of the instrument.

markusstocker commented 5 years ago

I know @huberrob has made this point before but I never quite understood it. So, does an IGSN identify a sample (i.e., the physical object) or the landing page (i.e., a possible digital representation of the sample)? IMO, the case of instruments can be treated the same.

I am not entirely convinced that a DOI cannot identify a physical object and resolve to a possible digital representation of the physical object. After all, the DOI name is quite a distinct thing compared to the landing page URL. Why can we not treat the URL as the identifier of the possible digital representation of X and the PID as the identifier of X (digital or physical)?

I guess one counter argument is that a PID for physical object X could resolve onto one of many digital or physical representations (say, a digital document or a printed document representing metadata about the physical object X). Can we say that the PID identifying physical object X resolves to and only to a preferred landing page and thus digital representation? In other words, there is a one-to-one mapping but the PID name and the landing page URL (can) identify two different things?

Any more thoughts?

huberrob commented 5 years ago

IGSNs are now attached to samples by early IGSN adopters, e.g. using a barcode label. This is definitely a strong, physical link to the identifier and a good practice. But it took years until this was achieved. In reality, from the issuing institution's perspective, a PIDINST will be just another alternateIdentifier rather than the Identifier, and most institutions will use it in addition to their internal numbers. Therefore, we must make sure that we enable a solid way to link the physical object and its main identifier with its digital representation and its PIDINST.

As I mentioned before, one way to ensure this is to express it as a dedicated property. Alternatively, we can set up a controlled list of types for alternateIdentifierType. We have to decide what we want.

RKrahl commented 5 years ago

Sorry, @huberrob, I really tried hard, but I don't get your point. Again: the instrument PID that we are discussing here does not identify any kind of digital object or a landing page; it identifies the instrument, i.e. the physical object. The landing page that the identifier resolves to is just a means to provide more information about the instrument. It is part of the metadata in the same way as any property in this schema belongs to the metadata pertaining to the instrument.

This is by the way the same as for most kinds of identifiers:

Whether or not there is a need to attach barcode labels with the PIDINST to the instrument is out of the scope of this WG, I'd say. At least for our instruments, I can say that they are big enough and difficult enough to move that it is really hard to overlook or lose them. I don't think we will need such barcodes.

The same instrument that is identified by a PIDINST will certainly have more than only this identifier: an inventory number in the owning institute, an entry in some instrument database with its own database id, the serial number of the manufacturer, and so on. That is what AlternateIdentifier is for, because it makes sense to create links between the identifiers in order to be able to check which item in the institutional inventory database has been attributed which PIDINST. Which of these is the identifier and which is yet another identifier will depend on the use case and perspective and may change from minute to minute. If I search our internal instrument database I might need the database identifier, so at this moment this will be the identifier for me. Ten minutes later, I might prepare a data publication and link that to the instrument that produced the data. At that instant, the PIDINST will be the identifier that is relevant.

Most of these other identifiers are not formalized PIDs. Therefore it is not possible to enumerate all identifier types that will be used for AlternateIdentifier. And that is why it is and should be free text rather than a controlled list of values. If I put <alternateIdentifier alternateIdentifierType="URL">http://www.example.org/someurl/</alternateIdentifier> in the metadata of my instrument, you will most probably know how to follow this identifier. If I put <alternateIdentifier alternateIdentifierType="IGAMA number">1848</alternateIdentifier> there, you will most likely not know what it is, so you will at least be able to guess that this piece of information is not relevant for you. But other people do know what it is and for those people it is useful information. Both entries may be valuable depending on the use case.
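The consumer-side behaviour described here (follow the types you recognize, skip the ones you do not) can be sketched in Python as follows. The two alternateIdentifier entries are the ones from this comment; the wrapping `instrument` element, the set `KNOWN_TYPES`, and the helper `split_by_known_type` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The wrapping <instrument> element is hypothetical; the two entries come from the comment.
FRAGMENT = """
<instrument>
  <alternateIdentifier alternateIdentifierType="URL">http://www.example.org/someurl/</alternateIdentifier>
  <alternateIdentifier alternateIdentifierType="IGAMA number">1848</alternateIdentifier>
</instrument>
""".strip()

# Types this particular consumer knows how to follow (an assumption of the sketch).
KNOWN_TYPES = {"URL"}


def split_by_known_type(xml_text, known_types):
    """Return (usable, skipped) lists of (type, value) pairs."""
    usable, skipped = [], []
    for element in ET.fromstring(xml_text).iter("alternateIdentifier"):
        pair = (element.get("alternateIdentifierType"), element.text)
        (usable if pair[0] in known_types else skipped).append(pair)
    return usable, skipped


usable, skipped = split_by_known_type(FRAGMENT, KNOWN_TYPES)
```

A consumer with a different `KNOWN_TYPES` set, for example one that includes "IGAMA number", would get a different split from the same record, which is the point being made about free text.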

RKrahl commented 5 years ago

We discussed this in today's meeting:

@huberrob: if you believe that there is still an issue with the schema after this clarification, feel free to reopen.

huberrob commented 5 years ago

Dear all,

I would like to reopen this issue, and maybe we should merge it with #5 on 'serial numbers' proposed by @louatbodc, as both issues deal with the (IMHO essential) capability to link a PIDINST and its associated metadata record with the physical object, which is not sufficiently addressed now by alternateIdentifier.

@RKrahl states that a PIDINST would already identify the instrument, but this is still illusory. Instead, it is most likely that PIDINST will be used in addition to existing instrument identifiers, only in order to have an effective way to link digital records with instrument representations.

Even our schema reflects this, otherwise we would not have defined LandingPage as mandatory property.

We should not ignore widely used existing identifiers for instruments, which in most cases are actually physically attached to the instrument: serial numbers as well as inventory numbers (or accession numbers). By the way, these identifiers are also regarded as essential information by standards like TEDS / IEEE 1451.4 (not sure if we have discussed this yet; I will open an issue).

Imagine a larger institution with hundreds of instruments in use, such as the AWI, which owns e.g. dozens of CTD sensors of the same type, e.g. type SBE-37. In our current architecture each of these would receive its own PIDINST, and most probably they would all have the same name, like 'SBE-37'. Ideally each of these instruments would have its own landing page. But what if this is not done the way we imagine it now? What if, instead, the landing page is just an HTML page which presents a list of instruments like:

SBE-37 PIDINST-1
SBE-37 PIDINST-2
SBE-37 PIDINST-3
etc.

How should one be able to identify a distinct instrument and, e.g., find it on the shelves of this institution based on this information? This would be easy if we provided an unambiguous way to include a serial number or inventory number in the PIDINST metadata.

Using alternateIdentifier is not the appropriate solution for inventory numbers or serial numbers. We all know how difficult it is to maintain consistency in using metadata schemas. People will use alternateIdentifier as well as relatedIdentifier to fill in serial numbers, and they will use 'serial number', 'Seriennummer', 'ser. number', 'S/N', 'SRID', 'product key', 'serial key', etc. for this purpose. Do we really want this?

This is why I think we either need dedicated elements for serial numbers and inventory numbers or a dedicated generic element for physical identifiers and a controlled list of types at least for these.
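The consistency problem can be illustrated with a short Python sketch showing the clean-up a harvester would need if the type stays free text. The variant labels are the ones listed above; the canonical name `SerialNumber`, the set `SERIAL_NUMBER_VARIANTS`, and the helper `normalize_type` are hypothetical and not part of the schema.

```python
import re

# Free-text labels from the list above, reduced to lowercase letters only.
# Treating all of them as serial numbers is an assumption of this sketch.
SERIAL_NUMBER_VARIANTS = {
    "serialnumber", "seriennummer", "sernumber", "sn", "srid", "productkey", "serialkey",
}


def normalize_type(free_text):
    """Map a free-text identifier type to 'SerialNumber' when it matches a known variant.

    Unrecognized labels are returned unchanged.
    """
    key = re.sub(r"[^a-z]", "", free_text.lower())
    return "SerialNumber" if key in SERIAL_NUMBER_VARIANTS else free_text


labels = ["serial number", "Seriennummer", "ser. number", "S/N", "SRID", "product key", "serial key"]
normalized = [normalize_type(label) for label in labels]
```

Such a mapping only covers the variants someone has already seen, which is the argument for fixing the vocabulary in the schema rather than in every consumer.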

best regards, Robert

RKrahl commented 5 years ago

I'll reopen as requested by @huberrob. I will comment, maybe next week.

markusstocker commented 5 years ago

My thoughts here:

> @RKrahl states that a PIDINST would already identify the instrument, but this is still illusory. Instead, it is most likely that PIDINST will be used in addition to existing instrument identifiers, only in order to have an effective way to link digital records with instrument representations.

I think we do not pretend that PIDINST will be the identifier for instruments. I completely agree that it will be one among others. As you correctly note, it will be the one that, in contrast to others, enables resolution of the identifier to further data about the instrument on the web. However, and this is important, we do claim that PIDINST identifies the instrument, the physical object, not the metadata attached to the identifier, not the landing page or data returned on that landing page. Whether the PIDINST will be attached to the instrument with a barcode, engraved in the instrument case, or not attached at all is IMO irrelevant, or left to the discretion of the instrument owner. The PIDINST understanding is that the identifier identifies the instrument, the physical object.

Now, I understand this can be controversial. Indeed, it is related to the controversy whether a DOI is a digital identifier for (digital or physical) objects or an identifier for digital objects. This is all a bit philosophical IMO and not even the PID community has a clear answer. I am also unclear about the practical implications. Hence, unless we clarify why a string "10.123/abc" cannot identify a physical object (as IGSNs do), I suggest we continue with the position that the PIDINST indeed identifies the instrument, the physical object.

> We should not ignore widely used existing identifiers for instruments, which in most cases are actually physically attached to the instrument: serial numbers as well as inventory numbers (or accession numbers).

The schema doesn't ignore them and provides a mechanism to explicitly include them (alternateIdentifier).

> Imagine a larger institution with hundreds of instruments in use, such as the AWI, which owns e.g. dozens of CTD sensors of the same type, e.g. type SBE-37. In our current architecture each of these would receive its own PIDINST, and most probably they would all have the same name, like 'SBE-37'. Ideally each of these instruments would have its own landing page.

Would all have the same model name but, yes, correct.

> But what if this is not done the way we imagine it now? What if, instead, the landing page is just an HTML page which presents a list of instruments?

I don't think we can enforce this, or at least I don't see how. A more or less weak parallel may be that DOIs are used to identify collections of documents, and resolve to collection landing pages. Each item in the collection may or may not have a DOI. I agree that this is less of a problem for DOIs, since a DOI can be used for any digital object (atomic or collection). If we argue that PIDINST identifies an instrument and someone registers a PIDINST to identify a digital collection of references to physical objects, then this would constitute a misuse of the PIDINST identifier. For such cases, my suggestion would be to identify the collection with a DOI and the collection elements with PIDINST (note that I am not excluding the possibility that PIDINST = DOI).

> How should one be able to identify a distinct instrument and, e.g., find it on the shelves of this institution based on this information? This would be easy if we provided an unambiguous way to include a serial number or inventory number in the PIDINST metadata.

Using alternateIdentifier.

> Using alternateIdentifier is not the appropriate solution for inventory numbers or serial numbers. We all know how difficult it is to maintain consistency in using metadata schemas. People will use alternateIdentifier as well as relatedIdentifier to fill in serial numbers, and they will use 'serial number', 'Seriennummer', 'ser. number', 'S/N', 'SRID', 'product key', 'serial key', etc. for this purpose. Do we really want this?

I think here we get to the meat. To your question I say, no we don't want this. But do we need a dedicated attribute serialNumber or can we address this by having a defined set of alternateIdentifierTypes? Would your concern be addressed with the following:

<AlternateIdentifier alternateIdentifierType="SerialNumber">123123123</AlternateIdentifier>

whereby SerialNumber is a defined string out of a closed collection?

This of course opens the issue of how we come up with that closed collection, but I wonder if this can be addressed by schema versioning, adapting the collection following community requests.
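A closed collection that grows with schema versions could be checked along these lines. This is only a sketch: the version numbers, the value `InventoryNumber`, the table `ALLOWED_TYPES`, and the helper `is_valid_type` are all hypothetical; only `SerialNumber` comes from the example above.

```python
# Hypothetical closed collections of alternateIdentifierType values, one per schema version.
ALLOWED_TYPES = {
    "1.0": {"SerialNumber"},
    "1.1": {"SerialNumber", "InventoryNumber"},  # extended after a community request
}


def is_valid_type(id_type, schema_version):
    """Check a type against the closed collection for the given schema version."""
    return id_type in ALLOWED_TYPES[schema_version]
```

Under these assumptions a record using `InventoryNumber` would be rejected against version 1.0 and accepted against 1.1, while a free-text label such as 'S/N' would be rejected by both.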

huberrob commented 5 years ago

Dear Markus, maybe we should wait for more community input here. I still think serial numbers need some special consideration, also with respect to future links to the engineering community. I am not sure if a controlled list of type terms would fully solve the problem. Have you seen the issue I created yesterday regarding TEDS / IEEE 1451.4 (https://github.com/rdawg-pidinst/schema/issues/20)? From these documents it seems that serial numbers are regarded as part of the minimum information by other communities.

Further, it is not clear to me why we are so picky with identifiers and why we are so desperately trying to avoid dedicated properties for IDs. Many other properties are not treated like this in our schema. For example, Manufacturer and Owner are explicitly modelled in the schema as dedicated properties instead of defining a generic property 'Organisation' with role/type attributes like 'InstrumentOwner' for the involved actors.

I include Christoph in this email, he already had some communication with IEEE regarding sensor description and maybe he can give us some additional insights from the IEEE perspective?

best regards, Robert

-- Dr. Robert Huber,

PANGAEA - www.pangaea.de


MARUM - Center for Marine Environmental Sciences University Bremen Leobener Strasse POB 330 440 28359 Bremen Phone ++49 421 218-65593, Fax ++49 421 218-65505 e-mail rhuber@uni-bremen.de

markusstocker commented 5 years ago

Agreed, let's see what additional input rolls in.

I suggest we take some of these key open issues onto the agenda for P13 (@RKrahl).

> Have you seen the issue I created yesterday regarding TEDS / IEEE 1451.4 (#20)?

I did, thanks, but I am not yet clear what to do with it. ;> For instance, I am not sure how widely used the properties Model Number, Version Letter, and Version Number are. These didn't show up in our use cases (except Version Number in the use case by FZJ).

> Further, it is not clear to me why we are so picky with identifiers and why we are so desperately trying to avoid dedicated properties for IDs.

The rationale is that, with dedicated properties alongside the alternateIdentifier approach, there will be two ways to encode the same information for some items, actually three. Say we have serialNumber as an additional property; then one can use it for the serial number, use alternateIdentifier for the serial number, or use both. Intuitively I would argue this is not desirable.
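The three encodings, and the inconsistency they make possible, can be sketched in Python. The record layout (a plain dict with a `serialNumber` key and an `alternateIdentifiers` list) and both helper functions are assumptions for illustration; `serialNumber` is the hypothetical dedicated property discussed here.

```python
def find_serial_numbers(record):
    """Collect every place a serial number may be recorded in a metadata record."""
    found = []
    if record.get("serialNumber"):
        found.append(("serialNumber", record["serialNumber"]))
    for alt in record.get("alternateIdentifiers", []):
        if alt["type"] == "SerialNumber":
            found.append(("alternateIdentifier", alt["value"]))
    return found


def is_consistent(record):
    """True when all recorded serial numbers agree (or none is recorded)."""
    return len({value for _, value in find_serial_numbers(record)}) <= 1


# The three ways to encode the same information (values are invented).
only_property = {"serialNumber": "123123123"}
only_alternate = {"alternateIdentifiers": [{"type": "SerialNumber", "value": "123123123"}]}
both_conflicting = {
    "serialNumber": "123123123",
    "alternateIdentifiers": [{"type": "SerialNumber", "value": "999"}],
}
```

Every consumer would have to look in both places, and the third record shows that nothing prevents the two encodings from disagreeing.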

> Manufacturer and Owner are explicitly modelled in the schema as dedicated properties

Correct and I would argue this is because these are complex properties with name, id, contact.

> I include Christoph in this email

I think it may not have been delivered to him, since your reply went to GitHub. Not sure.

huberrob commented 5 years ago

Dear colleagues,

this actually is a very thoughtful and timely discussion that you are having here!

I agree with most of the statements that have been made, but one aspect that I would like to add to the discussion is the transitions that sensors go through during their lifecycle. A sensor bought at a certain time has been built and calibrated according to quality standards that may change in a later production phase. This applies not just to the hardware but also to the firmware that is used. Also, I am not just talking about service and maintenance, which would imply the replacement of faulty parts, but also about major design changes. My conclusion is that the serial number helps to track down in which condition a sensor was delivered to the customer and what upgrades have been done, but other properties must be added to account for hardware and software changes that have an impact on the overall performance.

Coming back to what Robert said about the IEEE 1451 standards: the TEDS provide hints on what type of information is necessary to uniquely describe the sensor. Having said that, I suggest that the PIDINST should be seen as something that contains static information about the sensor as it was delivered, as well as dynamic information where changes of ownership, firmware upgrades, etc. can be added.

With this ability to combine static and dynamic aspects, one has a better handle on versioning of the sensor system, and it would actually better reflect the process by which observations are carried out.
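One possible reading of this static/dynamic split, sketched in Python: a record holds the fixed delivery information plus an append-only list of dated changes, so the state at the time of a given observation can be looked up. The class and field names, the change kinds, and the firmware versions are all hypothetical and not part of the PIDINST schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Change:
    """One dated change in the instrument's lifecycle (hypothetical structure)."""
    when: date
    kind: str   # e.g. "firmware" or "owner"
    value: str


@dataclass
class InstrumentRecord:
    # Static information, fixed at delivery.
    serial_number: str
    model: str
    # Dynamic information, appended over time.
    history: list = field(default_factory=list)

    def record_change(self, when, kind, value):
        self.history.append(Change(when, kind, value))

    def state_at(self, when, kind):
        """Return the latest value of `kind` on or before `when`, or None."""
        matching = [c for c in self.history if c.kind == kind and c.when <= when]
        return max(matching, key=lambda c: c.when).value if matching else None


ctd = InstrumentRecord(serial_number="123123123", model="SBE-37")
ctd.record_change(date(2019, 1, 10), "firmware", "1.0")  # invented versions
ctd.record_change(date(2019, 6, 1), "firmware", "1.1")
```

With this layout, a data publication could ask which firmware was installed on the day an observation was made, rather than only which serial number the sensor carries.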

Finally I would like to mention that IEEE has no coherent approach on this issue. As I am a member of the so called IEEE Sensor Council I would be happy to start an initiative supporting the ideas that you are ventilating. It is actually a very hot topic thinking of the Internet-of-Things.

Best wishes,

Christoph

-- Dr. Christoph Waldmann University of Bremen/MARUM Leobener Strasse P.O.Box 330440 28334 Bremen/Germany Tel. +49-421 218 65606 FAX +49-421 218 98 65606 waldmann@marum.de

RKrahl commented 3 years ago

I believe we have a consensus by now that the inventory number should be included as AlternateIdentifier. I suggest closing this one. Note that #24 is still open.

huberrob commented 3 years ago

I do not agree with closing this issue. I continue to see something specific in the relationship between a physical object (via serial number or inventory number) and its digital representation. In my opinion this is still not appropriately expressed in the current model. As possible solutions, I could imagine changing the scope of PIDINST to also include 'virtual instruments', or providing a mandatory alternateIdentifierType vocabulary and changing the cardinality to 1:n (forcing the inclusion of at least one identifier for a physical object).
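The second option could be checked roughly as follows. This Python sketch assumes that the cardinality change applies to alternateIdentifier; the vocabulary entries, the split into `VOCABULARY` and `PHYSICAL_ID_TYPES`, and the helper `validation_errors` are hypothetical.

```python
# Hypothetical mandatory vocabulary for alternateIdentifierType.
VOCABULARY = {"SerialNumber", "InventoryNumber", "URL"}
# Subset of the vocabulary that names the physical object itself.
PHYSICAL_ID_TYPES = {"SerialNumber", "InventoryNumber"}


def validation_errors(alternate_identifiers):
    """Return a list of problems; an empty list means the record passes.

    `alternate_identifiers` is a list of (type, value) pairs.
    """
    errors = []
    for id_type, _ in alternate_identifiers:
        if id_type not in VOCABULARY:
            errors.append(f"unknown alternateIdentifierType: {id_type}")
    if not any(id_type in PHYSICAL_ID_TYPES for id_type, _ in alternate_identifiers):
        errors.append("at least one physical object identifier is required")
    return errors
```

Under these assumptions a record with only a URL, or with no alternate identifier at all, would fail the check, which is the "forcing" effect described above.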

huberrob commented 3 years ago

See also how the DiSSCo community deals with 'digital specimens': https://github.com/DiSSCo/openDS/ and the recently published paper https://doi.org/10.3897/rio.7.e67379

hardistyar commented 3 years ago

It is true that you can use a PID (such as a DOI) to identify both/either physical and/or digital objects. Such a PID resolves to the location of the object.

In the case of a digital object that location can be the location (repository on the Internet) where the digital object is stored - typically in the Web world a URL that can be used with HTTP requests to obtain the object, or a landing page about it. Think about journal articles where most DOIs resolve to a landing page that 'tests' whether you have a subscription to the journal before giving you the text (of the article object that has been identified).

In the case of identifying a physical object, such as an instrument or sample, the resolution process must provide sufficient information for you to accurately locate the object. There are at least two ways you can do this. So in the case of multiple instruments of type SBE-37, as an earlier comment mentioned, i) the resolution of the PIDINST must include the physical location of the specific instance of the instrument identified, e.g. the instrument serial number, the number of the shelf/cupboard it is stored on/in and even the position on the shelf if necessary to disambiguate one instrument or sample from another. It will also be helpful if the instrument/sample itself has a label with its PIDINST identifier permanently attached; or ii) a URI consisting of (for example) the institution domain and the instrument serial number. Mapping of that to the present cupboard/shelf location is then a local responsibility.

In the object concept, an object can either be a single object instance or it can be a collection of several object instances, some physical, some digital. So, you could give a single PIDINST to all your instances of (a pool of) type SBE-37's if it doesn't matter which one gets used, or to a collection object that contains both the physical object and a digital object corresponding to that.

It all depends on how you choose to use the PID system you've chosen. In DiSSCo we will ensure at least the one-way association between a 'Digital Specimen' and its physical specimen counterpart in a museum cupboard. We will do this by including an 'institution code' and a 'physicalspecimenId' in the PID Record maintained by the Handle System for the Digital Specimen (DS) object in question. The data in the DS will also include either a URI as I described above (combination of institution domain and physicalspecimenId) or an IGSN - whichever is used by the institution.
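
As a rough illustration of the URI option described above (institution domain combined with a local identifier), here is a minimal sketch. The class name, field names, the `https://` layout and the example values are all assumptions made for illustration; they are not taken from the DiSSCo or PIDINST specifications.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class PhysicalLink:
    """Hypothetical one-way link from a digital object to its physical counterpart."""

    institution_domain: str    # assumed: the institution's web domain
    physical_specimen_id: str  # local identifier, e.g. a serial number

    def uri(self) -> str:
        """Combine institution domain and local ID into a URI (assumed layout)."""
        # Percent-encode the local ID so characters such as "/" stay inside one path segment.
        local_id = quote(self.physical_specimen_id, safe="")
        return f"https://{self.institution_domain}/{local_id}"


link = PhysicalLink("museum.example", "SBE-37/0042")
print(link.uri())
```

Mapping such a URI to the current cupboard or shelf would remain a local responsibility, as noted above.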

I hope this is helpful.

huberrob commented 3 years ago

Thank you very much @hardistyar!

So I assume physicalspecimenId and institution code are mandatory?

Robert

hardistyar commented 3 years ago

They are mandatory minimum pieces of information (along with a few others) that will be needed to publish DS information.

smrgeoinfo commented 3 years ago

seems like some rehashing of the old httprange-14 discussion.
The binding between an identifier string and a physical thing has to use some identifying property carried by the physical thing; a simple example is a unique serial number stamped on the thing or attached as a permanent label. The physical thing can't be sent over the wire, so in the HTTP protocol there is a 303 redirect to get a digital representation of the physical thing (e.g. a 'digital specimen'). This representation must include the necessary information (e.g. a serial number) to bind the digital thing to the physical thing. This digital representation is itself a different resource, and should have its own unique identifier; this allows making statements about the representation distinct from statements about the physical thing. There can be many digital representations of a physical thing; HTTP includes content-negotiation functionality to get a particular representation.
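
The redirect pattern described here could be sketched roughly as follows. This is a simplified illustration, not a real server: the paths, the registry and the `resolve` function are hypothetical, and the negotiation ignores q-values.

```python
# Hypothetical registry: physical-thing URI path -> representation paths by media type.
REPRESENTATIONS = {
    "/instrument/123123123": {
        "text/html": "/doc/123123123.html",
        "application/json": "/doc/123123123.json",
    }
}


def resolve(path: str, accept: str = "text/html") -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a GET on a physical-thing URI.

    A known thing yields 303 See Other pointing at a representation chosen
    from the Accept header; an unknown thing yields 404; a known thing with
    no acceptable representation yields 406.
    """
    variants = REPRESENTATIONS.get(path)
    if variants is None:
        return 404, {}
    # Simplified negotiation: take the first listed media type we can serve.
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in variants:
            return 303, {"Location": variants[media_type], "Vary": "Accept"}
    return 406, {}


print(resolve("/instrument/123123123", "application/json"))
```

Note that each representation has its own path, distinct from the path that identifies the physical thing, which is what allows separate statements about each.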

markusstocker commented 3 years ago

Nice to see you here @hardistyar - quick question since I have not understood this from your comment: Are you suggesting that sufficient information to accurately locate the physical object (here the instrument) should be PID metadata or can this information be on the landing page the PID resolves to?

RKrahl commented 3 years ago

@huberrob, I still believe that your distinction between the "physical object" and some "digital representation" of it (whatever that is supposed to mean) is artificial; it makes things needlessly complicated and it does not seem to provide any practical benefit. We discussed this several times in the working group meetings and agreed that the instrument PID identifies the instrument, the physical object, not the metadata attached to the identifier, not the landing page or data returned on that landing page. We don't need any additional "physicalspecimenId" because the instrument PID is exactly that.

RKrahl commented 3 years ago

Regarding the information to locate the instrument, we discussed that in #17 and finally agreed not to add any additional information such as geo coordinates. For most instruments, the street address of the Owner will be sufficient to locate it. For other instruments that are deployed in the field or used during an expedition, it would be rather challenging to describe the location by simple attributes and the location might be too volatile to be included in the PID record. For the latter case, we agreed to consider some sort of a WasUsedIn relationType so that one could point to the deployment as a related identifier.

smrgeoinfo commented 3 years ago

@RKrahl see https://github.com/rdawg-pidinst/schema/issues/15#issuecomment-875976946 for rationale for having an identifier for the physical object distinct from identifiers for a 'digital representation' of that object.

RKrahl commented 3 years ago

We already do have a place in the schema to include serial numbers and inventory numbers: AlternateIdentifier.

RKrahl commented 3 years ago

In preparation for submitting the schema as an RDA recommendation, we plan to get a decision on all pending open questions during the next monthly meeting on 4th August.

Since I can't even recognize a clear proposal for a concrete change to the schema in this issue, I suggest closing it.

huberrob commented 3 years ago

Apparently no clear consensus exists on how to deal with the 'physical thing' in PIDINST. The proposal was to include something like a 'physicalspecimenId' (or physicalinstrumentId) which clearly (and replicably) indicates that the instrument actually exists or existed in the real world. You can ignore this, but it will lead to the introduction of a very large number of PIDINST and associated digital objects for which it will not be possible to prove whether they ever really existed.

RKrahl commented 3 years ago

Why would anybody want to create a PID for an instrument that does not exist? I would assume that any PIDINST is associated with a really existing physical instrument or with one that did exist in the past. And even if some people had the weird idea of attributing PIDINST to non-existing instruments, how would a "physicalspecimenId" property in the schema prevent those people from doing that? So in which way would the addition of such a property make any difference here?

Furthermore, I have to disagree with the statement that no clear consensus exists how to deal with the 'physical thing' in PIDINST. At the risk of repeating myself: we discussed this several times in the meetings and we always agreed in the group that the instrument PID identifies the instrument, the physical object, not the metadata attached to the identifier, not the landing page or data returned on that landing page. There is no such thing as a digital object associated with a PIDINST.

Also, I still don't know what you mean by a 'physicalspecimenId' or how such a property would be defined.

huberrob commented 3 years ago

@RKrahl

Furthermore, I have to disagree with the statement that no clear consensus exists how to deal with the 'physical thing' in PIDINST. At the risk of repeating myself: we discussed this several times in the meetings

Ok, I always assumed all group members are allowed to give input. But if you think you can decide this on your own go ahead..