Investigate the standard practice for reporting names

baskaufs commented 8 years ago

The previous Documentation spec used terms like "First name" and "Last name". This doesn't work in cultures where family names are listed first, or where other naming practices apply. I know there is some standard or guidance on this subject. Where is it? See section 3.2.3.1 of the documentation spec.

baskaufs commented 8 years ago

OK, I found the document that I was thinking about related to this subject. It is the W3C document "Personal names around the world" (https://www.w3.org/International/questions/qa-personal-names). After reading this document, it seems silly to me for us to be specifying any kind of stuff about "first names", "middle initials", etc. Rather, we should just have the person list their full name as they typically use it professionally.

jar398 commented 8 years ago

True, but I think one function listed in that document could be important:

"In some cases you want to identify parts of a name so that you can sort a list of names alphabetically, contact them, etc. Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose."

Alphabetization seems a pretty important thing. Now I would have thought you could just designate a point somewhere in the string (beginning or middle, either way) to specify the part that you would sort on. But maybe that wouldn't work, and maybe you'd need (as the doc suggests) a separate 'name sort key' property to enable alphabetization.

I have no idea what the TDWG requirements are, or what the community's complexity tolerance is, and I don't want to disagree with your conclusion, but I wanted to make sure you understand that if all you have is a string with no structure, you lose the ability to alphabetize (and what that means in practice is a bunch of ad hoc, buggy, and mutually incompatible hacks for finding the sort key).
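To make the "separate sort key" idea concrete, here is a minimal sketch, assuming each contributor record carries the full name as written plus one extra field to alphabetize on. The `Contributor` class, its `sort_key` field, and the sample names are illustrative assumptions, not anything defined by the spec.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    full_name: str  # the name as the person writes it professionally
    sort_key: str   # hypothetical extra field: the part to alphabetize on


def alphabetize(people):
    # Compare case-insensitively on the sort key; fall back to the full name on ties.
    return sorted(people, key=lambda p: (p.sort_key.casefold(), p.full_name.casefold()))


contributors = [
    Contributor("Tim Robertson", "Robertson"),
    Contributor("Mao Zedong", "Mao"),  # family name written first
    Contributor("Steve Baskauf", "Baskauf"),
]

ordered = [p.full_name for p in alphabetize(contributors)]
```

With the explicit key, no code has to guess which token of the full name is the family name; sorting on the last token instead would file "Mao Zedong" under Z.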


baskaufs commented 8 years ago

I totally agree with your point about the importance of alphabetization. The section referenced (3.2.3.1) is about the human-readable version of the document, so I'm not sure how often alphabetization would be done on the basis of the information found there. As the text currently stands, the variety in name forms that is allowed, combined with the fact that the contributors will be a list (comma-separated in current documents), makes parsing a difficult task.
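A rough sketch of why that parsing is hard, assuming the contributor line is split naively on commas. The function name and the sample strings are made up for illustration.

```python
def split_contributors(line):
    # Naive approach: treat every comma as a separator between contributors.
    return [part.strip() for part in line.split(",") if part.strip()]


# Names written in natural order split cleanly into two contributors.
natural = split_contributors("Steve Baskauf, Tim Robertson")

# The same two people written as "Family, Given" come back as four "contributors".
inverted = split_contributors("Baskauf, Steve, Robertson, Tim")
```

Nothing in the string itself says which reading is intended, so a parser has to guess.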

On the other hand, the machine-readable metadata can give easy access to the kind of information that would facilitate alphabetization. The currently suggested properties in the table in section 4.2 include dc:contributor with a literal value that would be the same string as in the human-readable section, repeated for each contributor. Not specified in the table, but shown in the example, is use of the term dcterms:contributor with an IRI value that is the ORCID ID of the contributor. These are dereferenceable as RDF, and the metadata provided by ORCID includes (among other things):

<http://orcid.org/0000-0001-6215-3617> a foaf:Person ;
    foaf:familyName "Robertson" ;
    foaf:givenName "Tim" .

So although it is not required (at least in the current draft) as part of the machine-readable metadata, a dcterms:contributor property whose object is either an ORCID ID or a blank node that includes the family name and given name would solve this problem from the machine-readable side.
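The two shapes could look roughly like this. It is a minimal sketch that builds the Turtle as plain strings, assuming the dcterms: and foaf: prefixes are declared elsewhere; the function names and the example.org document IRI are hypothetical, and the escaping covers only backslashes and double quotes.

```python
def turtle_literal(text):
    # Escape backslashes and double quotes for a short Turtle string literal.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def contributor_statement(document_iri, orcid_iri=None, family_name=None, given_name=None):
    # Prefer the ORCID IRI; otherwise describe the person inline as a blank node.
    if orcid_iri is not None:
        obj = f"<{orcid_iri}>"
    elif family_name is not None and given_name is not None:
        obj = (
            "[ a foaf:Person ; "
            f"foaf:familyName {turtle_literal(family_name)} ; "
            f"foaf:givenName {turtle_literal(given_name)} ]"
        )
    else:
        raise ValueError("need an ORCID IRI or both name parts")
    return f"<{document_iri}> dcterms:contributor {obj} ."


DOC = "http://example.org/some-standard"  # hypothetical document IRI

by_orcid = contributor_statement(DOC, orcid_iri="http://orcid.org/0000-0001-6215-3617")
by_blank_node = contributor_statement(DOC, family_name="Robertson", given_name="Tim")
```

Either way, a consumer can reach foaf:familyName and foaf:givenName without parsing the display string: directly from the blank node, or by dereferencing the ORCID IRI.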

Of course, it remains to be seen whether anybody will ever do anything useful with the machine-readable metadata.

baskaufs commented 8 years ago

Edited section 3.2.3.1 to remove references to "first name" and "last name". Added blank nodes with parsed name parts to the example in section 4.2.1.

tdwg / vocab

Investigate the standard practice for reporting names #29