historical-data / schema

Microdata schema for historical data.
historical-data.org

Canonical URLs? #16

Closed RobertGardner closed 12 years ago

RobertGardner commented 12 years ago

I have a question. The spec calls for a url parameter that identifies the item. It would be really nice to have a way of determining that two references on two different sites refer to the same person. Do we need to incorporate some type of unique ID or canonical URL into this spec?

stoicflame commented 12 years ago

> The spec calls for a url parameter that identifies the item.

I must have missed that. Where was that?

> Do we need to incorporate some type of unique ID or canonical URL into this spec?

Can't we just use a URI?

RobertGardner commented 12 years ago

url is a required parameter in Thing.

If we use a URI, which one do we use? How do we know that two people on different sites are the same person?

stoicflame commented 12 years ago

> If we use a URI, which one do we use? How do we know that two people on different sites are the same person?

Not sure what you mean. Why couldn't FamilySearch provide data about a HistoricalPerson that has a URL with the value "http://geni.com/person/12345"?

ninjudd commented 12 years ago

I was just thinking about this, but it's tricky. Whatever solution we come up with will need to be decentralized.

Some kind of content-addressable system (like git) using a SHA-1 hash of identifying information (name + lifespan + birthplace) is interesting, but then the global ID would change when that profile information changed.
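To illustrate the trade-off, here is a minimal sketch of such a content-addressed ID. The function name `content_id`, the field normalization, the separator, and the sample values are all assumptions made for illustration; nothing here is part of the spec.

```python
import hashlib

def content_id(name: str, lifespan: str, birthplace: str) -> str:
    """Return a SHA-1 hex digest of the identifying fields (hypothetical scheme)."""
    # Normalize case and whitespace so trivial formatting differences
    # between sites do not produce different IDs.
    fields = [" ".join(f.lower().split()) for f in (name, lifespan, birthplace)]
    # Join with a separator unlikely to occur in the data, so that
    # ("ab", "c") and ("a", "bc") do not hash to the same value.
    payload = "\x1f".join(fields).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

# Placeholder profile data, not a real person.
original = content_id("Jane Example", "1800-1870", "Springfield")
reformatted = content_id("  jane   EXAMPLE", "1800-1870", "Springfield")
corrected = content_id("Jane Example", "1801-1870", "Springfield")

print(original == reformatted)  # same normalized fields, same ID
print(original == corrected)    # birth year edited, so the ID changes
```

The last comparison shows the drawback: a one-character correction to the lifespan yields a completely different ID, so any site still holding the old ID no longer matches.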

Another option is to just use a unique slug for each profile (like Wikipedia articles). We have a system we could enable for letting curators or users assign these to Geni profiles. In a system like this, each site could use whatever mechanism they want for creating slugs with the understanding that matching slugs among sites indicate that the profiles are the same. I could see problems arising though if two different sites disagree on which slugs go with which profiles.
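A rough sketch of how slug matching between two sites might work, assuming each site exposes a mapping from its local profile ID to a slug. The function `match_by_slug`, the ID formats, and the slugs are hypothetical.

```python
def match_by_slug(site_a: dict[str, str], site_b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each slug used by both sites to (profile ID on site A, profile ID on site B)."""
    # Invert site B's mapping so profiles can be looked up by slug.
    by_slug_b = {slug: profile_id for profile_id, slug in site_b.items()}
    return {
        slug: (profile_id, by_slug_b[slug])
        for profile_id, slug in site_a.items()
        if slug in by_slug_b
    }

# Placeholder data: local profile ID -> slug.
site_a = {"a-101": "jane-example-1800", "a-102": "john-sample-1750"}
site_b = {"b-7": "jane-example-1800", "b-9": "mary-placeholder-1900"}

matches = match_by_slug(site_a, site_b)
print(matches)
```

Note that this treats any shared slug as a match. If the two sites assigned the same slug to different people, the code has no way to detect it, which is the disagreement problem described above.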

In reply to RobertGardner's comment above, sent Thursday, September 8, 2011 at 10:28 AM.

stoicflame commented 12 years ago

Okay, but you're talking about the definition of some kind of a standard spec for identifying persons, events, documents, etc. In other words, you're talking about the definition of what the URI looks like.

Why do search engines care what the URI looks like? Whether it looks like "urn:12345" or "sha:fjvub75hth" or "http://geni.com/whatever", why does the search engine care?

So isn't that outside the scope of this project?

NatAtGeni commented 12 years ago

Agree - this is outside the scope of this project.

RobertGardner commented 12 years ago

Well, search engines could care because if I search for "http://geni.com/person/12345" I could find all kinds of documents that reference this person. Isn't that really the ultimate goal of genealogical searching -- finding all information relevant to a particular person?

Just to add interest, suppose there were a genealogy search site that knew how to map from a search for "Robert Gardner" to "http://geni.com/person/12345", then it could issue this more obscure search for me. Suddenly I'd be able to find documents, pages, sites, etc. that all relate to that person.

Suppose a site specialized in journals. It could use the canonical URL to identify the person who wrote the journal and people mentioned in the journal. Searching for that URL would then produce a wealth of valuable information.
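The search scenario above amounts to a reverse index from a person's canonical URL to the documents that reference it. A minimal sketch, assuming a crawler has already extracted the person URLs from each page; the `journals.example` and `records.example` addresses and the function name `build_index` are made up for illustration.

```python
from collections import defaultdict

def build_index(documents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Build a reverse index: person URL -> set of document URLs that reference it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_url, person_urls in documents.items():
        for person_url in person_urls:
            index[person_url].add(doc_url)
    return index

# Hypothetical crawl results: document URL -> person URLs found in its markup.
documents = {
    "http://journals.example/entry/1": ["http://geni.com/person/12345"],
    "http://records.example/census/9": [
        "http://geni.com/person/12345",
        "http://geni.com/person/67890",
    ],
    "http://journals.example/entry/2": ["http://geni.com/person/67890"],
}

index = build_index(documents)
hits = sorted(index["http://geni.com/person/12345"])
print(hits)
```

This only works if every site uses the same URL for the same person, which is exactly the open question in this issue.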

OK, having said that, I don't want that effort to derail this project. For now we just need to leave it at "url" but remember to come back at some point and figure out WHAT needs to go into that url -- and add it to the spec!

ninjudd commented 12 years ago

Agreed. Let's come back to this later. Phase 2...

In reply to RobertGardner's comment above, sent Thursday, September 8, 2011 at 11:28 AM.

stoicflame commented 12 years ago

Closing this issue for now.

ninjudd commented 12 years ago

Moving to milestone "future".