historical-data / schema

Microdata schema for historical data.
historical-data.org

clarify document references model #12

Closed stoicflame closed 12 years ago

stoicflame commented 12 years ago

This pull request renames historical document 'mentioned' to 'references' and expands its range to 'Thing' to clarify the meaning of the property. It also removes 'events' from historical document to eliminate the redundancy.

NatAtGeni commented 12 years ago

To me 'references' refers to sources - I think the name is misleading.

Also I can't find another example where the super-generic Thing is used as an expected type for any of the properties. I'm not even sure how that would be marked up - I think you'd have to nest a Person or Event inside the Thing, which seems unnecessarily complex.

ninjudd commented 12 years ago

I think you can just include a HistoricalPerson or HistoricalEvent directly since they inherit from Thing. That said, having separate fields for each type may be easier for clients. E.g. people, events

RobertGardner commented 12 years ago

Keep in mind that search engines aren't going to parse this out, they're going to just look at the words in the markup. The more explicit words you have the more likely the data will match a search. So I prefer separate fields. This isn't as important in this case since "people" and "events" aren't likely to appear in searches, but in general, I would err on the side of redundancy and denormalization.

stoicflame commented 12 years ago

> Keep in mind that search engines aren't going to parse this out, they're going to just look at the words in the markup.

Can you please clarify this? Cases like this are exactly what the itemtype property was defined for. Are you saying that search engines aren't going to regard that?

To me, it seems reasonable that a document might refer to Things other than HistoricalPerson or HistoricalEvent. Why don't we just keep it open for extensibility and flexibility? Wouldn't that follow the same kind of pattern that we've been establishing in other threads like #7 and #8? What makes this case different?

ninjudd commented 12 years ago

I'm okay with having a general field for references to Thing objects, but I still think separate fields for people and events are desirable and more convenient than scanning through references.

NatAtGeni commented 12 years ago

Thing only has four properties: description, image, name and url. You can't add the properties for a Person or an Event to a Thing because they're not valid there. You have to declare the itemtype you're actually going to use. I think we definitely need to keep people and events. Then if other items need to be referenced, we can find a way to handle adding additional things.

stoicflame commented 12 years ago

So maybe a concrete example to make sure we're all on the same page. Here's kinda how a document would be marked up using a generic 'references' property:

<div itemscope itemtype="http://historical-data.org/HistoricalDocument">
  <h1 itemprop="name">Biography of William Heaton</h1>
  ...
  <div itemprop="references">
    <div itemscope itemtype="http://historical-data.org/HistoricalPerson">
      <!-- markup for the person -->
    </div>
    <div itemscope itemtype="http://historical-data.org/HistoricalEvent">
      <!-- markup for the event -->
    </div>
    <div itemscope itemtype="http://schema.org/Person">
      <!-- markup for some person that is a current, living descendant of William mentioned in the biography -->
    </div>
  </div>
</div>

I'm not seeing anything confusing, awkward, or inconvenient here. What am I missing?

NatAtGeni commented 12 years ago

I think that you need to change your Expected Type to HistoricalDocument, HistoricalPerson, or HistoricalEvent. That's not the same as saying the Expected Type is Thing.

If you say expected type is Thing that's the itemtype that's going to be used there.

ninjudd commented 12 years ago

I'm pretty sure it means you can use anything that inherits from Thing

On Sep 8, 2011, at 11:10 AM, NatAtGeni wrote:

> I think that you need to change your Expected Type to HistoricalDocument, HistoricalPerson, or HistoricalEvent. That's not the same as saying the Expected Type is Thing.

> If you say expected type is Thing that's the itemtype that's going to be used there.


NatAtGeni commented 12 years ago

I understand that's what's meant, but it can't be listed in the spec as just Thing. At the very least we should change Expected Type to say 'Any itemtype that inherits from Thing' to clarify. Otherwise, people are going to be using itemtype Thing always, thinking that's what's expected there.

ninjudd commented 12 years ago

What makes you think it works the way you describe? That's not how inheritance works in most other contexts.

On Sep 8, 2011, at 11:18 AM, NatAtGeni wrote:

> I understand that's what's meant, but it can't be listed in the spec as just Thing. At the very least we should change Expected Type to say 'Any itemtype that inherits from Thing' to clarify. Otherwise, people are going to be using itemtype Thing always, thinking that's what's expected there.


RobertGardner commented 12 years ago

(How do you reference other comments?)

I think the point was that people will think they have to put Thing there, not realizing they can put anything that inherits from Thing. That's a reasonable point. Also note that if you put Thing as the expected type, anything can go there -- are we really able to handle anything? Say, a phone number?

I think the real issue is that it needs to be documented well so that it's clear what types are legal.
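The two readings of "Expected Type: Thing" discussed above can be sketched in a few lines of Python. This is only an illustration, not part of the schema or of any microdata parser: the `PARENTS` table, the helper names `is_subtype` and `allowed`, and the assumption that the historical types inherit directly from `http://schema.org/Thing` are all hypothetical.

```python
# Hypothetical type table: each itemtype maps to its parent (None for the root).
# The direct parent links for the historical-data.org types are assumptions.
THING = "http://schema.org/Thing"
PERSON = "http://schema.org/Person"
HIST_PERSON = "http://historical-data.org/HistoricalPerson"
HIST_EVENT = "http://historical-data.org/HistoricalEvent"

PARENTS = {
    THING: None,
    PERSON: THING,
    HIST_PERSON: THING,
    HIST_EVENT: THING,
}


def is_subtype(itemtype, expected):
    """Return True if itemtype is expected or inherits from it."""
    current = itemtype
    while current is not None:
        if current == expected:
            return True
        # Unknown types have no recorded parent, so the walk stops here.
        current = PARENTS.get(current)
    return False


def allowed(itemtype, expected_types):
    """Return True if itemtype satisfies any of a property's expected types."""
    return any(is_subtype(itemtype, expected) for expected in expected_types)


# With Expected Type "Thing", any known type in the hierarchy is accepted.
print(allowed(HIST_PERSON, [THING]))
# With an explicit list, a plain schema.org Person is rejected.
print(allowed(PERSON, [HIST_PERSON, HIST_EVENT]))
```

Under the inheritance reading, listing `Thing` accepts every type in the table, which is why the documentation would need to say which types consumers are actually expected to handle.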

ninjudd commented 12 years ago

I'm still in favor of separate fields for people and events with a more generic references for Thing objects.

The example above would then be:

<div itemscope itemtype="http://historical-data.org/HistoricalDocument">
  <h1 itemprop="name">Biography of William Heaton</h1>
  ...
  <div itemprop="people">
    <div itemscope>
      <!-- markup for the person -->
    </div>
  </div>
  <div itemprop="events">
    <div itemscope>
      <!-- markup for the event -->
    </div>
  </div>
  <div itemprop="references">
    <div itemscope itemtype="http://schema.org/Person">
      <!-- markup for some person that is a current, living descendant of William mentioned in the biography -->
    </div>
  </div>
</div>

RobertGardner commented 12 years ago

I wanted to come back and comment on this:

> Keep in mind that search engines aren't going to parse this out, they're going to just look at the words in the markup.

> Can you please clarify this? Cases like this are exactly what the itemtype property was defined for. Are you saying that search engines aren't going to regard that?

I'm still looking into how Google's index will handle these. The point was, though, that if we have something buried in an event, it's going to take special effort for the search engines to extract it out of there. If someone searched for birth:1800, it's much easier for that to succeed if there is a property named 'birth' than if the birth date is buried inside an event field with the name 'birth'.

Clearly, a search engine that is genealogy-specific could do that parsing and digging and make it work. Generic search engines would most likely just index the words in the tags. It's unclear how much effort search companies are going to put into special handling of genealogy searches, so I'd like to make this successful for generic search indexers.

I think this comment is moot now, since we've agreed to keep canonical fields.
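The flat-versus-nested concern above can be illustrated with a toy indexer. This is a sketch under stated assumptions and not a description of how any real search engine works: the `NaivePropertyIndexer` class, the `index` helper, and the property names `birth`, `events`, `name`, and `date` in the sample markup are hypothetical. The indexer simply pairs each `itemprop` name with the next piece of text it sees, with no awareness of nesting.

```python
from html.parser import HTMLParser


class NaivePropertyIndexer(HTMLParser):
    """Pair each itemprop name with the next text node, ignoring nesting."""

    def __init__(self):
        super().__init__()
        self.current_prop = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Remember the most recent itemprop; a nested one overwrites its parent.
        prop = dict(attrs).get("itemprop")
        if prop is not None:
            self.current_prop = prop

    def handle_data(self, data):
        text = data.strip()
        if text and self.current_prop is not None:
            self.pairs.append((self.current_prop, text))
            self.current_prop = None


def index(markup):
    """Return the (property, text) pairs a nesting-unaware indexer would see."""
    parser = NaivePropertyIndexer()
    parser.feed(markup)
    parser.close()
    return parser.pairs


# Hypothetical flat markup: the birth year sits under a property named "birth".
FLAT = '<div itemscope><span itemprop="birth">1800</span></div>'

# Hypothetical nested markup: the same fact is inside a generic event item.
NESTED = (
    '<div itemscope>'
    '<div itemprop="events" itemscope>'
    '<span itemprop="name">birth</span> <span itemprop="date">1800</span>'
    '</div></div>'
)

print(index(FLAT))
print(index(NESTED))
```

In the flat case the pair `("birth", "1800")` is available directly, so a query like birth:1800 can match it. In the nested case the indexer only sees `name` and `date` pairs, and connecting them requires the extra parsing a genealogy-specific engine would have to do.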

stoicflame commented 12 years ago

Please review the latest commit to this pull request. references has been split out to persons, events, and families to make the properties more specific per @ninjudd's request.

stoicflame commented 12 years ago

> (How do you reference other comments?)

Just thought I'd respond to this. The text we're writing is parsed as Markdown, canonically defined at http://daringfireball.net/projects/markdown. So to quote something, you start it with a '>' character. (I just copy the comments I want to quote and paste them into the text box after a '>' character.)

stoicflame commented 12 years ago

What does everybody think about the latest changes? Can I apply this?

ninjudd commented 12 years ago

Looks good to me.