hbz / lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD
http://lobid.org/resources
Eclipse Public License 2.0

How will we implement person search with API 2.0? #10

Closed acka47 closed 8 years ago

acka47 commented 8 years ago

I just realized that we will have to query ~50 fields connected with OR for a person search with API 2.0. Here is part of the query needed to search for a person named "Schmidt":

?q=contributor.preferredNameForThePerson:Schmidt OR contributor.alternateNameForThePerson:Schmidt OR creator.preferredNameForThePerson:Schmidt OR creator.alternateNameForThePerson:Schmidt OR director.preferredNameForThePerson:Schmidt OR director.alternateNameForThePerson:Schmidt OR singer.preferredNameForThePerson:Schmidt OR singer.alternateNameForThePerson:Schmidt OR actor.preferredNameForThePerson:Schmidt OR actor.alternateNameForThePerson:Schmidt OR ...

As we have creator and contributor and 24 MARC relators in the context and all of them can include the preferredName and variantName field, we have a total of 52 fields to be queried. I think this might be problematic when querying by URL.
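To get a feel for how such a query grows, here is a minimal Python sketch that generates the OR chain from a list of roles. The role list is an assumption: it only contains the five roles visible in the query fragment above, since the full set of 24 relators is not spelled out in this thread, and `person_query` is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumed, incomplete role list: only the roles that appear in the query
# fragment above. The real context has creator, contributor and 24 relators.
ROLES = ["contributor", "creator", "director", "singer", "actor"]
NAME_FIELDS = ["preferredNameForThePerson", "alternateNameForThePerson"]


def person_query(name, roles=ROLES, name_fields=NAME_FIELDS):
    """Build one `role.field:name` clause per role/field pair, joined by OR."""
    clauses = [f"{role}.{field}:{name}" for role in roles for field in name_fields]
    return " OR ".join(clauses)


query = person_query("Schmidt")
encoded = quote(query)  # percent-encode the query for use in a URL

print(len(ROLES) * len(NAME_FIELDS), "clauses,", len(encoded), "characters once URL-encoded")
```

With all 26 roles the same function would produce 52 clauses, which is the URL length concern raised here.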

I believe there are different approaches to this. I opened this issue to discuss the possible solutions, choose one, and hopefully learn some details about how API 2.0 is planned to work.

fsteeg commented 8 years ago

To find anything relating to a person we could search for preferredNameForThePerson:Schmidt OR alternateNameForThePerson:Schmidt. But I guess what you mean is finding anything created by a person, right? If we need that level of abstraction we should put it into the data. If we're constrained to this structure, couldn't we define some kind of super property for all these specific roles and query against that?

But what I'd imagine as a JSON structure for API 2.0 is something like this:

creator: {
  type: "Person",
  role: "Author",
  name: {
    preferred: "Heinrich Heine",
    variants: ["Chajne, Chajnrich", "Hāine, Hāinarīśa"]
  }
}

The query would then simply be creator.name: Schmidt. The specific cases of querying specific roles would look like this: creator.name: Schmidt AND creator.role: Director.

acka47 commented 8 years ago

Your proposal looks interesting and I like it in general. It would mean some significant changes to our data model, though, and therefore some adjustments to the metafacture morph. (@dr0i might have an estimate for this.) We would have to distinguish roles by type or similar instead of by property.
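As a rough illustration of that data model change (not the actual metafacture morph), the following Python sketch moves agents from per-role properties into a single `contributor` list and records the role as a value. The property-to-role mapping, the function name `roles_to_contributors`, and the sample record are all made up for this example.

```python
# Hypothetical subset of role properties and the role value each maps to.
ROLE_PROPERTIES = {
    "creator": "Creator",
    "director": "Director",
    "singer": "Singer",
    "actor": "Actor",
}


def roles_to_contributors(record):
    """Move agents from per-role properties into one role-tagged list."""
    # Keep every field that is not a role property unchanged.
    result = {k: v for k, v in record.items() if k not in ROLE_PROPERTIES}
    contributors = list(result.get("contributor", []))
    for prop, role in ROLE_PROPERTIES.items():
        for agent in record.get(prop, []):
            contributors.append({**agent, "role": role})
    result["contributor"] = contributors
    return result


# Made-up sample record in the old, property-per-role shape.
old = {
    "title": "Example",
    "director": [{"preferredNameForThePerson": "Schmidt, A."}],
    "singer": [{"preferredNameForThePerson": "Schmidt, B."}],
}
new = roles_to_contributors(old)
print(new["contributor"])
```

After the conversion a person search only has to look at `contributor`, and a role-specific search adds a condition on `role`.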

In general, MARC relator codes are also SKOS concepts (see e.g. http://id.loc.gov/vocabulary/relators/cre.html). To indicate the relator, one could use the bf:relator property (I didn't find another).

Thus, we could make use of this and build our data like this:

{
"@context": {
  "type": "@type",
  "role": {
    "@type": "@id",
    "@id": "http://bibframe.org/vocab/relator"
    },
  "preferredName": "http://d-nb.info/standards/elementset/gnd#preferredName",
  "variantName": "http://d-nb.info/standards/elementset/gnd#variantName",
  "DifferentiatedPerson": "http://d-nb.info/standards/elementset/gnd#DifferentiatedPerson",
  "Creator": "http://id.loc.gov/vocabulary/relators/cre"
  },
"contributor": {
  "@id": "http://d-nb.info/gnd/118548018",
  "type": "DifferentiatedPerson",
  "role": "Creator",
  "preferredName": "Heinrich Heine",
  "variantName": [ "Chajne, Chajnrich", "Hāine, Hāinarīśa" ]
  }
}

I don't think we should mess with the name structure, though. Although GND also models person names as entities with preferredNameEntityForThePerson and variantNameEntityForThePerson (see the example in Turtle notation below), it doesn't do so for other authority resource types (e.g. ConferenceOrEvent). Thus, using the name approach above would mean diverging from the GND modeling and rolling our own, which I would like to avoid.

<http://d-nb.info/gnd/118548018>
    gndo:preferredNameEntityForThePerson [
        gndo:forename "Heinrich"^^<http://www.w3.org/2001/XMLSchema#string> ;
        gndo:surname "Heine"^^<http://www.w3.org/2001/XMLSchema#string>
    ],
      gndo:variantNameEntityForThePerson [
        gndo:forename "Heinrikh"^^<http://www.w3.org/2001/XMLSchema#string> ;
        gndo:surname "Heine"^^<http://www.w3.org/2001/XMLSchema#string>
    ], [
        gndo:forename "Heinrikh"^^<http://www.w3.org/2001/XMLSchema#string> ;
        gndo:surname "Hāine"^^<http://www.w3.org/2001/XMLSchema#string>
    ] .

dr0i commented 8 years ago

Should be realizable in a reasonable time.

fsteeg commented 8 years ago

My hope is that we'd be able to have nice JSON independent of what we output as RDF.

I have played around a bit in the playground and if we give the name element the same ID as its parent:

"name": {
  "preferred": "Heinrich Heine",
  "variants": ["Chajne, Chajnrich", "Hāine, Hāinarīśa"],
  "id": "http://d-nb.info/gnd/118548018"
}

And add a mapping for the name field:

"name": "https://schema.org/sameAs"

The nested properties show up as triples about the parent:

<http://d-nb.info/gnd/118548018> <http://d-nb.info/standards/elementset/gnd#preferredName> "Heinrich Heine" .
<http://d-nb.info/gnd/118548018> <http://d-nb.info/standards/elementset/gnd#variantName> "Chajne, Chajnrich" .
<http://d-nb.info/gnd/118548018> <http://d-nb.info/standards/elementset/gnd#variantName> "Hāine, Hāinarīśa" .
<http://d-nb.info/gnd/118548018> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://d-nb.info/standards/elementset/gnd#DifferentiatedPerson> .
<http://d-nb.info/gnd/118548018> <https://schema.org/sameAs> <http://d-nb.info/gnd/118548018> .
_:c14n0 <http://purl.org/dc/terms/contributor> <http://d-nb.info/gnd/118548018> .

The complete input (playground link):

{
 "@context": {
  "name": "https://schema.org/sameAs",
  "preferred": "http://d-nb.info/standards/elementset/gnd#preferredName",
  "variants": "http://d-nb.info/standards/elementset/gnd#variantName",
  "DifferentiatedPerson": "http://d-nb.info/standards/elementset/gnd#DifferentiatedPerson",
  "contributor": "http://purl.org/dc/terms/contributor"
 },
 "contributor": {
  "@id": "http://d-nb.info/gnd/118548018",
  "@type": "DifferentiatedPerson",
  "name": {
    "preferred": "Heinrich Heine",
    "variants": ["Chajne, Chajnrich", "Hāine, Hāinarīśa"],
    "@id": "http://d-nb.info/gnd/118548018"
  }
 }
}

This would allow the shorter contributor.name.*: Schmidt queries including sub fields.

It also adds a layer of semantics to the data, since it groups the preferred and variant names under something queryable by clients. This allows the clients to be smarter and yet more robust. Consider a query like http://beta.lobid.org/organisations/search?q=location.*:Münster: we can formulate the query on a high abstraction level ('the location should contain Münster'), without regard to the specific structure underneath. At the same time, we can still access the specific fields if that's what we want.
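The effect of such a field wildcard can be sketched in a few lines of Python. This is only an illustration of the idea, assuming simple case-insensitive substring matching on leaf values; it is not how the API's search backend analyzes or matches text, and `flatten` and `matches` are hypothetical helper names.

```python
from fnmatch import fnmatchcase


def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for item in node:
            yield from flatten(item, prefix)  # list items share the parent's path
    else:
        yield prefix, node


def matches(record, field_pattern, term):
    """True if any leaf whose path matches field_pattern contains term."""
    return any(
        fnmatchcase(path, field_pattern) and term.lower() in str(value).lower()
        for path, value in flatten(record)
    )


record = {
    "contributor": {
        "@id": "http://d-nb.info/gnd/118548018",
        "@type": "DifferentiatedPerson",
        "name": {
            "preferred": "Heinrich Heine",
            "variants": ["Chajne, Chajnrich", "Hāine, Hāinarīśa"],
        },
    }
}

print(matches(record, "contributor.name.*", "Chajne"))          # searches both name fields
print(matches(record, "contributor.name.preferred", "Chajne"))  # searches one specific field
```

The first call looks at both name fields at once, the second only at the preferred name, which mirrors the two abstraction levels described above.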

acka47 commented 8 years ago

I don't support something like aliasing sameAs with "name" in the context. This will definitely confuse people who care about Linked Data.

Remembering @jschnasse's request for a uniform way of providing prefLabels and alternate labels in https://github.com/hbz/lobid/issues/1#issuecomment-143740221, I had another idea, though, that might look much the same and wouldn't confuse people. For all embedded objects with an @id or a blank node, we could use this pattern:

{
 "@context": {
  "label": "http://purl.org/lobid/lv#label",
  "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
  "altLabel": "http://www.w3.org/2004/02/skos/core#altLabel",
  "DifferentiatedPerson": "http://d-nb.info/standards/elementset/gnd#DifferentiatedPerson",
  "contributor": "http://purl.org/dc/terms/contributor"
 },
 "contributor": {
  "@id": "http://d-nb.info/gnd/118548018",
  "@type": "DifferentiatedPerson",
  "label": {
    "prefLabel": "Heinrich Heine",
    "altLabel": ["Chajne, Chajnrich", "Hāine, Hāinarīśa"],
    "@id": "http://d-nb.info/gnd/118548018"
  }
 }
}

The same pattern would be used for subjects:

{ 
 "@context": {
  "label": "http://purl.org/lobid/lv#label",
  "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
  "altLabel": "http://www.w3.org/2004/02/skos/core#altLabel",
  "DifferentiatedPerson": "http://d-nb.info/standards/elementset/gnd#DifferentiatedPerson",
  "SubjectHeading": "http://d-nb.info/standards/elementset/gnd#SubjectHeading",
  "PlaceOrGeographicName": "http://d-nb.info/standards/elementset/gnd#PlaceOrGeographicName",
  "role": {
    "@type": "@id",
    "@id": "http://bibframe.org/vocab/relator"
    },
  "subject": "http://purl.org/dc/terms/subject",
  "Creator": "http://id.loc.gov/vocabulary/relators/cre"
 },
 "@id" : "http://lobid.org/resource/HT018843259",
 "contributor": [ {
  "@id": "http://d-nb.info/gnd/118548018",
  "@type": "DifferentiatedPerson",
  "role": "Creator",
  "label": {
    "prefLabel": "Becker, Thomas Paul",
    "altLabel": ["Becker, Thomas P." ],
    "@id": "http://d-nb.info/gnd/171969979"
  }
 } ],
 "subject": [ {
   "@id" : "http://d-nb.info/gnd/4031485-6",
   "@type": "PlaceOrGeographicName",
    "label": {
      "prefLabel": "Erzstift Köln",
      "altLabel": ["Kölner Krieg", "Truchsessischer Krieg" ]
    }
  }, {
    "@id" : "http://d-nb.info/gnd/4164368-9",
    "@type": "SubjectHeading",
    "label": {
      "prefLabel": "Kölnischer Krieg",
      "altLabel": ["Kurköln", "Köln (Hochstift)", "..." ]
    }
  }
 ]
}

fsteeg commented 8 years ago

The label / prefLabel / altLabel suggestion looks very good!

jschnasse commented 8 years ago

Why the additional label level? Why is contributor.label.*: Schmidt better than contributor.*: Schmidt?

fsteeg commented 8 years ago

What I like about it is that it makes the semantic relation between prefLabel and altLabel explicit. It provides an additional level of granularity, since queries against e.g. contributor.* would also include the type and the role. But you're right, perhaps the level above is enough.

acka47 commented 8 years ago

Good point, @jschnasse. So we can leave out this level of indirection?

dr0i commented 8 years ago

These statements:

<http://d-nb.info/gnd/118548018> <https://schema.org/sameAs> <http://d-nb.info/gnd/118548018> .
_:c14n0 <http://purl.org/dc/terms/contributor> <http://d-nb.info/gnd/118548018> .

don't mean anything (ok, at least the first one doesn't). Must be a bug in playground's library. Wait:

This will definitely confuse people who care about Linked Data.

sic.

acka47 commented 8 years ago

This discussion is relevant for implementing https://github.com/hbz/lobid-rdf-to-json/issues/24.

fsteeg commented 8 years ago

OK, so I think the conclusion is that we want something like this:

"contributor": [ {
  "id": "http://d-nb.info/gnd/118548018",
  "type": "DifferentiatedPerson",
  "role": "Creator",
  "label": "Becker, Thomas Paul",
  "altLabel": [ "Becker, Thomas P." ]
} ],
"subject": [ {
   "id" : "http://d-nb.info/gnd/4031485-6",
   "type": "PlaceOrGeographicName",
   "label": "Erzstift Köln",
   "altLabel": [ "Kölner Krieg", "Truchsessischer Krieg" ]
} ]

Correct, @acka47? Assigning @dr0i, tagging ready & nwbib-launch (required for https://github.com/hbz/nwbib/issues/262)
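A person search against this flat structure could then look roughly like the Python sketch below. It works on in-memory records rather than the real index, uses plain substring matching, and `find_person` is a hypothetical helper; the record `id` is the resource ID from the earlier example.

```python
def find_person(records, name, role=None):
    """Return records with a contributor whose label or altLabel contains name.

    If role is given, the same contributor must also carry that role.
    """
    needle = name.lower()
    hits = []
    for record in records:
        for contributor in record.get("contributor", []):
            names = [contributor.get("label", "")] + contributor.get("altLabel", [])
            name_ok = any(needle in n.lower() for n in names)
            role_ok = role is None or contributor.get("role") == role
            if name_ok and role_ok:
                hits.append(record)
                break  # one matching contributor is enough for this record
    return hits


records = [
    {
        "id": "http://lobid.org/resource/HT018843259",
        "contributor": [
            {
                "id": "http://d-nb.info/gnd/118548018",
                "type": "DifferentiatedPerson",
                "role": "Creator",
                "label": "Becker, Thomas Paul",
                "altLabel": ["Becker, Thomas P."],
            }
        ],
    }
]

print(len(find_person(records, "Becker")))                   # any role
print(len(find_person(records, "Becker", role="Director")))  # restricted to one role
```

Only two name fields per contributor have to be checked, regardless of how many roles exist.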

acka47 commented 8 years ago

I would have created another ticket for this as this was about discussing a question. We may as well keep this one but then at least have to give it another title...

acka47 commented 8 years ago

Closing as a decision has been found. https://github.com/hbz/lobid-rdf-to-json/issues/24 and #38 are the issues for implementing.