omeka / omeka-s

Omeka S is a web publication system for universities, galleries, libraries, archives, and museums. It consists of a local network of independently curated exhibits sharing a collaboratively built pool of items, media, and their metadata.
GNU General Public License v3.0

Remove FOAF #1030

Closed jimsafley closed 7 years ago

jimsafley commented 7 years ago

Although FOAF is a de facto standard for describing relationships between people, it's not well conceived and a bit confusing for people who are not already familiar with it. We should remove it from new installations and note its existence in some "recommended vocabularies" list in our docs.

zerocrates commented 7 years ago

some "recommended vocabularies" list in our docs.

I was going to mention the same thing, or popular ones anyway. Especially as, for some of them, it can be fairly difficult to find the actual RDF document.

patrickmj commented 7 years ago

I'm a little less convinced. Since it is a de facto standard -- even with its flaws -- it seems like it should still be included. It's worth research, though, into how widely it is used in practice.

123neil commented 7 years ago

We use it, Patrick, for the 3 Omeka S projects we have running, but then we import a lot of other vocabs. We can also import them as part of the resource provider module if you do go ahead and remove it from installations. Just to give you an indication of who is using it. Speak to you soon. Neil

patrickmj commented 7 years ago

@123neil Good to know. What are the other vocabs you import? Would any of them be general enough that we should consider for the default installation?

jimsafley commented 7 years ago

Although it's a de facto standard, it's one because of longevity, not quality. If we are to ship with a vocabulary that describes entities, our audience would be better served by the vCard ontology.

patrickmj commented 7 years ago

Looks like FOAF is also embedded in other vocabs, e.g. DOAP.

ewg118 commented 7 years ago

foaf:Organization and foaf:Person are very widely used classes, along with some others. The other properties in foaf aren't necessarily used widely within cultural heritage. foaf is good enough for describing the attributes of a person, but not relationships between people. There is still no agreed upon standard for relationships between people/organizations. The Records in Contexts conceptual reference model (from the archival domain) has a lot of relationship properties--probably too many--and the ontology hasn't been formally published yet.

The W3C org ontology is very good at describing the relationships between people and organizations (which would include families), and the role a person or organization plays within a broader entity: https://www.w3.org/TR/vocab-org/

The biographical ontology (http://purl.org/vocab/bio/0.1/) has a lot of useful relationship properties.

Instead of creating RDF properties for each individual type of relationship, I think it is better to use one property (e.g., bio:relationship), and then express the nature of the property with an instance, e.g., myvocab:sonOf. Managing relationships as instances in a flexible, extensible vocabulary is much easier in the long-term than constantly extending your ontology for new properties and classes.
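The pattern described above can be sketched roughly as follows. This is only an illustration in plain Python (no RDF library): bio:relationship and myvocab:sonOf come from the comment itself, while the myvocab namespace IRI, the linking properties myvocab:kind and myvocab:with, and the ex: people are made-up placeholders, not terms from any published ontology.

```python
# Illustrative prefixes. Only the bio namespace is real; the others are placeholders.
PREFIXES = {
    "bio": "http://purl.org/vocab/bio/0.1/",
    "myvocab": "http://example.org/myvocab/",
    "ex": "http://example.org/people/",
}


def expand(curie):
    """Expand a prefix:local name into a full IRI using PREFIXES."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def relationship_triples(subject, kind, other, node):
    """Describe one relationship with a single generic property.

    The nature of the relationship ("kind") is a value drawn from a
    vocabulary, not a dedicated predicate such as myvocab:sonOf.
    """
    return [
        (expand(subject), expand("bio:relationship"), expand(node)),
        # Assumption: myvocab:kind and myvocab:with are hypothetical properties.
        (expand(node), expand("myvocab:kind"), expand(kind)),
        (expand(node), expand("myvocab:with"), expand(other)),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = relationship_triples("ex:alexander", "myvocab:sonOf", "ex:philip", "ex:rel1")
print(to_ntriples(triples))
```

Adding a new kind of relationship then means adding a new instance (say, a hypothetical myvocab:apprenticeOf) to the vocabulary, with no new property in the ontology.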

edit: Also, note that the org ontology uses foaf classes for entities.

mialondon commented 7 years ago

The BL used FOAF when modelling data for the BBC RES project e.g. http://museum-api.pbworks.com/w/page/111413185/RDF%20Definition%20-%20Person

patrickmj commented 7 years ago

Thanks, Ethan and Mia. Helpful info!

Following my nose through links, it also looks like it's still deeply present in DBpedia. There's also Tom Elliot's response on Twitter.

I'm starting to lean more toward keeping FOAF in, warts and all, since it is something that people will see elsewhere, especially when learning about common LODLAM datasets.

jimsafley commented 7 years ago

I'm pleased this issue garnered so much attention. For the sake of clarity, the issue is only about shipping Omeka S without FOAF. It's simple to import vocabularies once it's installed.

A related topic is which vocabularies are best suited for general use out-of-the-box. I maintain that FOAF is not a good fit for general use because of its outdated and inconsistent members. Above I mentioned the vCard ontology because it contains members suitable to a rich description of individuals and organizations. The ORG and BIO ontologies that @ewg118 mentions will be essential to many projects; but I think vCard has more general applicability. From the spec doc:

The vCard specifications have a long history and were first proposed in 1995 and then standardized by the IETF in 1998. Since then, new vocabularies, such as the FOAF Vocabulary Specification (2005), and the Organisation Ontology (2013) have appeared. The vCard Ontology has also focused on describing people and organisations, including location information and groups of such entities. The FOAF ontology focuses more on the relationships between people, agents, things and social web entities, and the ORG ontology focuses on organizational structures, roles, and activities. There are some overlaps between the three ontologies, but they can provide useful vocabularies individually, and also can provide enhanced information when used collaboratively.

The Omeka team only wants to maintain vocabularies that have the widest application to the most projects, which is what inspired this issue. In my opinion, the focus on FOAF has overshadowed higher-quality vocabularies. I still would like to remove it, and perhaps replace it with vCard.

zerocrates commented 7 years ago

So my position is this: I'm more or less in line with Jim on FOAF itself, though I wouldn't replace it with anything. Absent FOAF, the remaining preinstalled vocabularies have a pretty solid "bibliographic" focus to the extent they have any at all, which seems appropriate to me as a default.

In some way, having even a common (but less than ubiquitous) vocabulary like FOAF excluded from the default set can even be spun as a benefit of sorts, in that it raises exposure to S's vocabulary import, which is a major departure from Classic as far as ease of using whatever vocabularies are appropriate for a particular use case.

Additionally, it seems to me that with it not installed by default, we have the possibility of smoothing over some of FOAF's issues by adding some features to the importer: selecting which classes of properties will be loaded (FOAF has membershipClass as an AnnotationProperty for arcane reasons of one sort or another), or possibly supporting the term_status property to leave out "archaic" properties if the user wants to (though it doesn't necessarily seem to me that this is supported by enough vocabularies to make implementing it worthwhile).
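To make the importer idea a bit more concrete, here is a minimal sketch of that kind of filtering. It is not Omeka S's importer: the function name, the dictionary layout, and the ex:noStatus term are assumptions, and the status values attached to the FOAF terms are illustrative rather than checked against the FOAF file. It only assumes that each term may carry a term_status such as "stable", "testing", "unstable", or "archaic".

```python
# Hypothetical parsed terms; a real importer would read these from the RDF document.
# The statuses shown are illustrative, not verified against the FOAF file.
terms = [
    {"term": "foaf:Person", "type": "rdfs:Class", "status": "stable"},
    {"term": "foaf:name", "type": "rdf:Property", "status": "testing"},
    {"term": "foaf:geekcode", "type": "rdf:Property", "status": "archaic"},
    {"term": "foaf:membershipClass", "type": "owl:AnnotationProperty", "status": "unstable"},
    {"term": "ex:noStatus", "type": "rdf:Property", "status": None},
]


def select_terms(terms, allowed_types=None, skip_statuses=("archaic",)):
    """Return the names of terms that pass both filters.

    allowed_types: if given, only terms of these RDF types are kept.
    skip_statuses: terms whose status is listed here are dropped.
    Terms with no status at all are kept, since most vocabularies omit it.
    """
    selected = []
    for t in terms:
        if allowed_types is not None and t["type"] not in allowed_types:
            continue
        if t["status"] in skip_statuses:
            continue
        selected.append(t["term"])
    return selected


kept = select_terms(terms, allowed_types={"rdfs:Class", "rdf:Property"})
print(kept)
```

Keeping terms that declare no status is a deliberate choice in this sketch: a vocabulary that never uses term_status would otherwise import nothing.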

My basic view is that it's not a particularly big deal to drop something from the default set of installed vocabularies simply because it's a much smaller hill to climb in S to just get the specific vocabulary you want.

jimsafley commented 7 years ago

Removing FOAF and not replacing it with something else means that S will no longer come with a way to identify a resource as a person or organization. FOAF has foaf:Organization and foaf:Person, vCard has vcard:Individual and vcard:Organization. DC terms, DC types, and BIBO have no such classes. This omission from the shipped vocabularies is a difficult thing for me to accept.

patrickmj commented 7 years ago

If it's a question of vCard vs FOAF, from everything I've seen so far FOAF has the wider usage and familiarity. I think it would raise eyebrows to ship with vCard instead of FOAF.

zerocrates commented 7 years ago

I'm not particularly convinced that enough people will describe people as Items to mandate having FOAF or a FOAF-alike for all installs by default, but if that's the sense of everybody else then so be it. Dublin Core has Agent but it's quite general.

ewg118 commented 7 years ago

In an open world semantic web system, ontologies and metadata application profiles have appropriated foaf:Person and foaf:Organization rather than minting their own classes. I have never seen vcard:Individual used in any cultural heritage dataset, but foaf:Person is ubiquitous. DC terms and the Europeana Data Model both have Agent classes. The EDM Agent is equivalent to crm:E39_Actor, but the DCTerms ontology doesn't link dcterms:Agent to anything equivalent (e.g., foaf:Agent), which it probably should for greater interoperability.

So frankly, if you want to be able to identify people and organization entities in Omeka-S, you should really use foaf:Person and foaf:Organization since these are by far the most commonly used classes for these designations.

Alternatively, if you want to go into a strictly CH domain, you can use CIDOC-CRM. However, CIDOC-CRM's ontology is strictly closed-world with respect to domains and ranges. The CRM ontologists are purists and frown upon the mixing and matching of properties and classes between CRM and other ontologies (which is actually a bad way to design a LOD information system).

In the biographical ontology linked above, note that this ontology is really an event ontology, not an ontology for strictly biographical information (e.g., there's no property for the name and no class for the person). The agent property (used within the context of an Event) is intended to link to an instance of foaf:Agent.

jimsafley commented 7 years ago

My only point in recommending vCard over FOAF is to encourage better description of entities within Omeka S. FOAF has the advantage of being widely used but I'm not convinced that this one fact makes it the best vocabulary to include out-of-the-box. I'm not making a value judgement of FOAF or vCard other than the latter is much more descriptive and consistent than the former.

Omeka's goal is to be good stewards of the semantic web. Whether that goal is best met by including FOAF, a related ontology, or nothing is what we're debating now. Personally, I think this is an opportunity to bring other ontologies into the cultural heritage fold based on their individual merits, not on legacy.

That said, I appreciate the debate and, quite frankly, am okay with whatever decision is most comfortable to our users.

ewg118 commented 7 years ago

Both of the ontologies are really geared toward describing currently living people/organizations and their (usually) online personas or physical addresses. Perhaps this is relevant for some Omeka use cases, but I suspect most users are interested in capturing more historical entities. I think foaf:Organization and foaf:Person are more common than vcard classes, but there's no reason you have to ship a person/organization entity profile composed exclusively with foaf or vcard properties.

The thing is, every prosopographical research project or collection database (that links to people as creators, contributors, etc.) has different requirements and a need to capture different sorts of information. I would make a list of the most basic things you think people want to capture about an entity, and then create a profile that meets these requirements. It may include foaf classes for entity type, foaf:name, skos:prefLabel, bio properties for birth, death dates, and so forth. I am not sure that foaf properties or vcard properties will work exclusively for the broadest possible use cases. Keep in mind that there's also MADS (http://www.loc.gov/standards/mads/rdf/).

Our RDF for entities (derived from EAC-CPF) contains only the most basic properties for names, matching concepts in other LOD systems, and relationships to facilitate social network graph visualization. It's not perfect or comprehensive, but it works for our own visualizations: http://numismatics.org/authority/newell.rdf

ewg118 commented 7 years ago

Oh, also I recommend having a look at http://snapdrgn.net/

patrickmj commented 7 years ago

I suspect that we don't want to go the route of trying to create something new based on our best guesses about what we think people will want to capture -- I think we're still trying to stay more general than that. And, when people ask us about more specialized descriptions, we'll give the same guidance and tell them to fire up Protege and go to town!

jimsafley commented 7 years ago

@ewg118

We appreciate your insights here.

I would make a list of the most basic things you think people want to capture about an entity, and then create a profile that meets these requirements.

It's a fantastic idea. In Omeka S users can create "resource templates" that define the properties necessary to adequately describe their resources. They draw these properties from the imported vocabularies.

This is why we're currently discussing which vocabularies should ship with S: which contain the most interesting, generally applicable members from which users can create resource templates out-of-the-box? We want to provide enough to promote good metadata practice but we certainly don't want to presume to know the requirements of each of our users. They can import whatever vocabularies they require.
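As a rough illustration of why the shipped set matters for templates, the sketch below checks which vocabularies a template would need that a site does not have installed. Everything here is hypothetical: the dictionary layout is not Omeka S's resource template format, and the installed set and the "Person" template are examples only.

```python
# Hypothetical set of vocabulary prefixes installed on a site.
installed = {"dcterms", "dctype", "bibo", "foaf"}

# Hypothetical "Person" template: a class plus the properties used to describe it.
person_template = {
    "label": "Person",
    "class": "foaf:Person",
    "properties": ["foaf:name", "skos:prefLabel", "dcterms:description"],
}


def missing_vocabularies(template, installed):
    """Return the prefixes a template uses that are not installed, sorted."""
    terms = [template["class"], *template["properties"]]
    needed = {term.split(":", 1)[0] for term in terms}
    return sorted(needed - installed)


print(missing_vocabularies(person_template, installed))
```

With FOAF removed from the installed set, the same template would also report foaf as missing, which is the out-of-the-box gap being debated here.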

Daniel-KM commented 7 years ago

For my part, I prefer to maintain it. Two other possibilities:

patrickmj commented 7 years ago

Trying to pull a few different threads together -- a similar idea was a module that would provide a list of vocabs for import (based on the idea of having documentation that lists vocabs that might be useful, but just providing a checkbox to install -- FedoraConnector already does a similar thing).

I'd like to hear how big a new feature it would be to pay attention to the property status, as @Daniel-KM and @zerocrates (tentatively) suggest, during the import process. I hesitate to add big new features at this point.

My reading of non-CHNM input is that we should keep FOAF as a default vocab on installation. I lean that direction.

zerocrates commented 7 years ago

I think the main issue with term status is that nothing really reliably uses it. Even FOAF lists practically everything in "testing," and most vocabularies just don't say anything. In FOAF, skipping "archaic" does paper over some of the problems, though. In its specific situation, if we really wanted to do it, it'd probably be just as easy to strip them out of the file, frankly.

As always, though, I'm concerned about possible interoperability issues with excluding properties like that. It's more feasible as an installation-to-installation setting, but for something we do by default it's a little tougher. Choosing the vocabularies in the installer is an interesting possibility, though not really much different from the current ability users have to delete vocabularies.

I wouldn't be too broken up by making no change here, if that's the general feeling.

Daniel-KM commented 7 years ago

Ok, forgot about the term status, but the choice of a list of recommended vocabularies during install is easier to implement.