samvera / hydra-works

A ruby gem implementation of the PCDM Works domain model based on the Samvera software stack

Provide a definition of scope of "Work"? #8

Closed azaroth42 closed 9 years ago

azaroth42 commented 9 years ago

My understanding is that it is a compound or complex object (eg a resource that has parts, which may themselves have parts). It is not the bibliographic notion of an abstract Work (as opposed to a physical Item that embodies the Work).

It would be good to come to a common understanding of the definition and thus scope of the effort before starting in on modeling.

jeremyf commented 9 years ago

As implemented in Curate, a Work is a bucket for predicates and has_many related files. In the case of Curate, I believe the predicate objects are either very simple (i.e. Literals) or are pointers to other objects.

jcoyne commented 9 years ago

@azaroth42 I'm not entirely sure what you're getting at. I think Work in the Hydra::Work sense is a "bibliographic notion", and it uses a "complex object" to model this notion.

azaroth42 commented 9 years ago

A Work could never have a representation as it's only ever conceptual. Instead, it would have an Expression, which would have a Manifestation which could have a representation.

For example, The Lord of the Rings could have an expression of Text by Tolkien, which could have a manifestation of the first edition, which then has a digital and/or physical representation. Equally, the expression could be the Movies as directed by Peter Jackson, which would have very different metadata and representations, but are still expressions of the notion "The Lord of the Rings". Hydra's "Work" collapses those more abstract levels together (right?)

mjgiarlo commented 9 years ago

I do think that Hydra Works are more like Manifestations (in FRBR speak) here. When I think of works in this context, I mean an "intellectual object." Which, admittedly, may be a similarly ill-defined or overloaded phrase. ;) Sounds like we're on the same page, though?

mjgiarlo commented 9 years ago

Another scope question: are we looking for a works model to underlie all of Hydra, or for one to serve as a common base for roughly IR-like Hydra applications?

They are both interesting domains but I ask because I thought the original context for this from Connect was a works model specifically to help bring Sufia and Worthwhile together (the latter), at least as a first step. If that is the case, I wonder whether some of the use cases might be a better fit for a later phase of this work.

jcoyne commented 9 years ago

+1 to @mjgiarlo's idea of constraining this to something simple in the first phase. I'm not sure that something highly structured like a "Book - Pages - Representations" is going to match up with a general purpose self-deposit UI.

jpstroop commented 9 years ago

If I thought hydra:Work had anything to do with frbr:Work, I wouldn't have joined the conversation. If FRBR semantics are something you care about, they can be covered in your descriptive metadata, no?

Which brings me to a point I was looking for a place to raise. As evidenced by this thread, the term "Work" carries a lot of baggage. If we're shooting for a higher level of abstraction, I wonder if there's a better term/label for what we're trying to conceptualize.

My proposal: GenericThing. I'm going to jump over to #9 and talk about it.

jcoyne commented 9 years ago

:-1: to GenericThing, that's too broad, akin to Object.

escowles commented 9 years ago

:-1: to GenericThing, if we're going to use a very broad term, I think Object/DigitalObject/CulturalHeritageObject would be better.

jpstroop commented 9 years ago

I could get on board with DigitalObject.

dchandekstark commented 9 years ago

+1 to limiting scope. It's hard to imagine taking on a fully generic Hydra model and expect to accomplish anything in the near term.


jeremyf commented 9 years ago

Some names that we kicked around for "Work" were: ScholarlyConcern, CurationConcern, ThoughtBucket, Entity, Container, WorkNode, and ScholarlyWork.

A thing to consider is who is the audience of this repository? As a programmer, a Work does not bring any baggage, and is a concept that is easy to digest.

We have powerful internationalization tools for translating the meaningful object names that programmers use in the system to meaningful names used by consumers of the service.
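As a rough illustration of that split between developer-facing names and user-facing labels, here is a minimal plain-Ruby sketch. The `LABELS` table, the `display_label` helper, and the sample labels are all made up for this example; a Rails application would more likely keep such labels in its locale files.

```ruby
# Hypothetical lookup table: class names developers use => labels users see.
# A real application would likely keep these in locale files instead.
LABELS = {
  en: { "Work" => "Deposit", "GenericFile" => "Attached file" },
  de: { "Work" => "Werk", "GenericFile" => "Datei" }
}.freeze

# Return the display label for a class in a locale, falling back to the
# bare class name when no label has been configured.
def display_label(klass, locale: :en)
  LABELS.fetch(locale, {}).fetch(klass.name, klass.name)
end

class Work; end
class GenericFile; end
class Collection; end

puts display_label(Work)
puts display_label(Work, locale: :de)
puts display_label(Collection) # no label configured, so the class name is used
```

The class name stays stable in the code while each institution relabels it however its users prefer.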

escowles commented 9 years ago

I think it's important to focus on programmers as the primary users of these names. Our catalogers, end users, administrators, etc. are mostly going to use a UI that labels them however they tell us to label them. It's mostly the developers who are going to be working with the class names. So I think "Work" is less overloaded in this context. That said, I still think "DigitalObject" is preferable.

jeremyf commented 9 years ago

@escowles I'm mapping DigitalObject onto the Curate/Worthwhile model.

A Curate::Work has descriptive metadata and has many (in the RDBMS meaning) Curate::GenericFiles. A Curate::GenericFile has as one of its datastreams an attachment (i.e. the File) and can have descriptive metadata. Both a Curate::Work and Curate::GenericFile have unique PIDs.
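To make that shape concrete, a minimal sketch in plain Ruby. The `Sketch` module, its PID format, and the `attach` method are stand-ins invented for illustration; they are not the actual Curate or ActiveFedora classes.

```ruby
# Plain-Ruby stand-ins for the Curate models described above; these are
# illustrative and are not the real Curate or ActiveFedora classes.
module Sketch
  @counter = 0

  # Mint a unique identifier (the "sketch:N" format is an assumption).
  def self.mint_pid
    @counter += 1
    "sketch:#{@counter}"
  end

  class GenericFile
    attr_reader :pid, :attachment, :metadata

    def initialize(attachment:, metadata: {})
      @pid = Sketch.mint_pid   # every file gets its own PID
      @attachment = attachment # the stored file itself
      @metadata = metadata     # optional descriptive metadata
    end
  end

  class Work
    attr_reader :pid, :metadata, :generic_files

    def initialize(metadata: {})
      @pid = Sketch.mint_pid   # the work has its own PID too
      @metadata = metadata
      @generic_files = []      # "has many" in the RDBMS sense
    end

    # Attach a file to this work and return the work for chaining.
    def attach(generic_file)
      @generic_files << generic_file
      self
    end
  end
end

work = Sketch::Work.new(metadata: { title: "Field notes" })
work.attach(Sketch::GenericFile.new(attachment: "notes.pdf"))
work.attach(Sketch::GenericFile.new(attachment: "notes.txt", metadata: { format: "text/plain" }))
puts work.generic_files.map(&:attachment).inspect
```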

So DigitalObject becomes non-descriptive given the above example. Which I believe teases out a question: To what "object" are we attaching files?

escowles commented 9 years ago

@jeremyf I think mapping is:

Files can be attached either directly to a DigitalObject (when they all represent the same thing, e.g. source file and derivatives), or to a Component (where there are multiple source files).

awead commented 9 years ago

I hate to muddy the waters, but I'm :-1: on DigitalObject, although I won't die on a hill to prevent it. It's too overused, I think, and some may align DigitalObjects at the file level and not the grouping level. But @escowles's notion of audience is 100% on target. We're talking to programmers, not end users.

So what's wrong with work? FRBR-speaking, it's an intellectual unit, a grouping, a set ? I know we want to avoid any FRBR stuff, but are we just arguing semantics here?

Not trying to start any wars, just want to hone in on the exact issue :v:

azaroth42 commented 9 years ago

To try and provide a definition from the synthesis of #8, #9, #11, if not a name:

A Hydra [name] is a grouping of one or more content bitstreams, metadata about the grouping, and zero or more [name]s included as parts or components.
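Read as a data structure, that definition is recursive. A minimal sketch in plain Ruby follows; `Grouping` is only a stand-in name, since no name has been agreed, and the constructor checks are one assumed way to enforce "one or more" and "zero or more".

```ruby
# "Grouping" is a stand-in name; the thread has not settled on one.
class Grouping
  attr_reader :metadata, :bitstreams, :parts

  def initialize(metadata:, bitstreams:, parts: [])
    # The definition requires at least one content bitstream.
    raise ArgumentError, "needs at least one content bitstream" if bitstreams.empty?
    # Parts are zero or more things of the same kind.
    raise ArgumentError, "parts must be Groupings" unless parts.all? { |part| part.is_a?(Grouping) }

    @metadata = metadata
    @bitstreams = bitstreams
    @parts = parts
  end

  # Count bitstreams in this grouping and in every nested part.
  def total_bitstreams
    bitstreams.size + parts.sum(&:total_bitstreams)
  end
end

chapter = Grouping.new(metadata: { title: "Chapter 1" }, bitstreams: ["ch1.tif"])
book = Grouping.new(metadata: { title: "Book" }, bitstreams: ["cover.tif", "book.pdf"], parts: [chapter])
puts book.total_bitstreams
```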

How far off is that from your expectations? :)

jpstroop commented 9 years ago

You're right, everything is overused; to contradict myself, I'm less concerned about the baggage a name carries than I am about it reflecting what the thing actually is, at least the best we can.

The way I'm thinking of what we're calling Work (see #9), I'm not sure that a Work is a work, if part of a Work could also be a Work, or at least a subclass thereof.

Maybe DigitalObject isn't right either. LogicalUnit? LogicalObject?

jeremyf commented 9 years ago

@azaroth42 I believe that is providing the proper scope for given a name and a different name.

A Hydra [name] is a grouping of one or more content bitstreams, metadata about the grouping, and zero or more [name]s included as parts or components.

escowles commented 9 years ago

I think the issue with "Work" in particular is that I'm not sure what we're talking about lines up with a FRBR Work. AFAICT, the DigitalObject/Work/etc. we're talking about lines up more closely with FRBR Expressions or Manifestations.

azaroth42 commented 9 years ago

Item? But that has the opposite FRBR connotation.

jeremyf commented 9 years ago

@azaroth42 @escowles

Perhaps, while we are spitballing, we can call it Hydra::Wortem for Work. Then we can easily use sed to replace Wortem with a better name.

mjgiarlo commented 9 years ago

IMO, the initial scope of Hydra::Works should be the sorts of objects that are deposited into institutional or data repositories, in which case I don't think FRBR is a relevant concern. I'm +1 on Works, but I also think it makes sense to come to agreement on scope before we get too hung up on naming.

Can we agree on this scope, and once we've made some progress on bringing together Sufia and Worthwhile (and Curate?) -- work which was very widely supported by the community, and which is timely for nearly all of our institutions -- then we consider expanding to some of these other domains?

jpstroop commented 9 years ago

I really don't think FRBR should ever be a concern outside of what you're sticking in your descriptive metadata, it seems to me like that would never be in scope for this conversation.

mjgiarlo commented 9 years ago

OK, just checking!

jpstroop commented 9 years ago

Just to bring my comments on #11 over to the correct thread (sorry for polluting your use case @escowles!). If the scope of this whole exercise is limited to IR data, that should be made clear somewhat soon. Princeton has that use case too (or similar, e.g. GIS, social science data, pdf-only journals)--I tend to talk about books and mss because it comes up less frequently.

The appeal of what I saw in Worthwhile at HydraConnect was that we might finally be at a place where we wouldn't be 'going it alone' if we tried to build a system that supports these two broad classes I've been talking about (i.e. digitized books and IR or IR-like resources).

Does that make sense?

jeremyf commented 9 years ago

@jpstroop I want to see a conversation about what these "Works" look like. Not at the detailed level (i.e. they have DC:TITLE) but that they have a common means of asking them "What are your predicates?"

In working on Hydramata::Works, I separated lots of concerns.

Once the Data Modeling was defined, I was able to create functions/actions that could transform the modeled data to the appropriate context.
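One way to picture a common means of asking a work "What are your predicates?" is a small mixin. This is a sketch only: `Describable`, the `predicate` macro, and the `Article` class are hypothetical names, and this is not how Hydramata::Works is actually implemented.

```ruby
# Hypothetical mixin: anything that includes it can be asked for its predicates.
module Describable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare a predicate by name; the list is remembered per class.
    def predicate(name)
      predicates << name
      attr_accessor name
    end

    def predicates
      @predicates ||= []
    end
  end

  # Answer "What are your predicates?" with names and current values.
  def predicates
    self.class.predicates.to_h { |name| [name, public_send(name)] }
  end
end

class Article
  include Describable
  predicate :title
  predicate :creator
end

article = Article.new
article.title = "On Naming Things"
puts article.predicates.inspect
```

Code that transforms a work for a given context can then iterate over `predicates` without knowing which specific fields a model declares.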

jpstroop commented 9 years ago

@jeremyf I think I get what you're saying. I'm not concerned at all about what RDF predicates we use internally, only that things (streams) get grouped together logically (models) and that the relationships between those groups of things can be created and maintained in a way that also feels logical and idiomatic to both the objects themselves and AF/Rails (since that's our chosen framework).

Make sense? Do you feel like I'm bringing out my concerns in #23? Or am I missing something?

jeremyf commented 9 years ago

@jpstroop It makes sense. I merged your use case.

jpstroop commented 9 years ago

:+1:

mjgiarlo commented 9 years ago

Makes sense to me, @jpstroop!

atz commented 9 years ago

I am resigned to this community using "Works" as long as we document:

Overloaded alternatives: object, item, element, instance. DigitalObject and GenericThing are perhaps the worst of several worlds, imho. Manifestation would be appropriate, if non-intuitive. I might support Wortem for emphasizing the arbitrariness. Perhaps Molecule?

I don't look forward to having this conversation a dozen times this year as new eyes hit the codebase though.

mjgiarlo commented 9 years ago

RepositoryObject, ScholarlyWork, DepositedObject, IntellectualObject, ComplexObject ... yeah, they are all somewhat awful.

jpstroop commented 9 years ago

I think we can wait to label it for a while yet, but if we're talking about the model starting to shake out over in #11, I don't hate Work. I'm also tempted to take @jeremyf's suggestion of Wortem seriously.

jpstroop commented 9 years ago

If Component sticks around, append to @atz's bullets above: it ain't EAD.

escowles commented 9 years ago

OK, that's it. I propose the following nomenclature:

Collset -> Wortem -> Compart -> Bitfile

awead commented 9 years ago

@jeremyf Where do you get wortem, the singular dative form of die Worte?

If I can project into the future, we're going to be extending these, right? So you could call these what you like:

class Sammlung < HydraWorks::Collset; end
class Gesamtkunstwerk < HydraWorks::Wortem; end

jpstroop commented 9 years ago

@awead++

Could you work in Singspiel? That's in my use case.

jpstroop commented 9 years ago

re: @escowles's

Collset -> Wortem -> Compart -> Bitfile

I could almost legitimately get behind this or something like it. They're made up and therefore meaningless (and baggage-less), but still evoke their function.

escowles commented 9 years ago

I think every term in this space is so overloaded, the best approach is combinations of terms so at least people recognize that there are multiple interpretations and don't get misled by any other standard's particular interpretation. Plus, then we could call the module "portmanteau".

tomcramer commented 9 years ago

To add some perspective, I concur that developers are AN important audience for this term, but not the ONLY audience. Look at the confusion in "models" as a term--even among ruby developers--in the Hydra community. Are we talking abstract data models, ontologies in RDF, or Ruby code for an MVC? At one point, we banned the term "content model" in Hydra because it was so overloaded that we couldn't use it without causing confusion between Fedora ECM data / object modelers and Ruby developers.

I understand why Work is attractive (it doesn't already mean something to developers outside the library domain) and also why it's problematic (for metadata folk, it already has a very precise meaning which is contradictory to the proposed meaning in the context of Hydra-works.)

For the record, what is the problem with "digital object"? That it might represent an atomic file or bitstream that has no descriptive metadata / existence as a stand-alone, intellectual entity?

awead commented 9 years ago

@escowles proposed digital object as an alternative to "work", which I balked at mostly because I was used to an object being a single unit, akin to what we're currently calling "compart." This is only because I was accustomed to the term's usage in the Fedora3 world. I think this speaks to @tomcramer's point that it's pretty much impossible to have this conversation without using terms that have some kind of baggage associated with them. Hence, perhaps, the reason to use invented terms instead of existing ones.

On the other hand, maybe we should just decide on terms with which we are the least uncomfortable and call it a day. After all, the terms should be irrelevant. It's the relationships that are the important part.

declanfleming commented 9 years ago

In our work at UCSD we've always called the intellectual item an "object", or sometimes "digital object" when we're talking with the rest of the library. An object:

- has a permanent unique identifier
- has some basic metadata, can have lots of metadata, and some of that metadata can point to other objects
- can contain files
- contained files can have metadata

Deciding what an object encompasses is made on a collection by collection basis, usually driven by the anticipated need for referring to that object, either in class work, SEO, or exhibits. Collections of single pictures/audio files/newspapers/films are fairly easy - each object gets a unique identifier. Collections of research data, like a set of Computational Astrophysics Big Bang simulations, take a lot more consideration to determine where to draw the line between objects. We decided to do it at the individual simulation level:

http://library.ucsd.edu/dc/object/bb0103691x

Component identifiers are messy when you allow complex objects like this, and we don't have a great system for making them easy to use. Here's a gas density graphic from the simulation above:

http://library.ucsd.edu/dc/object/bb0103691x/_19_2.jpg

Ok, I'm blathering.... I like "object" or "digital object". I can see the benefits of @jeremyf's and @escowles's made-up words Collset -> Wortem -> Compart -> Bitfile just to keep things moving. But I'd prefer "object".

(BTW, a collection is just another object, containing metadata, usually without files, and it has its own identifier.)

jeremyf commented 9 years ago

@awead Wortem came from Work and Item.

jpstroop commented 9 years ago

@declanfleming

(BTW, a collection is just another object, containing metadata, usually without files, and it has its own identifier.)

Yeah, following what's shaping up over at #11 (see @escowles's diagram), I think Collset, Wortem, and Compart could all have the same base class (probably Compart since it needs to be the most flexible) and then Wortem and Collset could just be refined, something like, roughly (leaving metadata streams out):

Compart:

Wortem (< Compart) :

Collset (< Compart):

I'm not sure you'd actually use subclasses in an implementation since the refinements on the types of Comparts that are allowed on Wortems might be a little awkward (how would these be enforced? Validations?), and the same goes for the Wortems in a Collset. More experienced Rubyists will have better ideas, but I'm guessing that an implementation with shared concerns/behaviors could be tidier.
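For what the shared-behavior route might look like, here is a minimal plain-Ruby sketch. Everything in it is an assumption made for illustration: the `HasMembers` module, the `allows_members` macro, and especially the containment rules, which are guesses rather than the rules from the diagram in #11.

```ruby
# Shared behavior instead of a class hierarchy. The class names are the
# thread's placeholders; the containment rules below are assumptions.
module HasMembers
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare which classes may be added as members. Stored as names so a
    # class can list itself or a class that is defined later.
    def allows_members(*class_names)
      @allowed_member_names = class_names.map(&:to_s)
    end

    def allowed_member_names
      @allowed_member_names || []
    end
  end

  def members
    @members ||= []
  end

  # Validate the member's type before adding it; returns self for chaining.
  def add_member(member)
    unless self.class.allowed_member_names.include?(member.class.name)
      raise ArgumentError, "#{self.class.name} cannot contain #{member.class.name}"
    end
    members << member
    self
  end
end

class Bitfile; end

class Compart
  include HasMembers
  allows_members :Compart, :Bitfile
end

class Wortem
  include HasMembers
  allows_members :Compart, :Bitfile
end

class Collset
  include HasMembers
  allows_members :Collset, :Wortem
end

collset = Collset.new.add_member(Wortem.new.add_member(Compart.new.add_member(Bitfile.new)))
puts collset.members.first.members.first.members.first.class
```

Each class states its own containment rule, so changing what a Wortem may hold does not require touching Compart or Collset.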

scherztc commented 9 years ago

+1 on Declan's property description of an Object. That type of language is helpful for understanding some of the next coding steps. Could we do the same for Collset -> Wortem -> Compart -> Bitfile?

jpstroop commented 9 years ago

@scherztc the conversation is a little splintered at the moment, but if you check out the conversation in this issue (#8) and #11, things might come into focus a little bit. There are also a few use case docs in the use-cases directory of this repo.

dchandekstark commented 9 years ago

FWIW I find the made up terms more confusing than the overloaded real ones. Rhetorical question: Are folks who are putting forward the most complex, generalized modeling requirements actually intending to use Hydra::Works? Seems like maybe we're either going to have to take a poll, or nominate a small group to decide among various options. It's hard to see these questions being resolved in this forum.

escowles commented 9 years ago

@dchandekstark UCSD plans to use this model, or more accurately to use Sufia/Worthwhile/etc. after those have implemented this model.

IMHO, we've basically reached consensus on the entities involved and the appropriate relationships, metadata types, etc. So all we're waiting on is consensus on the naming (and any more use cases to trickle in). Given the unease with the portmanteaux, and the existing Generic prefix that I've heard some support for, how about:

jpstroop commented 9 years ago

@dchandekstark Yes, that is a rhetorical question! My answer is pretty much the same as @escowles: We want to use Hydra::Works/Worthwhile, and would hope to actively contribute to the project, but it really comes down to whether or not the shared models can handle the complexity of our stuff.

:+1: to Generic*, following @escowles diagram.