hybox / models

Data Modeling repository for HyBox (ontologies, vocabularies, best practices, requirements, etc)
Apache License 2.0

Discussion of Postcard Model #42

Open azaroth42 opened 8 years ago

azaroth42 commented 8 years ago

At: https://github.com/hybox/models/blob/master/notes/usecase.md

azaroth42 commented 8 years ago

(Tagging: @anarchivist @no-reply @escowles @cmh2166 @tpendragon @mjgiarlo)

tpendragon commented 8 years ago

Two concerns, I'll put them in separate comments for discussion:

_:pc1 a pcdmw:Work ;
  rdfs:label "Postcard" ;
  edm:isRepresentationOf _:rwopc1 ;
  pcdm:hasMember _:front1, _:back1 ;
  pcdm:hasRelatedObject _:tn2 .

Postcard 1 has an image representation of the whole (a thumbnail), which is a FileSet referenced via hasRelatedObject.

_:front1 a pcdm:Object ;
  rdfs:label "Front of Postcard" ;
  pcdm:hasMember _:frontfs1 .

The front of postcard 1 has a FileSet, representative of the whole object, linked via pcdm:hasMember.

These two patterns seem conflicting, and work against the flexibility to have a collection of all the front pages of postcards, for instance. Do we need to pick a pattern (always hasRelatedObject? Is hasRelatedObject's semantics right for "representative fileset"? Is there something that should be on that FileSet to identify it as a representation...? Do we need a new predicate?)

tpendragon commented 8 years ago

Second concern:

Why is the front of the postcard a pcdm:Object and not a pcdmw:Work? In order to support the use case of "a collection of all the fronts of postcards", for instance, each layer of the hierarchy would have to look the same. I think they both act the same (or, at least, SHOULD), and the goal here seems to be to identify it as a Part - but it's only a Part in the context of its parent. Outside that context it's a standalone object - I think?

azaroth42 commented 8 years ago

@tpendragon

Re related vs member, I agree. We're confusing representationOf and partOf ... leading to putting representations in hasRelatedObject when there are parts, and in hasMember when there aren't. If we had hasRepresentation separate from hasMember, we would solve the issue. And likely also Stefano and Adam W's concerns at the same time? Before pcdm-works this wasn't an issue, as representations would have gone directly in hasFile -- now there's an intervening Object subclass.

And re Object vs Work ... I was taking the approach that a Work is a bound or otherwise coherent thing. If the back of a postcard is a Work, then is the verso of page 214 of a book also a Work? I'm happy to drop Work completely and just use Object, or to come to a clearer definition of when to use each.

tpendragon commented 8 years ago

> If we had hasRepresentation separate from hasMember, we would solve the issue.

I'm leaning this way too.

> is the verso of page 214 of a book also a Work?

When CC uses the "Part" stuff, it will be in Plum at least. And I think that's right.

azaroth42 commented 8 years ago

If it's a Work, what's the difference between a Work and an Object? Can't we just use Object all the time?

tpendragon commented 8 years ago

> If it's a Work, what's the difference between a Work and an Object? Can't we just use Object all the time?

Only if you let Works have Files.

azaroth42 commented 8 years ago

Have Files with hasFile, or have FileSets with (hasMember/hasRelatedObject/hasRepresentation)? But either way, Objects can have Files and FileSets, so we can still drop Work?

tpendragon commented 8 years ago

The reason there's a Work is because there are restrictions that don't exist on Objects (I think.) Of course, restrictions like that aren't present in RDF without something like SHACL, so we could always just say "if your Object hasFile something, and we're in Works land, we're not gonna do anything with it"
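A minimal sketch of the kind of application-level check described above, assuming triples are held as plain (subject, predicate, object) tuples of prefixed-name strings. The function name `works_with_files` and the sample data are hypothetical and not part of PCDM or hydra-works; a real deployment would more likely express the rule as a SHACL shape.

```python
# Hypothetical, stdlib-only sketch: triples are (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"


def works_with_files(triples):
    """Return the sorted subjects typed pcdmw:Work that also carry a pcdm:hasFile triple."""
    works = {s for s, p, o in triples if p == RDF_TYPE and o == "pcdmw:Work"}
    with_files = {s for s, p, o in triples if p == "pcdm:hasFile"}
    return sorted(works & with_files)


# Illustrative data only: _:bad1 is an invented Work that breaks the restriction.
triples = [
    ("_:pc1", RDF_TYPE, "pcdmw:Work"),
    ("_:pc1", "pcdm:hasMember", "_:front1"),
    ("_:front1", RDF_TYPE, "pcdm:Object"),
    ("_:front1", "pcdm:hasFile", "<front.jpg>"),
    ("_:bad1", RDF_TYPE, "pcdmw:Work"),
    ("_:bad1", "pcdm:hasFile", "<bad.jpg>"),
]

violations = works_with_files(triples)
print(violations)
```

Nothing in the RDF itself rejects the offending triple; the check only reports it, which matches the "we're not gonna do anything with it" stance.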

azaroth42 commented 8 years ago

> The reason there's a Work is because there are restrictions that don't exist on Objects

Okay, so those restrictions should define the difference :) What are the restrictions?

tpendragon commented 8 years ago

> What are the restrictions?

I'll need backup from @escowles, @jcoyne, @mjgiarlo, @cmh2166 and company, but the one I know of is "Works can't hasFile"

cmharlow commented 8 years ago

Re related vs member:

Thinking that we are talking about hasRepresentation here as what relates a PCDM object to the digital representation, not as a possible inverse to edm:isRepresentationOf (which relates an object to a RWO or another resource to the repository object).

Are we not intimating in some way that [Object hasMember Fileset hasFile File] is that hasRepresentation relationship, made more complicated to support multiple versions of that file/representation? So...

_:pc1 a pcdmw:Work ;
  rdfs:label "Postcard" ;
  edm:isRepresentationOf _:rwopc1 ;
  pcdm:hasMember _:front1, _:back1, _:tn2 .

_:tn2 a pcdmw:FileSet ;
  rdfs:label "Postcard Thumbnail" ;
  a ??:Thumbnail ;  # would we want to add some kinda type vocab here?
  pcdm:hasFile <thumbnail.jpg> ;
  pcdm:hasFile <thumbnail.tiff> .

Differentiating hasRepresentation from hasMember, +1 (so not to use hasRelatedObject for odd outliers), but in the meantime, the thumbnail doesn't seem, to me, to fall under that 'is not a component part' portion of the hasRelatedObject comment. But I'm open to either.

Works/Objects comment forthcoming.

Hope that helps, and at least, doesn't hinder.

cmharlow commented 8 years ago

Looking at the comments here: https://github.com/projecthydra-labs/hydra-works/blob/master/lib/hydra/works/models/concerns/work_behavior.rb

Is it saying that a pcdmw:Work could not pcdm:hasMember a pcdm:Object that isn't a pcdmw:FileSet?

escowles commented 8 years ago

Agree with @tpendragon that _:front1 and _:back1 should be Works instead of Objects (probably all the Objects in the example should be Works). When we said at LDCX that we were using Part for convenience, I understood that to mean it was short for "Work that is part of another Work in some context". I also agree that the limitation of not having Files is an important limitation of Works that doesn't apply to Objects.

Two other things worth calling out:

  1. _:pc1 pcdm:hasRelatedObject _:tn2 seems fine, but only if _:tn2 is a purpose-made thumbnail image (maybe a composite showing the front and back of the postcard in a single image?). But I would typically expect it to use one of the child objects as the preferred representation (see the mailing list discussion).
  2. Instead of:
_:back1   pcdm:hasMember _:backfs1, _:backfs2 .

_:backfs1 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Image" ;
  pcdm:hasFile </backfs1/files/back.jp2>, </backfs1/files/back.jpg> .

_:backfs2 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Transcription" ;
  pcdm:hasFile </backfs2/files/tei.xml>, </backfs2/files/transcription.txt> .

why not combine the FileSets:

_:back1   pcdm:hasMember _:backfs1 .

_:backfs1 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard" ;
  pcdm:hasFile </backfs1/files/back.jp2>, </backfs1/files/back.jpg>,
    </backfs2/files/tei.xml>, </backfs2/files/transcription.txt> .

If we want a scenario that demonstrates two FileSets under a Work, maybe this would be better:

_:back1   pcdm:hasMember _:backfs1, _:backfs2 .

_:backfs1 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Image" ;
  dc:date "1999" ;
  pcdm:hasFile </backfs1/files/back.jpg> .

_:backfs2 a pcdmw:FileSet ;
  rdfs:label "Back of Postcard Image" ;
  dc:date "2014" ;
  pcdm:hasFile </backfs2/files/back.jp2>, </backfs2/files/back.jpg>,
    </backfs2/files/tei.xml>, </backfs2/files/transcription.txt> .

azaroth42 commented 8 years ago

> probably all the Objects in the example should be Works

So there is no situation in which you would use pcdm:Object, only ever FileSet (for a bundle of files) and Work (for everything else)? If there's no other distinction between Object and Work, I don't see the point of having the subclass. Just have an application profile that says Don't use hasFile from Object if you're using FileSets.

> Work: A work or intellectual entity, such as a book, film, dissertation, etc.

No one would ever use "Work" to describe a page of a book given this definition. A Chapter (e.g. a Range, subClassOf Object, not Work) is more of a Work than a physical page as it's the logical or textual structure that came from the intellectual/creative effort, not the printing of the physical Item.

I understood Work as "An Object that represents a coherent and complete intellectual entity which should be separately discoverable from other Works"

> preferred representation

There isn't a predicate yet for the preferred representation though.

> why not combine the FileSets:

Because .txt is derived from .tei, and FileSets have a single master and a single media type. The TEI is not derived from the Image.

escowles commented 8 years ago

Granted that the current description of Work is inadequate. But if you digitize an atlas, creating a Work for the atlas as a whole, and an Object for each page/map, what happens when you want to add one of the maps to a Collection of maps? Do you upgrade that Object to a Work at that point? Or do you create all the pages as Works?

cmharlow commented 8 years ago

That seems an oddly specific instance of Pages, one where they are Works as defined by what Rob points out (they're complete maps as well as pages). This is evidenced by the 'page/map' part of your response, right? There we're describing the combination of a carrier part and some kind of complete work.

In that same atlas example, you could have an atlas that has a single map that spans pages. I'd propose that the atlas is a work, the map is a work, and the pages are objects.

I didn't think about this before, but I agree with Rob's critique of everything that isn't a Collection or a Fileset becoming a Work seemingly by default, and we probably don't want that to happen (or if we do, it changes how we are defining Work, and the documentation should be updated). What was the original intention of Work in this context? (I'm sorry to say I honestly don't know as I'm a recent interloper) To just add some functional aspects to generic Objects vis-a-vis the HydraWorks/PCDMWorks gem and LDP specification? Or to create a PCDM extension ontology that allows for the concept of Works, and attach functionalities to those concepts?

While I would have said before that the postcard sides are Works (my original question in that Twitter thread), I'd agree with Rob's response to that in this thread: in his given example, they should just be Objects. They are "Parts" that have no complete Work aspect to them (as far as we know). But I'd leave the option open that someone dealing with particular kinds of postcards may run into an instance where a side could be a Work (if a side is a Map, say).

Hope I haven't misunderstood the question nor sidetracked the discussion.

azaroth42 commented 8 years ago

If the map has its own intellectual value separate from the atlas, then I would create it as a Work from the outset. However, I don't see why that would prevent it from being in a Collection if it was just an Object? I can have a Collection of pages that mention Paris, regardless of whether the page is a Work or an Object. If the depositor doesn't say that they're Works so they get created as Objects, and someone later wants to change that ... then I don't see a problem with that either?

In the separate resource for metadata model, I would associate edm:isRepresentationOf with Work rather than Object. If you want to have metadata beyond a basic label for a PCDM object, then it's a Work. If you're okay with it just being a constituent member, then it's an Object. In the postcard case, if it was important to describe the artwork on the front of the postcard, or the writing on the back, then make it a Work with its own metadata. If it's just a "page" without meaningful differentiation, then leave it as an Object.

tpendragon commented 8 years ago

I really don't see why there needs to be a distinction. I'm pretty sure I prefer all works or all objects, because it makes the user experience easier to figure out.

If the question is, "should there be Works at all, rather than Objects with restrictions", I think that's valid, and I'm not sure what best practice is there. But using the two because they exist to draw lines where there doesn't need to be any seems too complex.

escowles commented 8 years ago

The atlas/map example is obviously contrived, but I think the basic point is that whether something is a work or a part is largely contextual. This is something we've talked about from the first meetings in Portland, all the way up to LDCX. I doubt whether we can know what the future context of our objects will be (selecting maps to go in the Collection could happen after the atlas was created in the repository), so I would rather create it as the type of resource that can stand alone, even if we don't expect that it will.

I would be fine with getting rid of the Work class and using Object for everything that's not a FileSet or a Collection. And having an application profile say that Files should be attached to FileSets only seems reasonable.

eocarragain commented 8 years ago

I think the "Postcard Thumbnail Image" FileSet should be :tn2fs1 not :fn2fs1, see:

_:tn2 a pcdm:Object ;
  rdfs:label "Postcard Thumbnail" ;
  pcdm:hasMember _:tn2fs1 .

_:fn2fs1 a pcdmw:FileSet ;
  rdfs:label "Postcard Thumbnail Image" ;
  pcdm:hasFile </fn2fs1/files/thumbnail.jpg> .

azaroth42 commented 8 years ago

I'm happy to just have Object and Fileset. Will update the document this afternoon.

azaroth42 commented 8 years ago

Changes:

Thoughts?

azaroth42 commented 8 years ago

Can more than one Object hasMember / hasFileSet any given FileSet? As it seems like a wrapper around hasFile for grouping, I would expect not?

Along with filesets not being ordered, it seems like a direct container would do the trick in LDP without the need for proxies.

elrayle commented 8 years ago

I have been following the conversation and I want to be clear about where we landed. Are you saying...

Cardinality Options

Object has...

Option 1: an Object can have one-and-only-one FileSet
Option 2: an Object can have many FileSets (I think you are saying this.)

FileSets can be in...

Option 1: a FileSet can belong to one-and-only-one Object (I think you are saying this.)
Option 2: a FileSet can belong to any number of Objects
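To make the second cardinality question concrete, here is a rough sketch that reports FileSets gathered by more than one parent, which "Option 1" would forbid and "Option 2" would allow. It assumes triples are plain (subject, predicate, object) tuples and that FileSets are linked via pcdm:hasMember; the function name `shared_filesets` and all node names are invented for illustration.

```python
from collections import defaultdict


def shared_filesets(triples, fileset_ids):
    """Map each FileSet that is a pcdm:hasMember target of several parents to those parents."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "pcdm:hasMember" and o in fileset_ids:
            parents[o].add(s)
    # Keep only FileSets with more than one parent, i.e. those violating "Option 1".
    return {fs: sorted(ps) for fs, ps in parents.items() if len(ps) > 1}


# Illustrative data only: one Object holds two FileSets, and a second
# aggregation reuses just one of them.
triples = [
    ("_:conf1", "pcdm:hasMember", "_:presfs"),
    ("_:conf1", "pcdm:hasMember", "_:posterfs"),
    ("_:mypres", "pcdm:hasMember", "_:presfs"),
]

print(shared_filesets(triples, {"_:presfs", "_:posterfs"}))
```

Under "Option 1" the result would have to be empty; under "Option 2" it is simply a list of the shared FileSets.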

Ordering

Sets of Objects (gathered by an Object or a Collection) can have one (or more) orders applied to the set. The first order has first and last directly in the gathering Object or Collection. Additional orders are applied by having an in between OrderObject for each additional order.

Sets of FileSets (gathered by an Object) cannot have order.

I am hesitant on two points (assuming I interpreted things correctly). Note since both Objects and Collections can have Objects as members, I will refer to them generically as aggregations.

Lynette

edited by Rob to make the github UI not collapse all the content down to nothing

azaroth42 commented 8 years ago

My opinion:

elrayle commented 8 years ago

A few more questions:

Lynette

azaroth42 commented 8 years ago

  • Given that the only way for a FileSet to be shared with multiple aggregations is for it to be a member of an Object and then make that Object a member of another aggregation, what is the use case for multiple FileSets in an Object?

Each fileset has a single master file, with derivatives. You likely want to associate multiple master files, each with their own derivatives with a single Object. For example:

  • What is the path for a FileSet that is added as part of several FileSets in a single Object to be shared by itself with another aggregation?

You can't do that at the moment. Is there a real use case for this?

Which of these continue to hold?

  • [X] Collection can haveMember Collection // Yes
  • [X] Collection can haveMember Object // Yes
  • [ ] Collection can NOT haveMember FileSet // I would prefer a different predicate and Container for FileSets from hasMember, but Collections must be able to have FileSets somehow.
  • [X] Object can haveMember Object
  • [ ] Object can haveMember FileSet // Same as Collection
  • [ ] Object can haveFile File // I think the application profile should say that Objects SHOULD NOT have Files directly, even if there's only a single File, it should be in a FileSet.
  • [ ] Are there any restrictions that would limit Object from haveMember Objects and FileSets in the same Object? // As above, I would prefer different predicates to link to FileSets.

escowles commented 8 years ago

> FileSets cannot have descriptive metadata of their own, beyond label, as they're just a convenience for grouping binaries together.

If FileSets are for grouping an original file together with its derivatives, then I think they also should have metadata about creation (creator, date, software, etc.). In the postcard image/transcription example, you might want to know who did the transcribing, or when. Maybe that's not descriptive metadata, but it's broader than I've been thinking of technical metadata.

azaroth42 commented 8 years ago

Good point! We should be clearer about the distinctions. I think there's at least file characterization (associated with the File), the provenance of the file and derivatives (associated with the File and/or FileSet?), and then the broader descriptive information about the Object.

I'll update the example with some provenance on the FileSet.

mjgiarlo commented 8 years ago

I've understood FileSets as a grouping of Files that allows assertion of descriptive metadata about said grouping (and/or, arguably, the original_file), including a label but possibly including much more.

Part of the problem of course is that the categories of metadata used by cultural heritage organizations are... not cut and dried, and feel distinctly pre-RDF to me. Put otherwise, the boundary between descMD and techMD (etc.) is permeable and fuzzy at best. If I run FITS and it returns an author (as extracted from embedded file metadata), is it techMD or descMD? Better question... should we care?

azaroth42 commented 8 years ago

If we don't care, and people put dcterms:creator on the File sometimes and on the FileSet other times, there seems to be a usage cost -- In order to find out if there's a creator, you need to check multiple locations. I think it's easier to associate that sort of information with the FileSet than with the File.
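The usage cost mentioned above can be sketched roughly as follows, assuming triples are plain (subject, predicate, object) tuples. The function name `find_creators` and the creator value are hypothetical; the FileSet and file paths are borrowed from the transcription example earlier in the thread.

```python
def find_creators(triples, fileset):
    """Collect dcterms:creator values from a FileSet and from each of its Files."""
    files = [o for s, p, o in triples if s == fileset and p == "pcdm:hasFile"]
    found = []
    # Every File is a separate place to look when creators are not kept on the FileSet.
    for node in [fileset] + files:
        for s, p, o in triples:
            if s == node and p == "dcterms:creator":
                found.append((node, o))
    return found


# Illustrative data only: the creator was asserted on one File, not on the FileSet.
triples = [
    ("_:backfs2", "pcdm:hasFile", "</backfs2/files/tei.xml>"),
    ("_:backfs2", "pcdm:hasFile", "</backfs2/files/transcription.txt>"),
    ("</backfs2/files/tei.xml>", "dcterms:creator", "Example Transcriber"),
]

print(find_creators(triples, "_:backfs2"))
```

If the application profile said creators go on the FileSet only, the inner walk over Files would be unnecessary and a single lookup on the FileSet would be enough.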

mjgiarlo commented 8 years ago

I'm :cool: with that.

elrayle commented 8 years ago

RE: Use case for sharing FileSet with other aggregations.

First an observation: In your examples, all the files in the FileSet are variants of the same 'object' (e.g. different resolution, scanned at different times, scanned using different techniques, etc.) In this case, it makes sense that if you want to share the 'object' with other aggregations that you would share all the FileSets which equates to sharing the Object holding all the FileSet variants of the 'object'.

Self-deposit IR use case: There is no enforced restriction preventing a user from gathering FileSets for different 'objects' in a single Object. For example, a user might put a presentation, poster, and paper presented at a conference in the same Object. The user might also have a collection for My Presentations. The only FileSet from the original Object that the user wants to include in the My Presentations collection is the presentation FileSet.

elrayle commented 8 years ago

Does this express the model being proposed? FileSets become a first class model in PCDM?

[Image: pcdm_model-with-filesets]

escowles commented 8 years ago

@elrayle, I think the new model would be implemented by creating a pcdm:Object for each file a user uploads. The Object would have a single FileSet, which would contain the file, plus extracted text, thumbnail, etc.

elrayle commented 8 years ago

@escowles If I understand correctly, your only adjustment to the displayed model is pcdm:hasFileSet is 1:1 instead of m:m? I believe @azaroth42 earlier stated that there can be 1:m (one Object with many FileSet) so that the Object can hold uploaded variants (e.g. scans from different times/techniques, etc.) of the same 'object' each in its own FileSet and the FileSet holds generated derivatives of the 'object'.

escowles commented 8 years ago

@elrayle Yes, that's my understanding. The use cases for multiple FileSets attached to an Object are things like multiple digitizations, adding TEI transcriptions, etc. I wouldn't expect to have those very often in an IR, but maybe there are examples I'm not thinking of.

elrayle commented 8 years ago

Is this still the application profile for PCDM... https://github.com/duraspace/pcdm/wiki?

Are the changes described in this thread affecting PCDM proper or are they extensions?

mjgiarlo commented 8 years ago

@elrayle I suspect this will result in a new version of PCDM proper, refining the PCDM Works extension and pulling it into PCDM.

barmintor commented 8 years ago

I think there are use cases for aggregating FileSets in multiple aggregations, and definitely use cases for not wanting to pull in all the FileSets under an Object. You're going to have digitization folks uploading a batch of stuff (Preservation Images of Postcard Archives Box 24; Infrared Scans of Postcard Archives) for whom the batch is the actual thing. You're going to need to be able to aggregate them under objects later (Bahamas Postcard) for context. The relationship of those FileSets to the aggregating object is going to be contextual. So while I totally agree with needing multiple FileSets per object, and I totally agree with needing to have an Object that represents Parts, I think we're going down a questionable path as far as cardinality of aggregation and prescribed containment.

elrayle commented 8 years ago

Are you using postcards as an example use case for working through modeling issues? Or is hybox planning to support models for several well defined common use cases?

barmintor commented 8 years ago

I was borrowing the example context of postcards from Rob (not the literal example he used, just the pretend-concrete type of things), but:

  1. I'm under the impression that cultural heritage images are also a HyBox use case
  2. PCDM has a broad scope, as does hydra-works as the PCDM implementation of record for Hydra.

mjgiarlo commented 8 years ago

@elrayle Our top two priorities for content types now are multi-file works (modeling a flexible need for "traditional" repository deposits, if you will) and photographs. See this blog post for more on that.

Postcards are a slightly more complex use case than photographs, so we were satisfying our modeling needs for photographs and simultaneously checking to see if that model could be built upon for more complex models.

azaroth42 commented 8 years ago

Re Postcards ... they're a good stand in for a "real" CH image based object. They have text, artwork, multiple sides and thus order, and so forth. You can easily extrapolate forwards to many ordered sides from two, or backwards from two to just one. And as the UCSB postcard example shows, it can be much more complex than that ... on the same order of magnitude as an atlas of maps.

Regarding batch as the "thing" ... I disagree that it's something the repository should care about at a core object level. If the batch is important then create an identity for it, and link the fileset to it... but the use of the fileset is associated with the object, not the workflow in which the image was created. That said, the existence of the FileSet being dependent on the existence of the object that the FS is related to is a valid point. We shouldn't presume that the structure will exist before the content does in the repository.

barmintor commented 8 years ago

Right: I don't want to reify batchness as a separate type of object; but I want to recognize that a single CH repository has core audiences that reckon object contexts for FileSets differently, and that this is expressed in the workflow, hence a need for reaggregation. This in some ways looks back longingly at AdministrativeSet, I know...

elrayle commented 8 years ago

My concern is in building certain assumptions into the model. I don't think you can make an assumption for the general case that all FileSets that are in the same PCDM:Object are different representations of the same 'object'.

How will Sufia enforce such an assumption for self-deposit users who may put a presentation, poster, and paper all in the same Hydra:Work?

elrayle commented 8 years ago

And making the restriction that a FileSet can live in one-and-only-one PCDM:Object means that the user who added the presentation, poster, and paper in the same Hydra:Work cannot then put just the presentation in another Work they are using to hold all their presentations.

escowles commented 8 years ago

@elrayle , I think what's being proposed here is that Sufia would create a structure like this for a single uploaded file:

If you added another representation of the same presentation (e.g., an audio recording of it), you might add that to the same Part as a second FileSet:

I agree that supporting a files-first workflow means that FileSets should probably not be directly contained, and that opens up the possibility of them being members of multiple Objects.

If the user who uploaded the presentation and audio recording above wanted to add them to another work, I would expect the Sufia UI to let them select either the Part or the Work, but not the FileSets.

elrayle commented 8 years ago

@escowles Is Sufia going to automatically insert the Presentation Part (pcdm:Object) or is the user going to need to build this structure?

Here is my understanding of what users can currently do in Sufia...

After completion, there will be one Work with X members that are all FileSets (where X=number of files uploaded.)

Steps: (if automatic)

Steps: (if manual)

Questions: