w3c / wpub-ann

Web Annotation Extensions for Web Publications
https://w3c.github.io/wpub-ann/

Do we need scope, as well as Embedded Resource and Multiple Resource Selectors? #10

Closed iherman closed 6 years ago

iherman commented 6 years ago

The WA scope facility has been added; do we need it when we also have the Embedded Resource Selector?

iherman commented 6 years ago

One argument may be simplicity... Using ERS may be complex. But there are indeed two ways to express similar things...

tcole3 commented 6 years ago

If scope covers all the use cases we had in mind when inventing Embedded Resource Selector, then I'd rather not add ERS - both because scope seems intuitively less complex and to stay closer to Web Annotations.

iherman commented 6 years ago

I am afraid that it is the other way round: ERS covers our use cases for scope, and more.

The example with scope in the document can be reproduced, using ERS, via:

{
    "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "selector": {
        "type": "EmbeddedResourceSelector",
        "value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
        "refinedBy": {
            "type": "CssSelector",
            "value": "#elemid > .elemclass + p"
        }
    }
}

though it is of course more verbose than the original example:

{
    "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
    "selector": {
        "type": "CssSelector",
        "value": "#elemid > .elemclass + p"
    }
}
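The correspondence between the two forms can be sketched mechanically. This is only a minimal Python illustration, assuming that scope always names the publication and source the embedded resource; the function name scope_to_ers is hypothetical and not part of any draft.

```python
import json


def scope_to_ers(locator: dict) -> dict:
    """Rewrite a scope-based locator into the EmbeddedResourceSelector form.

    Assumption: "scope" is the publication and "source" is the embedded
    resource, as in the example above.
    """
    ers = {
        "type": "EmbeddedResourceSelector",
        "value": locator["source"],
    }
    # The original selector, if any, becomes a refinement of the ERS.
    if "selector" in locator:
        ers["refinedBy"] = locator["selector"]
    return {"source": locator["scope"], "selector": ers}


scoped = {
    "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
    "selector": {"type": "CssSelector", "value": "#elemid > .elemclass + p"},
}

print(json.dumps(scope_to_ers(scoped), indent=4))
```

The reverse direction only works when the ERS is the outermost selector, which is one way to see that ERS can express more than scope.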

However, what I have a problem with is the longer example for range+ERS: I do not see a way of reproducing it with scope alone.

iherman commented 6 years ago

I think the issue for keeping or not scope includes:

An additional minor "con" is the specification of scope. The current text is simply a verbatim copy of the WA definition:

The relationship between a Locator and the resource that provides the scope or context in this selection.

which is fairly general. We would have to define, in a normative sense, what it means for specifically Web Publications, which means we go (slightly) beyond what WA defined.

/cc @azaroth42

tcole3 commented 6 years ago

The example for range+ERS (really ERS+Refinement) seems to me better handled with the Multi Resource Selector, since it is a selection that spans two resources. So, if we keep the Multi Resource Selector, then I'm not sure that ERS would be required for anything not handled by scope. Here's how I might do the range+ERS example as a Multi Resource Selector. There remains a question in my mind whether the *.wpub should be a scope rather than a source, but in WA source is required for all selectors, and since an extension/refinement we already have in this draft is that the source is implicitly also a scope if no scope is explicitly specified, perhaps we're okay. (Alternatively, this might suggest that MRS is something different from a selector?) Aside: the selectors key may need to be renamed, since technically it is an array of SpecificResources rather than selectors - maybe resourceSelections?

{   "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "selector": {
        "type": "MultiResourceSelector",
        "selectors": [{
            "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
            "selector": {
                "type": "TextQuoteSelector",
                "exact": "Call me Ishmael."
            }
        },{
            "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html",
            "selector": {
                "type": "TextQuoteSelector",
                "exact": "A hundred black faces turned around"
            }
        }]
    }    
}

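For illustration only, a small Python sketch that assembles a Locator of the shape shown above from (resource, quoted text) pairs. The helper names multi_resource_locator and resources_touched are hypothetical, and the selectors key simply follows the example; it may well be renamed.

```python
def multi_resource_locator(publication, quotes):
    """Build an MRS-style locator from (resource URL, exact text) pairs."""
    return {
        "source": publication,
        "selector": {
            "type": "MultiResourceSelector",
            # Key name "selectors" follows the example above; it may be renamed.
            "selectors": [
                {
                    "source": resource,
                    "selector": {"type": "TextQuoteSelector", "exact": exact},
                }
                for resource, exact in quotes
            ],
        },
    }


def resources_touched(locator):
    """List the embedded resources an MRS-style locator refers to, in order."""
    return [entry["source"] for entry in locator["selector"]["selectors"]]


BASE = "https://dauwhe.github.io/html-first/MobyDickNav/html/"
locator = multi_resource_locator(
    "https://dauwhe.github.io/html-first/MobyDick.wpub",
    [
        (BASE + "c001.html", "Call me Ishmael."),
        (BASE + "c002.html", "A hundred black faces turned around"),
    ],
)
print(resources_touched(locator))
```
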
iherman commented 6 years ago

However, this means that MRS must be defined in terms of range. (Which may mean that MRS is a misnomer, it should be something like MultiResourceRange or something...)

If this is the only use case we have, then you may be right.

iherman commented 6 years ago

@tcole3 I just realized, that we fell into the same trap, and that the definition of MRS is actually wrong.

The whole Selector Model is based on having a top object (the one we now call Locator) that identifies the source. The source attribute can appear only there and not in a Selector. I.e., the example above is incorrect (which means that the MRS definition in the document is incorrect, too). We already had this discussion in an earlier round with @azaroth42. This is the reason we need the ERS, i.e., to select within the top level source.

I have the impression that we need ERS and, possibly, MRS (though properly defined). The only issue that we may have is whether we need scope or not...

tcole3 commented 6 years ago

I would reverse your logic and define a Multi-Resource Selector that handles all Resource Selection (Locator) use cases that involve more than one constituent Web Publication Resource within a Web Publication - this definition should say that the scope of the MRS is (for practical purposes) always the same as the MRS source and therefore is implicit. (We should not attempt to define a means to express a Locator that spans multiple Web Publications.)

Then I do not think we need an Embedded Resource Selector at all since the context for use cases involving only a single Web Publication Resource can be handled by scope and the single-resource selectors already defined by Web Anno.

In other words drop ERS and further flesh out the definition of MRS to make sure it meets use cases.

Or is there another use case requiring ERS that I'm forgetting?

iherman commented 6 years ago

@tcole3 I just realized, that we fell into the same trap, and that the definition of MRS is actually wrong.

Withdrawn. It is not wrong, I was misled by the selectors property name, which is a misnomer. It should be locators (just as @tcole3 said).

(Never make serious comments later in the evening. I am not an evening person, I am not an evening person, I am not an…)

iherman commented 6 years ago

@tcole3,

I have gone through three obvious examples/use cases to see how they would/could be encoded using scope, ERS, and MRS. I did not want to put the examples into this comment, it would have been way too long, but they are on a separate gist. My own remarks on these examples:

A comment on the range vs. MRS issue. If our primary usage for MRS is such special ranges, we should define MRS more tightly, modeled after the definition of the Range Selector (i.e., something saying that the range includes the portion of the selection in the first Locator, includes all intermediate ones in order, and finishes with the range in the last Locator). If there are other use cases for MRS, we may need a sibling selector type that does not convey the notion of order in the resources.

I continue to feel a bit uneasy about the semantics of MRS, though. We are mixing a bit the roles of a Locator and a Selector, insofar as we rely on embedded Locator(s) within a Selector. It, sort of, works, but I have some sort of an uneasiness about it that I cannot properly express. @azaroth42 may help out here…


My (current) conclusions:

iherman commented 6 years ago

(sorry, pushed the wrong button and closed the issue; reopening...)

iherman commented 6 years ago

Looking at the latest example, provided by @RachelComerford, for MRS raises some further issues.

The example uses MRS in such a way that it does not mean some sort of a continuous selection (this is emphasized by @RachelComerford). But this is in contradiction with the definition of that selection which I have not yet changed in the corresponding PR (#20):

A Multi Resource Selection can be used to identify this span by creating an ordered list of Locators. The selection consists of everything from the beginning of the starting selector in the first Locator, all selections identified by the intermediate Locators in the list (if any), through to the beginning of the ending selector, but not including it.

It strikes me that there are two ways of looking at an MRS:

  1. representing a "set" of selections (whether they are part of the same resource or not)
  2. defining some sort of a range running over several resources.

We have a perfect example use case for the first option. But we have not covered the 2nd one.

I wonder if:

  1. We should modify the MultipleResourceSelector, removing that quoted definition. Actually, what we get is not really a MultipleResourceSelector but, rather, a MultipleSelector. (@RachelComerford's example would be perfectly valid if all sections were within the same resource)
  2. We should extend the Range selector by simply allowing intermediate selectors between the first and the last, with the restriction that those selectors can only select full resources. That would mean we inherit the semantics of the Range selector without any further ado.
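A rough Python sketch of what option 2 could look like, purely as an assumption-laden illustration: the key name selectors for the intermediate list, the helper names embedded and extended_range, the c003.html resource, and the reuse of the CSS selector from the earlier example are all hypothetical. The only rule it enforces is the restriction stated above, i.e., an intermediate selector may not carry a refinement.

```python
def embedded(resource, refinement=None):
    """Build an EmbeddedResourceSelector, optionally refined by another selector."""
    selector = {"type": "EmbeddedResourceSelector", "value": resource}
    if refinement is not None:
        selector["refinedBy"] = refinement
    return selector


def extended_range(publication, start, intermediates, end):
    """Build a RangeSelector extended with intermediate whole-resource selectors."""
    for selector in intermediates:
        # Restriction from the proposal: intermediates select full resources only.
        if "refinedBy" in selector:
            raise ValueError("intermediate selectors must select whole resources")
    return {
        "source": publication,
        "selector": {
            "type": "RangeSelector",
            "startSelector": start,
            "selectors": list(intermediates),  # assumed key name
            "endSelector": end,
        },
    }


BASE = "https://dauwhe.github.io/html-first/MobyDickNav/html/"
locator = extended_range(
    "https://dauwhe.github.io/html-first/MobyDick.wpub",
    embedded(BASE + "c001.html", {"type": "TextQuoteSelector", "exact": "Call me Ishmael."}),
    [embedded(BASE + "c002.html")],  # chapter 2 selected as a whole
    embedded(BASE + "c003.html", {"type": "CssSelector", "value": "#elemid > .elemclass + p"}),
)
```

Listing chapter 2 explicitly means the range does not depend on any implicit ordering of the resources.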

(This does not decide whether we need scope and ERS, but we would certainly need MRS or, say, MS.)

Cc: @azaroth42 @RachelComerford @tcole3 @BigBlueHat

RachelComerford commented 6 years ago

I have a very nontechnical understanding of this and I confess, I'm not entirely sure I'm tracking the conversation above so please tell me if these comments aren't relevant or if I'm missing something.

Some assumptions I'm making:

  • Resource, based on the conversation above is that this is, for example, an xhtml file - is this correct?
  • Selector covers a range - something with a beginning and an end
  • Versus locator, which covers a single location... a starting point with no defined end point

My understanding is that we need 4 types of selector (at least based on my experience and reading above):

  1. Identify the start and end of a selection of content within a single "resource."
  2. Identify the start and end of a selection that begins in one resource but ends in another and includes all of the content in between.
  3. Identify the start and end of a selection within a resource, the start and end of a selection within a second resource, the start and end of a selection within a third resource, etc. without including all of the content between those sections.
  4. Identify the start and end of a selection within a resource and the start and end of a selection within that same resource without including all of the content between those selections.

So, the definition would be something like: A multi resource selection identifies a collection of discrete (containing clearly defined starting and ending points) section(s) contained either within a single resource or across multiple resources.

iherman commented 6 years ago

@RachelComerford

Some assumptions I'm making:

  • Resource, based on the conversation above is that this is, for example, an xhtml file - is this correct?

XHTML/HTML/SVG, etc. So basically yes.

  • Selector covers a range - something with a beginning and an end
  • Versus locator, which covers a single location... a starting point with no defined end point

Not really. A selector is an abstract thing, which depends very much on the specific selector type. A CSS selection may actually select a number of elements in the HTML file (depending on the selector used), whereas a fragment selector typically selects one element only. The term locator is just a generic term that says: "this is the resource I use for a selection and here is the specific selector type I use".

My understanding is that we need 4 types of selector (at least based on my experience and reading above):

  1. Identify the start and end of a selection of content within a single "resource."

That is what the RangeSelector does.

  2. Identify the start and end of a selection that begins in one resource but ends in another and includes all of the content in between.

Yes. At the moment, ie, in the current draft, it is possible to do that using the RangeSelector and the EmbeddedResourceSelector, see example 14.

But it does it only partially: it defines a start and end, but it is unclear what is in between, so to say. I.e., if you select something in chapter 1 for the beginning and in chapter 3 for the end, does it mean that we select chapter 2 as a whole in between? Based on what? There have been quite some discussions on whether the default order makes sense or not. If not, then… Hence my proposal to extend the range selector to be able to explicitly list chapter 2 (in this example) as an 'intermediate' resource.

  3. Identify the start and end of a selection within a resource, the start and end of a selection within a second resource, the start and end of a selection within a third resource, etc. without including all of the content between those sections.

Yes. If we adopt a weaker version of the Multiple Resource Selector, which is used in your example for #20, this is something that can be done.

  4. Identify the start and end of a selection within a resource and the start and end of a selection within that same resource without including all of the content between those selections.

Indeed: if the Multiple Resource Selector becomes a Multiple Selector, ie, not necessarily based on several resources, then this becomes a special case of your (3) above.

So, the definition would be something like: A multi resource selection identifies a collection of discrete (containing clearly defined starting and ending points) section(s) contained either within a single resource or across multiple resources.

Heh. You claim you are not a technical person? Wrong!! :-) The definition is exactly what a Multiple Selector would be…

iherman commented 6 years ago

@tcole3 @BigBlueHat I was wondering a bit about the current design for the extensions, and I am not really sure we are heading in the right direction. Again, to avoid polluting the issue comments with long text, I put down my thoughts into a separate gist.

I am happy to amend the current, not-yet-merged PR #20 branch to reflect this if you guys agree with the direction.

BigBlueHat commented 6 years ago

@iherman added some thoughts https://gist.github.com/iherman/65254a9e914de0af319a6800936af39e#gistcomment-2243202

I'd also love @tilgovi and @treora to give the gist a quick skim if they can spare some cycles! 😃

iherman commented 6 years ago

Closing by virtue of merging #23. More specific issues have also been added for further discussion: #24, #25, and #26.

Treora commented 6 years ago

Sure, happy to share my unrefined thoughts (sorry if I have missed pieces of the discussion!):

As for MultiSelector: Seems useful, if we wish to enable selectors pointing at multiple things. But it seems such a basic form of composition that I would hope some existing primitive can be used. Using multiple values for a selector field would be the first option that comes to mind (in JSON this would be expressed with an array: { "source": "...", "selector": [ { ...selector1 }, { ...selector2 } ] }). However, the annotation spec already says that "Multiple selectors should select the same content", so that plan fails. Using the same primitive as for multi-target/body annotations would make a lot of sense. Perhaps we could follow appendix D of the WA spec, or replace appendix D from that spec with whatever you come up with (if it is general enough). If a generic 'combiner' approach is not desired, inventing a specific MultiSelector seems acceptable to me too.

As for EmbeddedResourceSelector, scope, etc: scope feels simply like the inverse relation of refinedBy, and I would try to use refinedBy instead whenever possible (perhaps always?). refinedBy is a great primitive. In fact, it is too late now, but we could perhaps even have dropped the whole distinction between SpecificResource and Selector, replacing a SpecificResource with a ResourceSelector: {"type": "ResourceSelector", "source": "...", "refinedBy": { ...some selector... }}. This ResourceSelector would then also be usable instead of the here proposed EmbeddedResourceSelector. Anyhow, it's too late for that, but the ability to select a resource embedded within a resource seems important.
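To make the refinedBy idea concrete, here is a minimal Python sketch that folds a list of selectors into a single nested selector, each one refined by the next. ResourceSelector is the hypothetical type from the paragraph above, not something defined by WA, and nest_refinements is an invented helper name.

```python
def nest_refinements(selectors):
    """Fold a list of selectors into one, each refined by the one after it."""
    nested = None
    # Walk from the innermost selector outwards, wrapping as we go.
    for selector in reversed(selectors):
        selector = dict(selector)  # copy so the caller's dicts stay untouched
        if nested is not None:
            selector["refinedBy"] = nested
        nested = selector
    return nested


chain = nest_refinements([
    # Hypothetical ResourceSelector type, as floated above.
    {"type": "ResourceSelector", "source": "https://dauwhe.github.io/html-first/MobyDick.wpub"},
    {"type": "ResourceSelector", "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html"},
    {"type": "CssSelector", "value": "#elemid > .elemclass + p"},
])
```

With such a chain the publication, the embedded resource, and the selection within it are all expressed through the same primitive.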

As for extending RangeSelector, I don't see the necessity of adding a field intermediateSelectors, but maybe I have not understood the problem. I would expect that the resources embedded in the publication, between the resources of the start and end selector, would all be selected as a whole, no?

iherman commented 6 years ago

@Treora

As for MultiSelector: Seems useful, if we wish to enable selectors pointing at multiple things. But it seems such a basic form of composition that I would hope some existing primitive can be used. […] Perhaps we could follow appendix D of the WA spec, or replace appendix D from that spec with whatever you come up with (if it is general enough). If a generic 'combiner' approach is not desired, inventing a specific MultiSelector seems acceptable to me too.

Actually, this is now a separate issue (#26); the proposal discussed there is to indeed use an array of Locators (like Appendix D). And yes, that may be a viable alternative; my problem is (to quote my own comment):

That being said, it becomes also a convenience question. The Locator's usage/implementation may be simpler if everything is one locator; otherwise we are forced to define higher level notions on how locators are used (this is not a problem for WA, but we would have to do something extra here, if only by taking over more from WA). Ie, I prefer keeping the local structure, but, at the end of the day, end users as well as implementers may have to decide...

Bottom line: this is still an open issue:-) (if possible, we should continue the discussion there, just for admin reasons)

As for EmbeddedResourceSelector, scope, etc: scope feels simply like the inverse relation of refinedBy, and I would try to use refinedBy instead whenever possible (perhaps always?). refinedBy is a great primitive. In fact, it is too late now, but we could perhaps even have dropped the whole distinction between SpecificResource and Selector, replacing a SpecificResource with a ResourceSelector: {"type": "ResourceSelector", "source": "...", "refinedBy": { ...some selector... }}. This ResourceSelector would then also be usable instead of the here proposed EmbeddedResourceSelector. Anyhow, it's too late for that, but the ability to select a resource embedded within a resource seems important.

Yes, it is too late:-(

As for extending RangeSelector, I don't see the necessity of adding a field intermediateSelectors, but maybe I have not understood the problem. I would expect that the resources embedded in the publication, between the resources of the start and end selector, would all be selected as a whole, no?

You would expect this, wouldn't you? However, that presupposes that a Web Publication always has a default reading order. But there have been quite some discussions on whether that is true or not, hence the design that does not rely on any implicit order.
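For comparison, the implicit approach would amount to something like the following Python sketch. It assumes exactly what is in dispute, namely that the publication exposes a default reading order as a plain list of resource URLs; the function name and the file names are hypothetical.

```python
def resources_in_range(reading_order, start, end):
    """Return every resource from start to end, inclusive, in reading order."""
    first = reading_order.index(start)  # raises ValueError if not listed
    last = reading_order.index(end)
    if first > last:
        raise ValueError("end resource precedes start resource in reading order")
    return reading_order[first:last + 1]


# Hypothetical default reading order of a publication.
reading_order = ["c001.html", "c002.html", "c003.html", "c004.html"]
selected = resources_in_range(reading_order, "c001.html", "c003.html")
print(selected)
```

Without such a list there is nothing to compute the intermediate resources from, which is why the draft lists them explicitly.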

That being said: yes, this is an approach to be discussed. I added a separate issue for this (#28), let us track it there!

(a side issue: the latest draft uses selectors and not intermediateSelectors)