w3c / web-annotation

Web Annotation Working Group repository, see README for links to specs

https://w3c.github.io/web-annotation/

Multiple Selectors #93

Closed azaroth42 closed 8 years ago

azaroth42 commented 9 years ago

An outstanding issue for discussion has been the tradeoff in clarity and simplicity between using multiple selectors and making uber-selectors that try to encompass all the known information somehow. For example, even saying "Get the text from this XPath and then take the character range between these positions" currently requires an oa:List with two Selectors. For something that seems an important feature to enable, the concern was about the complexity of the current situation and whether there was an alternative.

Given #92, a new pattern is even more desirable.

Proposal:

Mint a new hasSubSelector predicate, with both range and domain of Selector, to chain multiple selectors together in order:

{
  "type": "SpecificResource",
  "selector": [
    {
      "type": "FragmentSelector",
      "value": "namedSection",
      "subSelector": {
        "type": "TextPositionSelector",
        "start": 5, // relative to target#namedSelection
        "end": 28
      }
    },
    {
      "type": "TextQuoteSelector",
      "exact": "quote in namedSelection"
    }
  ]
}

This has the advantages:

No additional List construction
Doesn't invert the ordering like having a SpecificResource as the source of a SpecificResource would. (e.g. above is do this, then do this... the SR version is to start at the bottom and work back up)
Multiple selectors would be interpreted legitimately as a Choice -- this specific resource has this selector and independently it has this other selector. So long as they're actually reasonable alternatives, the lack of ordering is inconsequential.

Note: There is an open question about States. If you have both an HttpRequestState and a TimeState, that would be interpreted as a choice of State not both. It's not clear (I believe) in any of the versions of Open Annotation how multiple States should be interpreted, and this should also be resolved. subState would enforce an ordering in the same way as above.

tilgovi commented 9 years ago

This would perhaps reduce the reusability of the selectors because now, rather than composing them with a container collection, one must be related to the other.

I actually think that's fine, because I don't feel a strong need for selector reuse, but taking that further I would dispose of the SpecificResource/Selectors distinction entirely. hasSource is then just a property of one kind of common Selector that selects one Resource from all possible resources and hasSelector just becomes subSelector.

azaroth42 commented 9 years ago

It would mean that the instance of the top level selector could not be reused in other contexts, yes. The distinction between SpecificResource and Selector is essential, because there can be more than just Selectors. There's also State, Scope, Style and potentially third party specifiers.

iherman commented 8 years ago

Admin question: has this issue been discussed and closed (at least conceptually)?

azaroth42 commented 8 years ago

Discussed on the 2016-01-20 call. Decision: add examples to get a good overview of the space.

Use Case 1: Select the broad segment (index.html#namedSection) and then further refine it with a second selector (characters 5-28 of that section).

{
  "type": "SpecificResource",
  "selector": {
    "type": "FragmentSelector",
    "value": "namedSection",
    "subSelector": {
      "type": "TextPositionSelector",
      "start": 5, // relative to index.html#namedSection
      "end": 28
    }
  }
}
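A minimal Python sketch of how a consumer might read Use Case 1, assuming the proposed "subSelector" key and that each selector has at most one sub-selector. The helper name selector_chain is hypothetical. It only lists the selectors in the order they would be applied and does not resolve them against a document.

```python
def selector_chain(selector):
    """Return the selectors in application order by following "subSelector"."""
    chain = []
    current = selector
    while current is not None:
        # Copy the step without its nested sub-selector so each entry stands alone.
        step = {k: v for k, v in current.items() if k != "subSelector"}
        chain.append(step)
        current = current.get("subSelector")
    return chain


# Use Case 1, written as a Python dict (the inline comments of the JSON are dropped).
uc1 = {
    "type": "SpecificResource",
    "selector": {
        "type": "FragmentSelector",
        "value": "namedSection",
        "subSelector": {"type": "TextPositionSelector", "start": 5, "end": 28},
    },
}

steps = selector_chain(uc1["selector"])
print([step["type"] for step in steps])
```

The order of the returned list is the order of refinement: first the fragment, then the character range inside it.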

Use Case 2: Two alternative selectors, one based on quotation, one based on offset.

{
  "type": "SpecificResource",
  "selector": [
    {
      "type": "TextPositionSelector",
      "start": 505, // relative to index.html
      "end": 528
    },
    {
      "type": "TextQuoteSelector",
      "exact": "quote in namedSelection"
    }
  ]
}

Use Case 3: Mixing the two

{
  "type": "SpecificResource",
  "selector": [
    {
      "type": "FragmentSelector",
      "value": "namedSection",
      "subSelector": {
        "type": "TextPositionSelector",
        "start": 5, // relative to target#namedSelection
        "end": 28
      }
    },
    {
      "type": "TextQuoteSelector",
      "exact": "quote in namedSelection"
    }
  ]
}

Use Case 1b: Select within a container resource (e.g. a zip or epub or PWP) and then select the content.

{
  "type": "SpecificResource",
  "selector": {
    "type": "foo:MemberSelector",
    "value": "/resources/index.html",
    "subSelector": {
      "type": "TextPositionSelector",
      "start": 605, // relative to target.zip | /resources/index.html
      "end": 628
    }
  }
}

Equivalent for UC1 currently:

{
  "type": "SpecificResource",
  "selector": {
    "type": "List",
    "members": [
      {
        "type": "FragmentSelector",
        "value": "namedSection"
      },
      {
        "type": "TextPositionSelector",
        "start": 5, // relative to index.html#namedSection
        "end": 28
      }
    ]
  }
}

Equivalent for UC2 currently:

{
  "type": "SpecificResource",
  "selector": {
    "type": "Choice",
    "members": [
      {
        "type": "TextPositionSelector",
        "start": 505,
        "end": 528
      },
      {
        "type": "TextQuoteSelector",
        "exact": "quote in namedSelection"
      }
    ]
  }
}

And for UC3:

{
  "type": "SpecificResource",
  "selector": {
    "type": "Choice",
    "members": [
      {
        "type": "List",
        "members": [
          {
            "type": "FragmentSelector",
            "value": "namedSection"
          },
          {
            "type": "TextPositionSelector",
            "start": 5, // relative to target#namedSelection
            "end": 28
          }
        ]
      },
      {
        "type": "TextQuoteSelector",
        "exact": "quote in namedSelection"
      }
    ]
  }
}

(Phew!)
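To make the correspondence between the two notations concrete, here is a rough Python sketch that rewrites the current List/Choice form into the proposed nested form. It assumes that a Choice becomes a plain array, that a List becomes a "subSelector" chain, and that the members of a List are plain selectors (not themselves a List or Choice). The function name to_nested is made up for illustration.

```python
def to_nested(selector):
    """Rewrite the List/Choice form into the proposed subSelector form."""
    kind = selector.get("type")
    if kind == "Choice":
        # Alternatives become a plain array of selectors.
        return [to_nested(member) for member in selector["members"]]
    if kind == "List":
        # Build the chain from the last member backwards so that the first
        # member ends up outermost (applied first).
        nested = None
        for member in reversed(selector["members"]):
            step = dict(member)
            if nested is not None:
                step["subSelector"] = nested
            nested = step
        return nested
    return dict(selector)


# The current UC3 form from above, as a Python dict.
uc3_current = {
    "type": "Choice",
    "members": [
        {
            "type": "List",
            "members": [
                {"type": "FragmentSelector", "value": "namedSection"},
                {"type": "TextPositionSelector", "start": 5, "end": 28},
            ],
        },
        {"type": "TextQuoteSelector", "exact": "quote in namedSelection"},
    ],
}

uc3_nested = to_nested(uc3_current)
print(uc3_nested)
```

The result has the same shape as the "selector" array of Use Case 3 in the proposed syntax.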

iherman commented 8 years ago

I must admit that for me it is a no-brainer: the hierarchical approach is way more readable and clearer. Also less error prone when authored by a human (it is easy to mix up the "type" value in the other syntax...)

hugomanguinhas commented 8 years ago

Hi all, I also agree that the hierarchical approach seems much more readable...

I would just suggest to reuse the oa:hasSelector instead of creating the oa:hasSubSelector property by opening the domain for oa:hasSelector... this way the pattern could be applied in a recursive way.

Another option could be to chain oa:SpecificResources, each one with a oa:hasSelector, this way (note that it would need to be represented in reverse order):

{
  "@type": "SpecificResource",
  "selector": {
      "@type": "TextPositionSelector",
      "start": 5, // relative to index.html#namedSection
      "end": 28
  },
  "source": {
    "@type": "SpecificResource",
    "selector": {
      "@type": "FragmentSelector",
      "value": "namedSection"
    }
  }
}

with this option we would not need to change the model.

hope this helps!

iherman commented 8 years ago

On 21 Jan 2016, at 16:54, Hugo Manguinhas notifications@github.com wrote:

Hi all, I also agree that the hierarchical approach seems much more readable...

I would just suggest to reuse the oa:hasSelector instead of creating the oa:hasSubSelector property by opening the domain for oa:hasSelector... this way the pattern could be applied in a recursive way.

Can you say what you mean by 'opening the domain'?

Another option could be to chain oa:SpecificResources, each one with a oa:hasSelector, this way (note that it would need to be represented in reverse order):

{
  "@type": "SpecificResource",
  "selector": {
    "@type": "TextPositionSelector",
    "start": 5, // relative to index.html#namedSection
    "end": 28
  },
  "source": {
    "@type": "SpecificResource",
    "selector": {
      "@type": "FragmentSelector",
      "value": "namedSection"
    }
  }
}

This is cute, but I am afraid it would not gracefully scale. If we have a chain of, say, 5 selectors instead of two, the original proposed approach is still o.k. (well, it is a deeper nesting, but that is still fine), whereas by, essentially, making the intermediate sources explicit the way you do it here may become much more spaghetti looking...

with this option we would not need to change the model.

hope this helps!

jjett commented 8 years ago

Ivan, could you explain a bit more about why making the intermediate resources explicit becomes much more "spaghetti looking"?

If anything I'd say they provide a valuable access point for annotating intermediate granular states of the target. Which would allow for some graceful fallback if your annotation client cannot render all of an annotation, e.g., the annotation targets a specific spatial portion of a youtube video within a particular time interval but the client only supports segmenting by time.

[Side note: This isn't the first time that nesting doll structures have been suggested for selectors. But now folks see how intuitive they are.]

hugomanguinhas commented 8 years ago

Hi Ivan, with opening the domain, I meant to say to not restrict the domain for oa:hasSelector to resources of the type oa:SpecificResource, but also resources of type oa:Selector... so that it can be reused recursively.

about the alternative option, I don't see it much different and it is a pattern quite often used in functional languages... it is as "verbose" as the others with the advantage of not adding more constructs or patterns to the model.

...but the other one proposed, is also alright.

azaroth42 commented 8 years ago

Thanks @hugomanguinhas, @jjett! Could you do the three patterns with the nested SpecificResources, so we can compare?

iherman commented 8 years ago

On 21 Jan 2016, at 18:39, Rob Sanderson notifications@github.com wrote:

Thanks @hugomanguinhas, @jjett! Could you do the three patterns with the nested SpecificResources, so we can compare?

…and not only with two selectors in the 'list' but, say, four. This may show better whether my fear of a "spaghetti" in the new scheme offered by @hugomanguinhas is justified…

Thx

iherman commented 8 years ago

@hugomanguinhas,

Hi Ivan, with opening the domain, I meant to say to not restrict the domain for oa:hasSelector to resources of the type oa:SpecificResource, but also resources of type oa:Selector... so that it can be reused recursively.

I understand. Putting an RDF hat on: if we want to formally define something a bit more general, while still wanting to make a tight definition in terms of RDF schemas or so, this may lead to the necessity to define the union of two classes as the domain for the property. Although my OWL knowledge has become a bit rusty by now, isn't it correct that this can be expressed correctly in OWL only? Do we want to go down the line of using OWL for the specification of our vocabulary? (Afaik, this is not the case at this point.)

I may be wrong.

about the alternative option, I don't see it much different and it is a pattern quite often used in functional languages... it is as "verbose" as the others with the advantage of not adding more constructs or patterns to the model.

...but the other one proposed, is also alright.

Actually… I am not really sure I understand the second option any more. Looking at your proposal:

{
  "@type": "SpecificResource",
  "selector": {
      "@type": "TextPositionSelector",
      "start": 5, // relative to index.html#namedSection
      "end": 28
  },
  "source": {
    "@type": "SpecificResource",
    "selector": {
      "@type": "FragmentSelector",
      "value": "namedSection"
    }
  }
}

isn't it correct that there is a missing term, namely

  "source": {
    "@type": "SpecificResource",
    "source" : "THE URL OF THE 'REAL' RESOURCE",
    "selector": {
      "@type": "FragmentSelector",
      "value": "namedSection"
    }
  }

ie, we have to put the reference to the 'starting' URL somewhere. Is that allowed? Doesn't this contradict the domain specification of "source"? The range of "hasSelector" is "Selector", whereas the domain of "hasSource" is "SpecificResource"… Ie, we may have the same issue with the naming as we have with the original proposal.

jjett commented 8 years ago

@iherman, @azaroth42 Sorry to dodge @azaroth42 's request to repeat all three patterns but only the first pattern is actually pertinent. This is because the way that the UC3 example is written is that the choice happens at the top level. So either you have the simple case of the text quote selector or you have to follow the selector chain. Since UC1 is what we're actually debating the following examples are for UC1.

Generally speaking I think that nesting / chaining multiple selectors beyond two levels of depth is going to be highly unlikely (it will probably occur < 1% of the time). So these four-deep examples are extreme edge cases (they are honestly outliers).

Thinking about a situation that might call for such heroic efforts to dig, I looked at the HathiTrust Digital Library's infrastructure and supposed that if I wanted to annotate a particular piece of text on a particular page in a particular volume then I might have to do the following -- I would specify a particular volume using a query to the HTDL's DB (this isn't actually true -- we have persistent URIs for everything at this level). Next I would specify what page of the volume I wanted (necessary because each page is a distinct text--image is also available--file; we know that our scholars want more granular things than this and since we have the files in hand minting identifiers for them seems to be a good next step to facilitate the specification of even more granular things -- so this part is going away too). Afterwards I specify a portion of the page and finally apply the text position selector exactly as in the existing examples.

Rob's Proposed Solution:

{
  "type": "SpecificResource",
  "source": "http://example.org",
  "selector": {
    "type": "foo:QuerySelector",
    "value": "knownItem",
    "subSelector": {
      "type": "foo:PageSelector",
      "value": "desiredPage",
      "subSelector": {
        "type": "FragmentSelector",
        "value": "namedSection",
        "subSelector": {
          "type": "TextPositionSelector",
          "start": 5, // relative to target#namedSelection
          "end": 28
        }
      }
    }
  }
}

Hugo's Alternative (which I should note was proposed in academic circles by Dubin, Jett & Senseney at the 2013 Balisage Conference) appears as follows.

{
  "@type": "SpecificResource",
  "selector": {
    "@type": "TextPositionSelector",
    "start": 5, // relative to index.html#namedSection
    "end": 28
  },
  "source": {
    "@type": "SpecificResource",
    "selector": {
      "@type": "FragmentSelector",
      "value": "namedSection"
    },
    "source": {
      "@type": "SpecificResource",
      "selector": {
        "@type": "foo:PageSelector",
        "value": "desiredPage"
      },
      "source": {
        "@type": "SpecificResource",
        "selector": {
          "@type": "foo:QuerySelector",
          "value": "knownItem"
        },
        "source": "http://example.org"
      }
    }
  }
}

Now, while the structure that Hugo (and others) has proposed looks more complex at first glance, the fact is that it is merely the inversion of the first. By making explicit use of the oa:hasSource property it has the added value that if at any time one of the selectors cannot be resolved for some reason or another one of the intermediary sources can still be rendered to the end user. Whereas the structure Rob has proposed provides only all or nothing functionality. Either I successfully apply all four selectors or I resort to the base source. It has no options for graceful failure.
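The "graceful failure" argument can be sketched roughly as follows in Python. This is only an illustration under assumptions: the set of selector types a client supports (supported) is hypothetical, each level carries exactly one selector, and "applying" a selector is reduced to recording its type.

```python
def apply_with_fallback(resource, supported):
    """Walk an inverted chain and stop at the first unsupported selector.

    Returns (base, applied, complete): the base source, the selector types
    that could be applied in order, and whether the whole chain was applied.
    """
    levels = []
    current = resource
    while isinstance(current, dict):
        levels.append(current["selector"])
        current = current.get("source")
    base = current  # the innermost "source" is the plain URL
    applied = []
    for selector in reversed(levels):  # innermost source first
        if selector["@type"] not in supported:
            return base, applied, False  # render the last intermediate resource
        applied.append(selector["@type"])
    return base, applied, True


# The four-level inverted example, as a Python dict.
inverted = {
    "@type": "SpecificResource",
    "selector": {"@type": "TextPositionSelector", "start": 5, "end": 28},
    "source": {
        "@type": "SpecificResource",
        "selector": {"@type": "FragmentSelector", "value": "namedSection"},
        "source": {
            "@type": "SpecificResource",
            "selector": {"@type": "foo:PageSelector", "value": "desiredPage"},
            "source": {
                "@type": "SpecificResource",
                "selector": {"@type": "foo:QuerySelector", "value": "knownItem"},
                "source": "http://example.org",
            },
        },
    },
}

# A hypothetical client that cannot resolve TextPositionSelector.
result = apply_with_fallback(
    inverted, {"foo:QuerySelector", "foo:PageSelector", "FragmentSelector"}
)
print(result)
```

In this sketch the client still gets as far as the named section, which is the intermediate resource it could show to the user.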

@iherman I should note that this isn't spaghetti code. It doesn't even approach the definition for "spaghetti" and so I wish you wouldn't use such a pejorative term. It makes you sound dismissive rather than engaged in the conversation (which I know isn't the case at all). Now you might argue that it's less intuitive but that's true for most inverted structures. Functionally this behaves precisely the same way as Rob's proposed pattern. Other than the option for providing graceful failure points through intermediate resources I don't see any difference between the two proposals. They're both better than what we have and I believe the crux of the debate should probably revolve around what is going to be best for the end user -- "all or nothing" or "we got you half way there".

+1 for both solutions (and we might consider employing both patterns since they are as similar as property/reciprocal property) with some ground rules for providing info on which has been employed (unfortunately JSON doesn't have a way to include remarks; pretty extreme minus in my book but I digress and don't want to start a documentation soap box -- apologies).

And apologies all for the tl;dr.

Regards,

Jacob

iherman commented 8 years ago

@jjett: I stand corrected on the 'spaghetti' term:-) It was not my intention to be dismissive.

Also, thanks for the examples, because I see it more clearly now and I did misunderstand something in the original example of @hugomanguinhas, insofar as the second structure also has some sort of a nesting behaviour. (My, wrong, understanding was that... never mind. Not important.) Which also means that my concern about the range/domain of the properties is also moot.

Actually... the fair comparison of the complexities is if we drop the @type whenever it can be deduced (or add it everywhere, I do not want to get into this discussion again), in which case the second ("inverted") example becomes even slightly less complex for reading:

{
    "@type": "SpecificResource",
    "selector": {
        "@type": "TextPositionSelector",
        "start": 5, 
        "end": 28
    },
    "source": {
        "selector": {
            "@type": "FragmentSelector",
            "value": "namedSection"
        },
        "source": {
            "selector": {
                "@type": "foo:PageSelector",
                "value": "desiredPage"
            },
            "source" : {
                "selector": {
                    "@type": "foo:QuerySelector",
                    "value": "knownItem"
                },
                "source": "http://example.org"
            }
        }
    }
}

So... I am sold. Sold in the sense that both patterns are fine and they are indeed on a comparable level of complexity. I do not think we want to use both patterns, though; but if my understanding is correct the second pattern works out of the box right now, which is a major plus; reducing the number of necessary predicates is a good thing...
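That the two patterns carry the same information can be checked mechanically. The Python sketch below rewrites an "inverse" structure into the "direct" one; it assumes every level has exactly one "selector" and that the innermost "source" is a plain URL string. The function name inverse_to_direct is hypothetical.

```python
def inverse_to_direct(resource):
    """Rewrite nested SpecificResources into one subSelector chain."""
    nested = None
    current = resource
    while isinstance(current, dict):
        # The outermost resource holds the selector that is applied last,
        # so each later (deeper) selector wraps what has been built so far.
        step = dict(current["selector"])
        if nested is not None:
            step["subSelector"] = nested
        nested = step
        current = current.get("source")
    return {"@type": "SpecificResource", "source": current, "selector": nested}


# The trimmed inverse example from above, as a Python dict.
inverse = {
    "@type": "SpecificResource",
    "selector": {"@type": "TextPositionSelector", "start": 5, "end": 28},
    "source": {
        "selector": {"@type": "FragmentSelector", "value": "namedSection"},
        "source": {
            "selector": {"@type": "foo:PageSelector", "value": "desiredPage"},
            "source": {
                "selector": {"@type": "foo:QuerySelector", "value": "knownItem"},
                "source": "http://example.org",
            },
        },
    },
}

direct = inverse_to_direct(inverse)

# Collect the selector types in application order to inspect the result.
order = []
node = direct["selector"]
while node is not None:
    order.append(node["@type"])
    node = node.get("subSelector")
print(direct["source"], order)
```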

(Again, apologies for the spaghetti:-)

jjett commented 8 years ago

@iherman No worries regarding "spaghetti". I didn't think you were being dismissive but IIRC these discussions are public (and/or the archives are public, I forget which it is) so I wanted to clarify for the "audience." On the whole we should all probably watch out for pejorative terms (and I'm certainly as guilty of using them as anyone).

Regarding "@type". For the examples I merely repeated Hugo's verbiage. IMO, it's probably very safe to replace "@type" with "type" as Rob has done in the proposed example. (IIRC we have already had this debate elsewhere and I thought that it was settled that we were making the substitution, thus the use of "type" in the proposed solution.) Perhaps @hugomanguinhas could say if my understanding also matches his regarding "@type"/"type".

iherman commented 8 years ago

@iherman No worries regarding "spaghetti". I didn't think you were being dismissive but IIRC these discussions are public (and/or the archives are public, I forget which it is)

both:-)

so I wanted to clarify for the "audience." On the whole we should all probably watch out for pejorative terms (and I'm certainly as guilty of using them as anyone).

Regarding "@type". For the examples I merely repeated Hugo's verbiage. IMO, it's probably very safe to replace "@type" with "type" as Rob has done in the proposed example. (IIRC we have already had this debate elsewhere and I thought that it was settled that we were making the substitution, thus the use of "type" in the proposed solution.) Perhaps @hugomanguinhas could say if my understanding also matches his regarding "@type"/"type".

In fact, my issue was not whether we use type or @type (I am not sure any more which approach we have adopted); I questioned whether the type information is necessary at all. But I don't want to reopen this discussion. However, a fair comparison of the two extracts requires either to use them everywhere in both examples, or none of them in both...

tilgovi commented 8 years ago

The inversion is conceptually beautiful because, as others have observed, it re-uses hasSource.

The only issue I see with it is that, at least for a JSON-LD serialization, it may be tricky to represent more complex decisions involving choice.

When the nesting is from least to most granular, there can be choice decisions made at each level. When the nesting is from most to least granular, the structure is a graph that converges toward the root rather than a tree that diverges from it.
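The tree reading of the least-to-most-granular nesting can be illustrated with a short Python sketch. It assumes, as in the proposal, that an array of selectors is read as a choice, and that the same holds for an array under "subSelector". The function name paths is made up; it enumerates every root-to-leaf alternative as a list of selector types.

```python
def paths(selector):
    """Yield each alternative chain of selector types in a direct structure."""
    if isinstance(selector, list):
        # An array is a choice: every member starts its own set of paths.
        for alternative in selector:
            yield from paths(alternative)
        return
    sub = selector.get("subSelector")
    if sub is None:
        yield [selector["type"]]
    else:
        for rest in paths(sub):
            yield [selector["type"]] + rest


# A small direct structure with a choice at two levels.
direct_selector = [
    {
        "type": "FragmentSelector",
        "value": "namedSection",
        "subSelector": [
            {"type": "TextPositionSelector", "start": 5, "end": 28},
            {"type": "TextQuoteSelector", "exact": "quote in namedSelection"},
        ],
    },
    {"type": "TextQuoteSelector", "exact": "quote in namedSelection"},
]

all_paths = list(paths(direct_selector))
for path in all_paths:
    print(" -> ".join(path))
```

Each printed line is one branch of the tree that diverges from the root.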

jjett commented 8 years ago

@tilgovi Conceptually these are the same trees. Not being a JSON developer though I have no idea if or how the logics needed to parse the trees vary overly much. Choice is orthogonal to level since it can exist at every level. If you can traverse the reverse tree then you should be able to also choose as "choice" lives at the levels of the nodes. I'll work up some "complex" examples first thing next week. (Up against some conference paper deadlines right now.)

jjett commented 8 years ago

So here is a "bad scenario" of multiple choices interweaved through the proposed pattern.

{ "type": "SpecificResource", "source": "http://example.org" "selector": [ { "type": "foo:QuerySelector", "value": "knownItem", "subSelector": [ { "type": "foo:PageSelector", "value": "desiredPage", "subSelector": [ { "type": "FragmentSelector", "value": "namedSection", "subSelector": [ { "type": "TextPositionSelector", "start": 5 "end": 28 }, { "type": "TextPositionSelectorAlt", "start": 45 "end": 68 } ]
}, { "type": "TextQuoteSelector", "exact": "quote in namedSelection" } ] }, { "type": "foo:PageSelectorAlt", "value": "desiredPage", "subSelector": { "type": "FragmentSelector", "value": "namedSection", "subSelector": { "type": "TextPositionSelector", "start": 5 // relative to target#namedSelection "end": 28 } } } ] }, { "type": "foo:QuerySelector", "value": "knownItemAlt", "subSelector": { "type": "foo:PageSelector", "value": "desiredPage", "subSelector": { "type": "FragmentSelector", "value": "namedSection", "subSelector": { "type": "TextPositionSelector", "start": 5 // relative to target#namedSelection "end": 28 } } } } ] }

And for the inverted structure.

{ "@type": "SpecificResource", "selector": [ { [ { "@type": "TextPositionSelector", "start": 5, "end": 28 }, { "@type": "TextPositionSelectorAlt" "start": 45, "end": 68 }, "source": { "selector": { "@type": "FragmentSelector", "value": "namedSection" } ] }, { "type": "TextQuoteSelector", "exact": "quote in namedSelection"
}, ],
"source": { "selector": [ { "@type": "foo:PageSelector", "value": "desiredPage" }, { "@type": "foo:PageSelectorAlt", "value": "desiredPage"
} ], "source" : { "selector": [ { "@type": "foo:QuerySelector", "value": "knownItem" }, { "@type": "foo:QuerySelectorAlt", "value": "knownItem" } ], "source": "http://example.org" } } } }

Again, I think these kinds of complex chains of choices are likely edge cases. A more likely pattern is one choice between a single selector and a chain of two selectors.

iherman commented 8 years ago

@jjett

Thanks for this. Yes, it is very much of an edge case, but it is, nevertheless, useful (imho) to see it to make a better decision.

Forgive me, however, but I had real difficulty grasping the structure of the extracts; I am a visual type, so I need formatted code. So I re-formatted the code, and I reproduce the "direct" (let us call it that) and the "inverse" examples; please check whether I made a mistake. Actually, I think there were some issues in formatting with the "inverse"; as far as I remember, there was an extra bracket somewhere...

Here is how I summarize the pros and cons for myself.

- (At least in this example) the "inverse" structure is definitely more DRY and also more compact (there goes my spaghetti!:-). Which is, in general, considered to be a good thing.
- Maybe it is a price to pay for DRY-ness, but I can imagine that, in practice, defining an "inverse" structure for a given task is less intuitive. Indeed, in the case of a "direct" approach everything that is related to selection is part of a (possibly giant) selector, meaning that there is a strict separation between what is being selected and how it is selected. The "inverse" seems to spread the various selections over an explicit "selector" and/or the "source", which may require more thinking (more work...) for the human as well as the programmatic user to get it right.
  - I must admit that, at first, I tried to reproduce the "direct" example in "inverse" myself, without looking at your code, and I failed/gave up
  - (I did not try to go through all possible variations to be sure that the "inverse" part does the same as the "direct" one, I just trust you on that.)
- (As we said before) the "inverse" requires less (if any) change to the current model. The "direct" approach requires a new predicate (or some sort of an OWL-like trick on the vocabulary to be able to reuse "selector").
- The "inverse" approach "feels" more elegant in some sense...:-)

Did I forget any pros and/or cons?

As we all said before, both work and, actually, I have the impression that everyone of us could live either way. Personally, at this moment, my (mild) preference still goes to the "direct" approach that I continue to find more intuitive.

Ivan

P.S. Here are the formatted examples:

The "direct" structure:

{
    "type": "SpecificResource",
    "source": "http://example.org",
    "selector": [{
        "type": "foo:QuerySelectorAlt",
        "value": "knownItem",
        "subSelector": [{
            "type": "foo:PageSelector",
            "value": "desiredPage",
            "subSelector": [{
                "type": "FragmentSelector",
                "value": "namedSection",
                "subSelector": [{
                    "type": "TextPositionSelector",
                    "start": 5, 
                    "end": 28
                },{
                    "type": "TextPositionSelectorAlt",
                    "start": 45, 
                    "end": 68
                }]
            },{
                "type": "TextQuoteSelector",
                "exact": "quote in namedSelection"
            }]
        },{       
            "type": "foo:PageSelectorAlt",
            "value": "desiredPage",
            "subSelector": {
                "type": "FragmentSelector",
                "value": "namedSection",
                "subSelector": {
                    "type": "TextPositionSelector",
                    "start": 5, 
                    "end": 28
                }
            }
        }]
    },{
        "type": "foo:QuerySelector",
        "value": "knownItemAlt",
        "subSelector": {
            "type": "foo:PageSelector",
            "value": "desiredPage",
            "subSelector": {
                "type": "FragmentSelector",
                "value": "namedSection",
                "subSelector": {
                    "type": "TextPositionSelector",
                    "start": 5,
                    "end": 28
                }  
            }
        }
    }]
}

The "inverse" structure:

{
    "type": "SpecificResource",
    "selector" : [{
        "type": "SpecificResource",
        "selector" :[{
            "type": "TextPositionSelector",
            "start": 5,
            "end": 28
        },{
            "type": "TextPositionSelectorAlt",
            "start": 45,
            "end": 68
        }],
        "source": {
            "selector": {
                "type": "FragmentSelector",
                "value": "namedSection"
            }
        }
    },{
        "type": "TextQuoteSelector",
        "exact": "quote in namedSelection"
    }],
    "source": {
        "selector": [{
            "type": "foo:PageSelector",
            "value": "desiredPage"
        },{
            "type": "foo:PageSelectorAlt",
            "value": "desiredPage"
        }],
        "source" : {
            "selector": [{
                "type": "foo:QuerySelector",
                "value": "knownItem"
            },{
                "type": "foo:QuerySelectorAlt",
                "value": "knownItem"
            }],
            "source": "http://example.org"
        }
    }
}

hugomanguinhas commented 8 years ago

Hi all,

@iherman, about the domain for oa:hasSelector, I was not suggesting to define it using a formal language, just make a note in the spec.... btw, is there such formal definition in RDFS or perhaps OWL? but, just a note that the domain could just be open without the need to prescribe either oa:SpecificResource or oa:Selector as rdfs:domain.

for the discussion on stating the @type explicitly or implicitly (entailed by one of the properties), my concern is for data consumers that are not applying RDF technology and thus may be expecting the @type to help determine how they will interpret and process the remaining structure... the @type may also play an important role for data consistency/validation as different clients/implementations may apply different modelling patterns but also may have misinterpreted the spec and used it in a way that is not expected... I would thus vote to keep it as much as possible even though it may become slightly more verbose.

Finally, I would add as another con for the solution I proposed that it would require twice the nesting (because of the additional SpecificResource in between) compared with the simple nesting of selectors.

iherman commented 8 years ago

Hey @hugomanguinhas,

@iherman, about the domain for oa:hasSelector, I was not suggesting to define it using a formal language, just make a note in the spec….

I understand, but I am not sure it is really satisfactory if we keep to a formal language like RDF and with a formal vocabulary thereof...

btw, is there such formal definition in RDFS or perhaps OWL?

Not for RDFS afaik. In OWL, yes there is. It is possible to define the union of classes. Taking the example from the OWL2 Primer[1], one can define:

:Parent owl:equivalentClass [
   rdf:type     owl:Class ;
   owl:unionOf  ( :Mother :Father )
 ] .

meaning that if there is a definition that says ex:prop rdfs:domain :Parent, and there is a triple x ex:prop y, then the system can deduce that either x rdf:type :Mother or x rdf:type :Father is a valid triple. The caveat is that this class cannot be used with the most "RDF-y" OWL profile, namely OWL-RL[2], ie, doing anything with it requires a more complex OWL reasoning...

However, all this can be hidden in the formal definition of the vocabulary that most of the users would not really use. But it can be there if needed. Ie, if we go down that line, we may decide to

Add the note as you say
Add the formal definition in the namespace document for the RDF vocabulary (which I believe we will have to have)

[1] https://www.w3.org/TR/owl2-primer/#Complex_Classes
[2] https://www.w3.org/TR/owl2-profiles/#OWL_2_RL

but, just a note that the domain could just be open without the need to prescribe either oa:SpecificResource or oa:Selector as rdfs:domain. for the discussions on the stating the @type explicitly or implicit (entailed by one of the properties), my concern is for data consumers that are not applying RDF technology and thus may be expecting the @type to help determine how they will interpret and process the remaining structure…

Yes, I understand that. However, the approach taken by RDF can be described in a general manner, too, without using the 'R' word: if, say, the "A" property is used, then the object, resp. the subject of that property is of a specific type. This is all that is needed, no need for a complex RDF technology. However...

the @type may also play an important role for data consistency/validation, as different clients/implementations may apply different modelling patterns but may also have misinterpreted the spec and used it in a way that is not expected... I would thus vote to keep it as much as possible even though it may become slightly more verbose.

...I think that, in another issue, we closed that argument by accepting the SHOULD as a compromise between MUST and MAY. I guess we can leave it at that for now. Implementers/users in the CR phase will tell us if that is fine.
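The property-based typing described above can be sketched without any RDF tooling. The following is a minimal sketch only: the `SUBJECT_TYPES` table and the `candidate_types` helper are hypothetical, and the union entry for `selector` is an assumption that mirrors the union-of-classes domain discussed earlier, not anything the spec defines.

```python
# Hypothetical table: property name -> types its subject may have.
# One entry models a plain rdfs:domain; several entries model a
# union-of-classes domain, where the property alone cannot decide.
SUBJECT_TYPES = {
    "exact": {"TextQuoteSelector"},
    "source": {"SpecificResource"},
    "selector": {"SpecificResource", "Selector"},  # assumed union domain
}


def candidate_types(node):
    """Return the types a node may have, preferring an explicit "type"."""
    if "type" in node:
        return {node["type"]}
    candidates = None
    for prop in node:
        allowed = SUBJECT_TYPES.get(prop)
        if allowed is None:
            continue  # property says nothing about the subject's type
        # Each typed property narrows the set of remaining candidates.
        candidates = allowed if candidates is None else candidates & allowed
    return candidates or set()
```

With a union domain, a node that only carries `selector` stays ambiguous, which is the practical argument for still writing the type explicitly.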

Finally, I would add, as another con of the solution I proposed, that it requires twice the nesting (because of the additional SpecificResource in between) compared with the simple nesting of selectors.

Good point.

Cheers

iherman commented 8 years ago

I just realized that this issue has a strong connection to issue #110 ('Make Selectors available for the wide world?'). I do not want to repeat the arguments there; the short version is that I believe we should leave the door open to (1) making the selectors useful for usage patterns that are not necessarily related to annotations and (2) defining new fragment identifiers that express selectors, which may be useful/important. We agreed that this cannot be done, as recommendations, in the WG, but we should not make such an evolution unnecessarily difficult if we can avoid it.

Looking at the "direct" vs. the "inverse" approach it is fairly clear to me that, mainly in view of (2) above, this shifts the balance strongly towards the "direct" approach. Indeed, one of the 'cons' for the "inverse" is that it mixes, in some sense, the original source's URL with the selection mechanism, whereas the "direct" approach doesn't. This means that translating the "direct" approach into a fragment identifier is doable (with the non-fragment part of the URL referring to the source) whereas the "inverse" approach becomes much less obvious.

As far as I am concerned, this tips the balance for me. My vote goes firmly towards the "direct" approach.

jjett commented 8 years ago

@iherman I'm not sure I'm following. The intersection with issue #92 is much more visible to me. Selector type and stacking/choice of selectors seem orthogonal to me. Is it possible for you to write up some examples that illustrate the problem?

iherman commented 8 years ago

Yes, of course, there is connection to #92. My comment was independent of that one.

What I expressed in my original issue is that I would like to be able to reuse the selectors as fragments, like:

http://www.ex.org/ex.html#selector(type=TextQuoteSelector,exact="anotation",prefix="this is an",suffix="that has some")

So, if I take the very first, simplified example of @azaroth, namely:

{
  "selector":
    {
      "type": "FragmentSelector",
      "value": "namedSection",
      "subSelector": {
        "type": "TextPositionSelector",
        "start": 5, // relative to target#namedSection
        "end": 28
      }
    },
  "source": "http://www.ex.org/ex.html"
}

This can be translated into something like

http://www.ex.org/ex.html#selector(type=FragmentSelector,value=namedSection,subselector(type=TextPositionSelector,start=5,end=28))

Ain't pretty for humans, but still useful; I have one single URL that expresses the full selection. E.g., in RDF (or elsewhere) I have a single URL for the selection. And the nice thing is that it is a fairly straightforward definition of the selector; the "mapping" is clean.
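The mechanical nature of that mapping can be sketched in a few lines. This is an illustration under assumptions: the `selector(...)` fragment syntax is only the informal one used in this comment, the function names are hypothetical, and quoting or percent-encoding of values is left out entirely.

```python
def selector_to_fragment(selector):
    """Serialise a selector dict (with an optional nested "subSelector")
    into the informal selector(...) syntax used in this thread."""
    parts = []
    for key, value in selector.items():
        if key == "subSelector":
            # A nested selector becomes subselector(...), with no "key=".
            parts.append("sub" + selector_to_fragment(value))
        else:
            parts.append(f"{key}={value}")
    return "selector(" + ",".join(parts) + ")"


def to_url(specific_resource):
    """Join the source URL and the serialised selector with '#'."""
    return (specific_resource["source"] + "#"
            + selector_to_fragment(specific_resource["selector"]))


direct = {
    "selector": {
        "type": "FragmentSelector",
        "value": "namedSection",
        "subSelector": {"type": "TextPositionSelector", "start": 5, "end": 28},
    },
    "source": "http://www.ex.org/ex.html",
}
url = to_url(direct)
```

The source URL stays the non-fragment part and the selector chain is walked once, top down, which is why the "direct" structure maps so cleanly.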

However, if I take the inverse version of the selector, that looks something like

{
  "selector": {
      "type": "TextPositionSelector",
      "start": 5,
      "end": 28
  },
  "source": {
    "selector": {
      "type": "FragmentSelector",
      "value": "namedSection"
    },
    "source": "http://www.ex.org/ex.html"
  }
}

Translating this into a fragment id would essentially force me to reproduce the "direct" version; I cannot translate this mechanically into a fragment id, because the URL for the resource http://www.ex.org/ex.html is buried in the structure instead of being neatly separated as in the "direct" case.
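To make the extra work concrete, here is a rough sketch of what a converter would have to do with the "inverse" structure: dig down to the innermost source to find the URL, collect the selectors on the way, and reverse their order. The function name and the use of `subSelector` as the nesting key are assumptions for illustration.

```python
def inverse_to_direct(resource):
    """Flatten nested SpecificResources (the "inverse" pattern) into one
    source URL plus a chain of selectors nested via "subSelector"."""
    chain = []
    node = resource
    # Walk inwards until the source is a plain URL string.
    while isinstance(node, dict):
        chain.append(dict(node["selector"]))  # copy, do not mutate input
        node = node["source"]
    url = node
    # The innermost selector applies first, so the order must be reversed.
    chain.reverse()
    direct = None
    for selector in reversed(chain):
        if direct is not None:
            selector["subSelector"] = direct
        direct = selector
    return {"source": url, "selector": direct}


inverse = {
    "selector": {"type": "TextPositionSelector", "start": 5, "end": 28},
    "source": {
        "selector": {"type": "FragmentSelector", "value": "namedSection"},
        "source": "http://www.ex.org/ex.html",
    },
}
flattened = inverse_to_direct(inverse)
```

The result is exactly the "direct" form, so any fragment serialisation of the inverse pattern ends up rebuilding it first.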

jjett commented 8 years ago

@iherman I see. This is an interesting RESTful URI. However, how does it accommodate choice?

My fear is that the need to accommodate choice makes the composition of a fragment selector like the one in your example, impossible. If that intuition is correct then the "direct" pattern still doesn't have any benefits over the inverse pattern which is still a brittle all or nothing approach for applying multiple selectors.

I'm also wondering how the browser is going to be expected to resolve the URI in your fragment selector example. Does it execute the compounded selectors from the outside in, or from the deepest nested parentheses on out (like we would do in mathematics)?

iherman commented 8 years ago

@jjett,

For choice: I do not really know yet, to be very honest; I have not really given too much thought to it. One could imagine doing something like #selector(...)selector(...) or #selector(selector(...),selector(...)). I think it can be done, though.

For the resolution: I think it is clearly from the outside in, which is the way the "direct" approach also works. If you look at it, my example is simply a mechanical copy of the selector-subselector pattern, and that seems to work.
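A toy resolver shows what "from the outside in" means in practice: apply the outer selector, then hand its result to the nested one. Everything here is a simplifying assumption: the document is modelled as a dict of named sections, only two selector types are handled, and `resolve` is a hypothetical name.

```python
def resolve(selector, text, sections):
    """Apply `selector` to `text`, then apply its subSelector (if any)
    to the result, i.e. outermost selector first."""
    kind = selector["type"]
    if kind == "FragmentSelector":
        # Toy model: a fragment id simply names a section of the document.
        selected = sections[selector["value"]]
    elif kind == "TextPositionSelector":
        # Positions are relative to the text selected so far;
        # start is inclusive, end is exclusive.
        selected = text[selector["start"]:selector["end"]]
    else:
        raise ValueError(f"unsupported selector type: {kind}")
    sub = selector.get("subSelector")
    return resolve(sub, selected, sections) if sub else selected


sections = {"namedSection": "The quick brown fox jumps over the lazy dog"}
whole_document = " ".join(sections.values())
selector = {
    "type": "FragmentSelector",
    "value": "namedSection",
    "subSelector": {"type": "TextPositionSelector", "start": 4, "end": 9},
}
result = resolve(selector, whole_document, sections)
```

Evaluating the innermost parentheses first would not work here, because the position selector has nothing to count against until the fragment has been resolved.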

However, we should not solve all this now. The only thing I am saying, and which is of importance right now, is that the "direct" approach seems to work much better in being mapped onto a fragment id than the "inverse". This, for me, is a major "pro" argument for the "direct" approach, added to the pros and cons I tried to list in my previous comment.

azaroth42 commented 8 years ago

I think it does tie to #110. It's the distinction between treating all of section 4 as a separate thing (Specific Resources, plus all the bits associated), or treating only section 4.2 (Selectors) as a separate thing. In the inverse structure, there needs to be the notion of the SpecificResource for it to work at all. In the direct structure (e.g. with subSelector) then you don't need it, you just need selectors.

Regardless of whether they can be treated as URI fragments, I think the inverse structure prevents us from exposing just the selectors, so it is also a "pro" for me. I'm also nervous about creating arbitrary complexity by putting States, Styles, Scopes, Roles and whatever else might come up, at different depths in the tree. For example, with some abbreviation to try and keep it readable:

{
  "selector": { "start": 0, "end": 10},
  "state": {"value": "Accept: application/pdf"},
  "source": {
    "selector": {"value": "namedSection"},
    "styleClass": "red",
    "source": "http://example.org/index.html"
  }
}

That to me says: Take index.html, and find namedSection. Then highlight that entire block with 'red', and then select the first 10 characters, and ask for them as a PDF. Other than "don't do that then!", having conformance for what we would expect clients to do in arbitrary situations like that seems like a nightmare we do not want to get ourselves into.

azaroth42 commented 8 years ago

To try and characterize the discussion today, and please correct me if you disagree:

There was general agreement that the proposal is better than the status quo of using explicit Choice and List nodes
There was general agreement that the proposal is better than the "inverse" proposal, where SpecificResources are used as the source of other SpecificResources.
There was a new proposal that any structure is unnecessary

In order to keep up forward momentum, I propose for the next call that we accept this particular issue and #135 as an improvement on the status quo. We can then in a separate issue discuss the new proposal to remove the functionality. As the Multiplicity section needs reworking anyway given the discussion around lists, I'm happy to write up the proposal (once resolved) and then take it out again if that's the resolution of the new issue.

Does that work for everyone?

iherman commented 8 years ago

Works for me.

davis-salisbury commented 8 years ago

Works for me too, thanks for summarizing.

azaroth42 commented 8 years ago

Hearing no objections ... :)

iherman commented 8 years ago

Discussed on telco 2016-02-26

Accepted proposal as in https://github.com/w3c/web-annotation/issues/93#issuecomment-186321067 with a change of term name: oa:refinedBy (refine for JSON)

See: http://www.w3.org/2016/02/26-annotation-irc#T17-01-16

azaroth42 commented 8 years ago

Done: http://w3c.github.io/web-annotation/model/wd2/#refinement-of-selection