w3c / web-annotation

Web Annotation Working Group repository, see README for links to specs
https://w3c.github.io/web-annotation/

Make Selectors available for the wide world? #110

Closed iherman closed 8 years ago

iherman commented 8 years ago

Selectors (and their subclasses) are very powerful. I can envisage other applications needing something similar. Actually, similar in two different ways: either as they are, as a description to select part of a resource; or to have a powerful fragment ID that can express the Selector concepts.

Can we take steps to enable reuse of these specifications without disturbing our work too much? Here are the alternatives I see.

  1. Separate the Selectors' part into a separate namespace. The oa namespace would be used for what is really annotation specific, and we could have the "select" (or whatever) namespace for the Selector class and all its subclasses. This change would affect only the RDF document, obviously, as well as the JSON-LD @context file. It would be invisible for the pure JSON-LD usage and document.
  2. Separate the Selectors' section (essentially 4.2 in the current model document) into a document of its own. I believe this is only really necessary for the separate JSON document; the RDF document can stay as it is if we also adopt (1) above (RDF people are used to combining different vocabularies, or parts of an existing vocabulary).
  3. Define a fragment ID that reflects the current selectors.

I think that (1) and (2) are just minor editorial changes that we can do easily. The only downside of (1) is that it may be a strong departure from the Community Group document; I am not sure whether it is indeed a big issue.
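Alternative (1) relies on the claim that a namespace split is invisible to plain JSON users because the @context absorbs it. The following is a minimal sketch of that idea under stated assumptions: the `selector` namespace IRI is invented, the context is cut down to three terms, and `expand_terms` is a toy stand-in for JSON-LD expansion, not a real processor.

```python
# Assumed namespaces: oa: is the Web Annotation vocabulary; the "selector"
# namespace is hypothetical and only illustrates alternative (1).
OA_NS = "http://www.w3.org/ns/oa#"
SELECTOR_NS = "http://example.org/ns/selector#"

# A drastically simplified @context: compact term -> full IRI.
CONTEXT_BEFORE = {
    "SpecificResource": OA_NS + "SpecificResource",
    "TextQuoteSelector": OA_NS + "TextQuoteSelector",
    "exact": OA_NS + "exact",
}
# After the split, only the Selector terms point at the new namespace.
CONTEXT_AFTER = dict(CONTEXT_BEFORE,
                     TextQuoteSelector=SELECTOR_NS + "TextQuoteSelector",
                     exact=SELECTOR_NS + "exact")


def expand_terms(doc, context):
    """Toy expansion (not a JSON-LD processor): rename keys and 'type' values via context."""
    if isinstance(doc, list):
        return [expand_terms(item, context) for item in doc]
    if not isinstance(doc, dict):
        return doc
    expanded = {}
    for key, value in doc.items():
        if key == "type":
            expanded["@type"] = context.get(value, value)
        else:
            expanded[context.get(key, key)] = expand_terms(value, context)
    return expanded


# The JSON that a developer writes is the same under both contexts...
selector_json = {"type": "TextQuoteSelector", "exact": "anotation"}
before = expand_terms(selector_json, CONTEXT_BEFORE)
after = expand_terms(selector_json, CONTEXT_AFTER)
# ...only the expanded IRIs, i.e. what RDF consumers see, differ.
print(before["@type"])  # http://www.w3.org/ns/oa#TextQuoteSelector
print(after["@type"])   # http://example.org/ns/selector#TextQuoteSelector
```

In a real deployment the change would live only in the published @context file, which is why the JSON-facing documents would not need to change.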

What I meant for (3) is to define something like:

http://www.ex.org/ex.html#selector(type=TextQuoteSelector,exact="anotation",prefix="this is an",suffix="that has some") 
http://www.ex.org/ex.dt#selector(type=DataPositionSelector,start="4096",end="4104") 

etc. It may be relatively easy to mechanically define these things based on the document we would produce anyway. That said, it does raise issues related to the standard definition of fragment ID-s. However, even if we do not do it in this Working Group, by doing (1) and (2) we would make it easier for other groups (Community or Working Groups) to pick this up.
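To make alternative (3) concrete, here is a minimal round-trip sketch of the `selector(...)` syntax used in the two example URLs. The grammar is an assumption inferred from those examples (unquoted `type`, double-quoted values, no escaping of quotes), not a registered fragment-identifier syntax; the helper names are hypothetical, and a real URL would additionally need percent-encoding.

```python
import re

# Hypothetical grammar, inferred from the examples:
#   selector(key=value,key="quoted value",...)
# Values containing '"' or backslashes are not handled in this sketch.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|([^,)]*))')


def to_fragment(selector):
    """Serialize a flat selector dict: 'type' first and unquoted, other values quoted."""
    parts = ["type=" + selector["type"]]
    parts += ['{}="{}"'.format(k, v) for k, v in selector.items() if k != "type"]
    return "selector(" + ",".join(parts) + ")"


def from_fragment(fragment):
    """Parse a selector(...) fragment back into a dict; all values come back as strings."""
    if not (fragment.startswith("selector(") and fragment.endswith(")")):
        raise ValueError("not a selector() fragment: " + fragment)
    body = fragment[len("selector("):-1]
    return {m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
            for m in _PAIR.finditer(body)}


sel = {"type": "TextQuoteSelector", "exact": "anotation",
       "prefix": "this is an", "suffix": "that has some"}
frag = to_fragment(sel)
print("http://www.ex.org/ex.html#" + frag)
print(from_fragment('selector(type=DataPositionSelector,start="4096",end="4104")'))
```

Note that numeric values such as `start` and `end` round-trip as strings; a real specification would have to say how each selector property is typed.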

iherman commented 8 years ago

Re (3) above. I have contacted Liam Quin and Carine Bournez (both in the XML Activity) to find out about the status and the usability of the XPointer Framework and pointer schemes (see [1],[2],[3] below). Here is the summary of their answers:

Nevertheless, this is not a very pretty picture… in other words, there is no really suitable story to add additional fragment id-s to HTML5. This may have to be taken up with the TAG or the Web Platform WG.

  1. http://www.w3.org/TR/xptr-framework/
  2. http://www.w3.org/2005/04/xpointer-schemes/
  3. http://www.w3.org/TR/xptr-xpointer/
tilgovi commented 8 years ago

:+1: to items (1) and (2)

I am not as interested in a fragment syntax. I have often explained the purpose of the selectors as a mechanism to escape the need to define a fragment syntax, a projection to a single string, for arbitrarily complex selections.

Whether some subset of possible selectors should have a fragment syntax, for what media types, how that would come to pass, whether vendors would implement it.... all very difficult to say.

I agree that the first two steps are editorial, though. Even more, though, I am in favor from a readability / documentation standpoint. It is great not to be distracted by the selectors when discussing the rest of the model core and vice versa.

iherman commented 8 years ago

On 20 Nov 2015, at 07:23, Randall Leeds notifications@github.com wrote:

:+1: to items (1) and (2)

I am not as interested in a fragment syntax. I have often explained the purpose of the selectors as a mechanism to escape the need to define a fragment syntax, a projection to a single string, for arbitrarily complex selections.

Whether some subset of possible selectors should have a fragment syntax, for what media types, how that would come to pass, whether vendors would implement it.... all very difficult to say.

I agree. I would be o.k. separating this issue from the rest and looking at it independently (and maybe outside this group).

I agree that the first two steps are editorial, though. Even more, though, I am in favor from a readability / documentation standpoint. It is great not to be distracted by the selectors when discussing the rest of the model core and vice versa.

Actually, this is an aspect that I have not even thought about. And you are absolutely right from a pure readability point of view, too!

BigBlueHat commented 8 years ago

Agree with @tilgovi. :+1: on :one: and :two:, but not on :three:. I believe for :three: we should move discussion into a wider, email-based "is this our job right now" discussion around Fragment Identifiers--as FindText could benefit from one...but that's also not necessary for its value and would likely only trip it up.

I had earlier proposed extracting SpecificResource, but that was probably too large a piece to extract. Doing only selectors seems to give these bits the widest possible reach, while not introducing a new "wrapping" class for things that want to use them. So...again :+1: to :one: and :two:.

jjett commented 8 years ago

Agree with @tilgovi and @BigBlueHat . +1 on 1 and 2.

IIRC I mentioned that I have a use case for selectors outside of the annotation domain. Having finally crawled out from my school work rock, let me see if I can coherently articulate what that use case is.

The HathiTrust Digital Library is the public facing access point for the Google Digitization Efforts. It is a very large scale digital repository with over 14 million digitized books comprising several billions of pages. One of the ongoing questions is how to provide some form of access (or pseudo-access) for digital humanities researchers to the corpus for the purposes of computational analysis (for extracting features, modeling topics, etc.). Because 2/3rds of the corpus remains within the domain of copyright we've had to develop a specialized container called a workset[1]. The primary feature of the workset is that it provides a method for researchers to aggregate objects for analysis (see Figure 1). It also records a certain amount of metadata describing the aggregation as a whole (see Figure 2).

Figure 1: Basic HTRC Workset Model

Figure 2: Full HTRC Workset Model

The nature of the HTRC's current architecture limits what can be gathered into worksets to just a notional thing called a volume (which is itself an aggregation of some pages with metadata describing the aggregation). Our scholarly users want us to move beyond notional volumes and provide them with tools that let them aggregate finer grained objects of interest into their worksets. They want to gather together specific pages or features on pages rather than whole volumes so that the data preparation overhead can be reduced and previous feature extraction work can be fully leveraged.

Specific Resources and Selectors (and the other specifiers) provide a very, very good way of doing this exact thing. These things would provide us a method for selecting specific portions of page(s), e.g., a scholar wants to analyze the text of a collection of poems, each poem is named as a specific resource and the selectors provide the architecture with a relatively simple means of cherry picking just the poem's text off of the page and feeding it into the analysis algorithm.
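To illustrate the poem scenario, here is a minimal sketch of how the two text selectors could cut a span out of a page's plain text. It assumes the page is already a Python string and ignores the text normalization a conforming implementation would apply; the sample page and helper names are invented for illustration. TextPositionSelector offsets are treated as start-inclusive and end-exclusive.

```python
def select_by_position(text, start, end):
    """TextPositionSelector-style selection: characters [start, end) of the text."""
    return text[start:end]


def select_by_quote(text, exact, prefix="", suffix=""):
    """TextQuoteSelector-style selection.

    Returns (start, end) offsets of the first occurrence of `exact` that is
    immediately preceded by `prefix` and followed by `suffix`, or None.
    """
    pos = text.find(exact)
    while pos != -1:
        end = pos + len(exact)
        if text[:pos].endswith(prefix) and text[end:].startswith(suffix):
            return pos, end
        pos = text.find(exact, pos + 1)
    return None


# Hypothetical OCR'd page text containing a poem among other material.
poem_text = "Tyger Tyger, burning bright,\nIn the forests of the night;"
page = "Chapter 3\nThe Tyger\n" + poem_text + "\nNotes follow."

# Resolve the quote to offsets, then extract only the poem for analysis.
span = select_by_quote(page, poem_text, prefix="The Tyger\n", suffix="\nNotes")
poem = select_by_position(page, *span)
print(span)
print(poem)
```

The quote form survives small changes elsewhere on the page, while the position form is cheaper to apply; annotations commonly carry both for that reason.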

As you can see, this is not an annotation use case. So again +1 to suggestions 1 and 2 but moreover -1 to any language that is going to peg specific resources and selectors/specifiers to something specific to annotations. This latter thing will put me in the awkward position of plagiarizing/reinventing specific resources and specifiers for the HTRC's workset context.

Regards,

Jacob

[1] Additional information on the workset data model can be found at: http://hdl.handle.net/2142/78149


Jacob Jett Research Assistant Center for Informatics Research in Science and Scholarship The Graduate School of Library and Information Science University of Illinois at Urbana-Champaign 501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA (217) 244-2164 jjett2@illinois.edu

azaroth42 commented 8 years ago

Regarding (1) and (2), I wouldn't mind moving all of section 4 to a different namespace, including SpecificResource, State, Selector, Style and associated properties. Moving just the Selector classes seems meaningless as:

But given the above, I'm overall :-1: to the change as proposed as there's no real benefit from adding another namespace, and a non-trivial amount of work to have Selectors go through the process separately from the rest of the model. I would be -0 to extracting the entire section.

And I am strongly :-1: to (3). It's a lot of additional work, and I believe technically impossible given the current IETF standards around fragments. At the very least it would need the blessing of the TAG before starting work.

iherman commented 8 years ago

On 24 Nov 2015, at 01:00, Rob Sanderson notifications@github.com wrote:

Regarding (1) and (2), I wouldn't mind moving all of section 4 to a different namespace, including SpecificResource, State, Selector, Style and associated properties. Moving just the Selector classes seems meaningless as:

You still need to have an oa:SpecificResource to attach the xxx:Selector to.

That is something that can be expressed in the RDF vocabulary, and I do not propose to change it. In the JSON view, things like this (essentially setting the right domain and range) are, I presume, a bit hand-waved anyway.

Having a separate document with State, Style, etc. defeats the purpose. The goal is to make the Selector mechanism available to other, non-annotation structures; mixing them in makes the whole exercise pointless.

You still need to have something in the documentation pointing to the selector documentation, and vice versa for the SpecificResource.

I do not see that as a major issue.

But given the above, I'm overall :-1: to the change as proposed as there's no real benefit from adding another namespace, and a non-trivial amount of work to have Selectors go through the process separately from the rest of the model. I would be -0 to extracting the entire section.

I agree this does not bring any benefit to strictly the annotation work. And we could argue that this is the WG's only purpose in life, and therefore not do it. Doing it requires a little bit of altruism, so to say: recognizing that the results of the Working Group in this area are, actually, so valuable that they could be used outside the realm of the annotation world, i.e., the group would help in reuse. It is a little bit of extra work for a cosy feeling of making something useful to the community at large. We are spending quite a lot of time trying to re-use the vocabularies of other WG-s; why can't we offer something in return to the outside world?

And I am strongly :-1: to (3). It's a lot of additional work, and I believe technically impossible given the current IETF standards around fragments. At the very least it would need the blessing of the TAG before starting work.

I can see the problems, as I said before. As a compromise, let us say that it is not on Rec track and if, by separating the Selector work, somebody can pick this up and run with it, we could do it elsewhere or, for a start, we could publish a note about it in the group. Depends how far we get with it, without compromising the group's timeline.

azaroth42 commented 8 years ago

Could you give an example of how just a Selector, with no relation to another resource that it selects some segment of, would be useful?

If Selectors should be split, then why keep State in the Annotation spec? Surely being able to describe the representation is a necessary first step to selecting a segment of that representation?

tilgovi commented 8 years ago

Can you explain your claim that it is "not trivial amount of work to have Selectors go through the process separately from the rest of the model", Rob?

Is there significant per-document overhead for publishing? Could that be mitigated through delegation or otherwise distributing the workload among more participants?


I agree that selectors aren't useful without a resource to select from, but the class of that resource is not necessarily our SpecificResource. For instance, it may be applicable to a domain where state is made superfluous by content-addressing.

tilgovi commented 8 years ago

I also think state could be a useful vocabulary by itself. An example usage scenario would be to record all the metadata about a particular request during end-to-end testing of a web service, or to log requests for usage or crash analytics.

I'm still totally unconvinced that style is a good idea at all, btw.

iherman commented 8 years ago

@azaroth42 :

Could you give an example of how just a Selector, with no relation to another resource that it selects some segment of, would be useful?

To come up with a very ad-hoc example in RDF, I could imagine a foaf-like information that looks something like:

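@prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:       <http://purl.org/dc/elements/1.1/> .
@prefix foaf:     <http://xmlns.com/foaf/0.1/> .
# ex: and selector: are ad-hoc, hypothetical namespaces for this example.
@prefix ex:       <http://example.org/ns#> .
@prefix selector: <http://example.org/ns/selector#> .

<http://www.ivan-herman.net/foaf> a foaf:PersonalProfileDocument ;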
    dc:creator "Ivan Herman" ;
    rdfs:seeAlso [
        ex:source <http://www.example.org/teamDescription>;
        ex:refines [
          a selector:FragmentSelector;
          rdf:value "xpointer(/body/section[2]/para[1])"
        ]
    ].

where the target of the seeAlso is a portion of the target document, and this vocabulary uses the (ad-hoc for now) source and refines terms. There is no reason to bring a SpecificResource into the picture, which is specific to Annotations; any RDF resource suffices.

Of course, the example above could be expressed by

<http://www.ivan-herman.net/foaf> a foaf:PersonalProfileDocument ;
    dc:creator "Ivan Herman" ;
    rdfs:seeAlso <http://www.example.org/teamDescription#xpointer(/body/section[2]/para[1])> .

because it so happens that xpointer is a fragment id in XML land, but it is not, in fact, in HTML5 land, so it would not be correct (that simplification is why I want to have fragment identifiers, too, but we agreed about the complications).

Such situations, where the pure fragment identifier is not rich enough to express a selection, are, I believe, numerous, and there is no real solution out there to handle them properly. More exactly, this group (and, to be fair, mainly the Annotation CG) has come up with a viable solution, and the community at large should profit from it.

Note that I do not say we should define how selectors should be used in general; that would be probably a step too far. This should be left to other vocabularies that decide to use selectors.

Editorially, this only means separating (in the JSON version of the model) the current (and revised) section 4.2, with a forward reference (i.e., in the main document 'the value of the "selector" term is a Selector, see this and this document'). In the RDF document it simply means using a different namespace (and making clear in a note that the reason for having a separate selector namespace is to allow its use in general). I must admit I do not see why this is considered to be so utterly complicated:-(

If Selectors should be split, then why keep State in the Annotation spec? Surely being able to describe the representation is a necessary first step to selecting a segment of that representation?

I do not have an opinion on that; but I am not sure why the state would be the necessary first step as you describe. I do not see this strong connection at all (as my example above shows).

iherman commented 8 years ago

On 25 Nov 2015, at 06:45, Randall Leeds notifications@github.com wrote:

Can you explain your claim that it is "not trivial amount of work to have Selectors go through the process separately from the rest of the model", Rob?

Is there significant per-document overhead for publishing? Could that be mitigated through delegation or otherwise distributing the workload among more participants?

The only extra step is that we have to define a separate short name and, for the FPWD, we have to get the formal approval from the domain lead (ie, Ralph). This isn't a big deal.

Editorially: there is a little bit of overhead in pushing the current 4.2 into a separate respec document, but that is minimal. The core text that the editors are working on right now remains, essentially, unchanged; even the examples may remain unchanged, by adding a note in the document that all examples are in the annotation domain, but there is no formal restriction to do so. Actually, your example could be reused (in a simplified form), too.

If needed, I can also take on the load of pushing this through, once the current text (with all the extras around selectors that you guys are working on) is done.

I agree that selectors aren't useful without a resource to select from, but the class of that resource is not necessarily our SpecificResource. For instance, it may be applicable to a domain where state is made superfluous by content-addressing.

Agreed. But this is not something we have to specify.

gobengo commented 8 years ago

fragmention is a relevant previous effort at this (without as expressive of a selector vocab). /cc @kevinmarks

I am not as interested in a fragment syntax. I have often explained the purpose of the selectors as a mechanism to escape the need to define a fragment syntax, a projection to a single string, for arbitrarily complex selections.

For some reason that made this click for me. Using data: URIs as the fragment means any LD media type (e.g. Turtle) could also be used:

import json
from urllib.parse import quote

OA_CONTEXT = 'http://www.w3.org/ns/anno.jsonld'
JSON_LD_MIMETYPE = 'application/ld+json'


def url_of_quote(document_url, text):
    """Return document_url with a TextQuoteSelector for `text` as a data: URI fragment."""
    selector = {'@context': OA_CONTEXT,
                '@type': 'TextQuoteSelector',
                'exact': text}
    # RFC 2397: a comma separates the media type from the data.
    frag = 'data:{},{}'.format(JSON_LD_MIMETYPE, json.dumps(selector))
    # Percent-encode the data: URI so it is a legal fragment ('/' is left as-is).
    return '{}#{}'.format(document_url, quote(frag))


url = url_of_quote("https://github.com/w3c/web-annotation/issues/110",
                   "I have often explained the purpose of the selectors as a "
                   "mechanism to escape the need to define a fragment syntax, "
                   "a projection to a single string, for arbitrarily complex "
                   "selections.")

print(url)

Produces: https://github.com/w3c/web-annotation/issues/110#data%3Aapplication/ld%2Bjson%2C%7B%22%40context%22%3A%20%22http%3A//www.w3.org/ns/anno.jsonld%22%2C%20%22%40type%22%3A%20%22TextQuoteSelector%22%2C%20%22exact%22%3A%20%22I%20have%20often%20explained%20the%20purpose%20of%20the%20selectors%20as%20a%20mechanism%20to%20escape%20the%20need%20to%20define%20a%20fragment%20syntax%2C%20a%20projection%20to%20a%20single%20string%2C%20for%20arbitrarily%20complex%20selections.%22%7D

GitHub Flavored Markdown has no idea what to do with this, but user agents or Web Annotation clients like Hypothesis could make use of it.
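A client on the receiving end would have to reverse the encoding. What follows is a minimal sketch, assuming the fragment is a percent-encoded `data:` URI whose payload is a single JSON object. `selector_from_url` is a hypothetical helper, not part of Hypothesis or any other existing client. It accepts both the RFC 2397 comma separator and a semicolon, so URLs built either way decode.

```python
import json
from urllib.parse import quote, unquote, urlsplit


def selector_from_url(url, media_type="application/ld+json"):
    """Decode a selector embedded in a URL fragment as a percent-encoded data: URI.

    Assumes the payload is one JSON object, so everything before the first '{'
    is treated as the data: URI header (media type plus separator).
    """
    data_uri = unquote(urlsplit(url).fragment)
    if not data_uri.startswith("data:"):
        raise ValueError("fragment is not a data: URI")
    rest = data_uri[len("data:"):]
    brace = rest.find("{")
    if brace == -1:
        raise ValueError("no JSON object in data: URI")
    header, payload = rest[:brace], rest[brace:]
    # RFC 2397 puts ',' before the data; ';' is tolerated as well.
    if header.rstrip(",;") != media_type:
        raise ValueError("unexpected media type: " + header)
    return json.loads(payload)


# Round trip: encode a selector into a fragment, then decode it again.
selector = {"@type": "TextQuoteSelector", "exact": "a projection to a single string"}
url = ("https://github.com/w3c/web-annotation/issues/110#"
       + quote("data:application/ld+json," + json.dumps(selector)))
print(selector_from_url(url))
```

Media-type parameters (e.g. `;charset=...`) and base64 payloads are deliberately out of scope here.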

azaroth42 commented 8 years ago

Regarding overhead, my opinion is that if they go through the process separately, we're making extra and unnecessary headaches. We would need to do the selectors document first, in order to refer to it from the model document. And then we'd need (I assume) a vocab document for the selectors, separate from the selector model document ... and for the annotation vocab document to include the selectors in the external section.

If we're going to split things out, then I think that the embedded content / textual body is an even better candidate. We know that there are multiple uses, because we have it, as does ActivityStreams, as did the Content in RDF not-quite-spec. No one else (that I know of) has approached the selector issue? And if they did, I don't see why they would be somehow afraid to use it if it was in an Annotation spec? The great thing about RDF is that you can pull appropriate classes and predicates from different vocabularies and use them as needed.

Regarding the SpecificResource, yes you successfully did not put type: SpecificResource into the RDF ... but the first blank node serves the same purpose, just using different non-standard vocabulary. To be clearer, my assertion is that you need some resource to play the role of the SpecificResource to make a Selector useful. I don't understand the reluctance to treat section 4 as a whole. State, Style, Scope and renderedVia are hardly annotation specific and would just be reinvented unnecessarily.

kevinmarks commented 8 years ago

I'm not sure why arbitrary complexity is seen as a goal in itself.

The point of fragmention is that it is a minimal extension of the fragment syntax to reference a multiword phrase. It solves a huge part of the 'referring to a subset of the page' problem.

My fragmention plugin doesn't highlight anything with bengo's url as the text is not in the document, but my fragmention to quote parser makes sense of it
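For comparison, here is a minimal sketch of the fragmention idea: the fragment is a second `#` followed by the phrase, with spaces written as `+`. The helper names are hypothetical, and the sketch does no in-document matching or highlighting, which is where real fragmention implementations do their work.

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit


def make_fragmention(document_url, phrase):
    """Append a fragmention-style fragment: '##' then the phrase, spaces as '+'."""
    return document_url + "##" + quote_plus(phrase)


def phrase_from_fragmention(url):
    """Return the phrase from a '##' fragment, or None for an ordinary fragment."""
    fragment = urlsplit(url).fragment  # everything after the first '#'
    if not fragment.startswith("#"):
        return None
    return unquote_plus(fragment[1:])


url = make_fragmention("https://github.com/w3c/web-annotation/issues/110",
                       "a projection to a single string")
print(url)
print(phrase_from_fragmention(url))
```

The trade-off discussed in this thread is visible here: a fragmention stays short and readable but can only carry a phrase, whereas the data: URI approach can carry any selector at the cost of a long, opaque URL.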

BigBlueHat commented 8 years ago

If we're extracting anything, I'd prefer we extract all of Section 4: Specific Resource, discuss its pieces for their broadest possible use (State, Style, etc), and move things back into the Annotation Model if they are deemed to be Annotation specific--which I'm pretty sure none of these are.

However, to Rob's point about...

The great thing about RDF is that you can pull appropriate classes and predicates from different vocabularies and use them as needed.

...we (as a WG) would need a clear reason why extracting it from Annotation and (our...presumably) publishing it as a separate document was more valuable and/or less confusing than RDF folks (etc) depending on the Annotation context and vocabulary to get access to SpecificResource, Selector, etc.

@iherman given that you're a key driver here, what do you see as the benefits of SpecificResource, Selector (and/or the other bits) being put into a separate document over folks just "consuming" the Annotation vocab into their APIs?

Other than "marketing" SpecificResource as a "new thing" with value all its own (beyond annotation--which it certainly does have), I don't see much more that it gets us, but happy to be wrong. :smile_cat:

BigBlueHat commented 8 years ago

@gobengo @kevinmarks thanks for the input, but I think these comments trend off topic. Feel free to start separate threads or issues, however.

BigBlueHat commented 8 years ago

And @tilgovi, "Hail Hydra!" :wink: Per your "record all the metadata" use case.

iherman commented 8 years ago

@azaroth42:

Regarding overhead, my opinion is that if they go through the process separately, we're making extra and unnecessary headaches. We would need to do the selectors document first, in order to refer to it from the model document.

I do not think that is a problem. We are already restructuring documents, cutting one document into two. That is why I raise this now; doing it at this phase is no big deal. Doing it much later may generate headaches.

And then we'd need (I assume) a vocab document for the selectors, separate from the selector model document ... and for the annotation vocab document to include the selectors in the external section.

If we're going to split things out, then I think that the embedded content / textual body is an even better candidate. We know that there are multiple uses, because we have it, as does ActivityStreams, as did the Content in RDF not-quite-spec. No one else (that I know of) has approached the selector issue? And if they did, I don't see why they would be somehow afraid to use it if it was in an Annotation spec? The great thing about RDF is that you can pull appropriate classes and predicates from different vocabularies and use them as needed.

From an RDF point of view: yes, you are right. RDF people can pull classes and predicates and they do it to their hearts' content. There may be a healthy reluctance, though, to do that if it is part of a larger and more complicated vocabulary, hence my proposal to put it into a separate namespace. Really, doing that on the RDF side is just syntactic sugar at this point.

But let me repeat what I said earlier: no, I do not think the vocabulary document has to be split. The only change I propose for that document is to introduce a separate namespace for the selector classes and the selector specific properties. That is it.

It is the JSON users that I am really concerned about. There is no tradition of partial reuse in that community. People "simply" wanting to use an alternative mechanism to fragment identifiers to describe a portion of a document will not reuse a portion of a major JSON vocabulary; they will reinvent the wheel. Unnecessarily.

Let us not forget that we are addressing two communities which are fairly distinct. I am not worried about the RDF one...

Regarding the SpecificResource, yes you successfully did not put type: SpecificResource into the RDF ... but the first blank node serves the same purpose, just using different non-standard vocabulary. To be clearer, my assertion is that you need some resource to play the role of the SpecificResource to make a Selector useful.

Yes, the first blank node would serve the same purpose. So what? Yes, of course you need some resource to "encapsulate" a Selector. Again, so what?

I don't understand the reluctance to treat section 4 as a whole. State, Style, Scope and renderedVia are hardly annotation specific and would just be reinvented unnecessarily.

A SpecificResource is just an RDF Resource. The only thing that it brings to the table is that it is the domain for a number of other properties that we define for annotations. Like roles (or whatever the name will be:-), or states; both are annotation specific things, just look at the various values we define for those properties. None of those properties/values make any sense in my foaf example and I would not reinvent them for my application because I do not need them. To take another example, I may use a similar construction for adding complex metadata (say, an RDF version of ONIX metadata) to a portion of a publication: the current annotation specific properties have no meaning in that context.

What is true is that, in an ideal world, the very concept of Specific Resource could be split, too. I could imagine having a separate class and have something like (just inventing the term on the fly):

<http://bla.bla.bla> a selector:SelectedResource ;
    selector:source <http://a.b.c/> ;
    selector:select [ a  selector:TextPositionSelector;
        selector:start 412;
        selector:end 795
     ]
.

and then declare oa:SpecificResource to be a subclass of selector:SelectedResource as far as the RDF vocabulary goes. That would be even cleaner. The reason I did not propose that, originally, is to avoid any extra complication. However, it is perfectly doable to do that as well: it means a little bit more in the RDF document on the model, and hardly any difference in the JSON version, because it would be hidden in the @context.
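Both the subclass proposal and the objection to it turn on what RDFS entailment actually derives. The following is a minimal sketch using plain Python tuples and only two RDFS rules (rdfs7 for subPropertyOf, rdfs9 for subClassOf; transitivity is ignored). All `selector:` terms and `ex:` resources are hypothetical, and which `oa:` properties would become subproperties is an assumption made for illustration.

```python
# Compact names; the "selector:" terms are hypothetical, as in the proposal above.
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"


def rdfs_closure(triples):
    """Apply RDFS rules rdfs7 (subPropertyOf) and rdfs9 (subClassOf) to a fixed point."""
    graph = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in graph if p == SUBCLASS}
        subprop = {(s, o) for s, p, o in graph if p == SUBPROP}
        new = set()
        for s, p, o in graph:
            if p == TYPE:  # rdfs9: instances of a subclass are instances of the superclass
                new |= {(s, TYPE, sup) for sub, sup in subclass if sub == o}
            # rdfs7: a statement with a subproperty also holds with the superproperty
            new |= {(s, sup, o) for sub, sup in subprop if sub == p}
        if new <= graph:
            return graph
        graph |= new


schema = {
    ("oa:SpecificResource", SUBCLASS, "selector:SelectedResource"),
    ("oa:hasSource", SUBPROP, "selector:source"),
    ("oa:hasSelector", SUBPROP, "selector:select"),
}
data = {
    ("ex:target1", TYPE, "oa:SpecificResource"),
    ("ex:target1", "oa:hasSource", "ex:page"),
    ("ex:target1", "oa:hasSelector", "ex:sel1"),
    # A SpecificResource with only a state and no selector at all.
    ("ex:target2", TYPE, "oa:SpecificResource"),
    ("ex:target2", "oa:hasState", "ex:state1"),
}
inferred = rdfs_closure(schema | data)
for triple in sorted(inferred - schema - data):
    print(triple)
```

RDFS only ever adds triples: `ex:target2` is inferred to be a `selector:SelectedResource` even though it has no selector, and nothing in RDFS flags that as an error. Whether that reading is semantically acceptable is exactly what is being debated here; the sketch only shows the mechanics.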

But I am happy to make the minimal step only, and stick with what I originally proposed.

iherman commented 8 years ago

Ok, I see that I cannot really convince you guys, and I do not want to drag this on indefinitely. Let me propose a compromise solution. More exactly, a compromise solution with sub-alternatives.

The main point is: we leave everything mostly as it is, and we also publish a Note on how these terms can be used outside of the Annotation domain. That note would, essentially, include (non-normatively, because it is a note) the definition of the Selector classes in such a way that the document would stand by itself for non-Annotation usage. (I am not sure whether that document would deal with RDF/Turtle, or only JSON-LD; probably the latter.)

To be a bit more precise, I see the following alternatives to realize that (beyond writing the note itself).

  1. Both the RDF and JSON documents stay as they are.
  2. The Selector class, as well as the relevant properties, go into a separate namespace. The RDF document has to change a little bit accordingly, the JSON document stays as it is (except for a tiny change in the @context file).
  3. Like alternative 2, but the note would also include the definition of a SelectedResource class that is a superclass of SpecificResource, as well as the superproperties for source and select. That means the note would have an RDF aspect defining those extra vocabulary items, but it is non-normative.
  4. The RDF document includes the SelectedResource class in the Selector namespace, plus the source and select properties, and SpecificResource becomes a subclass of SelectedResource. The RDF vocabulary changes a bit, but not significantly; the JSON document stays as it is (except for a tiny change in the @context file).

Obviously, alts. 3 and 4 are, semantically, identical, and implement the approach I outlined. Alternative 4 is clean; alternative 3 pushes the bulk of the work entirely to the separate note. Both are doable.

Because it is a Note, we have the entire life span of the group to write and publish it. Obviously, putting my money (well, time, rather) where my mouth is, I would take on the responsibility for it.

azaroth42 commented 8 years ago

RDF people can pull classes and predicates and they do it to their hearts' content. It is the JSON users that I am really concerned about.

Given that, I don't see the advantage of creating a new namespace for just Selectors and their properties, as that community tends to eschew namespaces in general. If we go that route, then I still think that at least SpecificResource, State (plus subClasses), Selector (plus subClasses) and renderedVia would be the right granularity. I would leave out hasPurpose, hasScope and styleClass as Annotation specific. I would be :+1: to that, and :-1: to only Selector (as harmful to interop, as above). The richer namespace would be a useful addition to http://www.w3.org/2006/gen/ont

If we go the multiple namespace route, I think we would also want to do Content (which had its own NS to begin with, after all) and Collections (assuming we can't just use AS2). Both have more utility than selectors, especially selectors with no referent, and are actively needed in other communities beyond Annotations. It begins to look a bit like namespace soup, but it already does and JSON-LD hides it from those that would be offended.

The superclass idea does not work, as far as I understand the proposal, as a SpecificResource might /not/ be a segment at all. It could be the entire resource with only hasState to record where to find the archived copy of the content. Or the entire resource with a purpose (e.g. semantic tagging), a scope, a rendering agent, or a style. So I'm also :-1: to that, as it breaks the semantics as everything that is true of the superclass (e.g. that it's a segment) would not be true of the subclass. It could be a subclass of SpecificResource, but there would be no point to it as it would have no properties that weren't associated with its superclass.

A Note that explains how to align with existing approaches would be valuable (:+1:) and could be written to target JSON users who have at least some sympathy towards semantic interoperability rather than purely syntactic. This seems like something that could be done regardless of the namespace mechanics. Basically: Use this pattern, and put these entries into your context document. If you don't have one, just use this one we prepared specially for you.

iherman commented 8 years ago

RDF people can pull classes and predicates and they do it to their hearts' content. It is the JSON users that I am really concerned about.

Given that, I don't see the advantage of creating a new namespace for just Selectors and their properties, as that community tends to eschew namespaces in general.

You cut out part of my quote, which is:

RDF people can pull classes and predicates and they do it to their hearts' content. There may be a healthy reluctance, though, to do that if it is part of a larger and more complicated vocabulary, hence my proposal to put it into a separate namespace. Really, doing that on the RDF side is just syntactic sugar at this point.

On your other comments:

If we go that route, then I still think that at least SpecificResource, State (plus subClasses), Selector (plus subClasses) and renderedVia would be the right granularity. I would leave out hasPurpose, hasScope and styleClass as Annotation specific. I would be :+1: to that, and :-1: to only Selector (as harmful to interop, as above). The richer namespace would be a useful addition to http://www.w3.org/2006/gen/ont

You keep saying that, and I disagree... I did give you examples of pure Selector usage. I do not see why State and renderedVia would have an intricate relationship with Selector. To use my example, assigning ONIX metadata to a specific portion of a document, if needed, has nothing to do with state or renderedVia.

To use your voting technique :-) : :-1: on adding renderedVia and State to such a separate namespace, and :+1: to the Selector separation. I have no idea why you say that it is harmful for interop.

The superclass idea does not work, as far as I understand the proposal, as a SpecificResource might /not/ be a segment at all. It could be the entire resource with only hasState to record where to find the archived copy of the content. Or the entire resource with a purpose (e.g. semantic tagging), a scope, a rendering agent, or a style. So I'm also :-1: to that, as it breaks the semantics: not everything that is true of the superclass (e.g. that it's a segment) would be true of the subclass. It could be a subclass of SpecificResource, but there would be no point to it as it would have no properties that weren't associated with its superclass.

Either you completely misunderstood what I said or (which I do not suppose) you have some misunderstanding of how the RDF semantics work. I suppose it is the former, because under the RDF semantics nothing you describe in this section is a problem whatsoever. To make things clear to those among us who may not be familiar with all the details of RDF, here is a summary of what I proposed.

  1. We define a separate RDFS Class called (for now) selector:SelectorResource
  2. We move the source and selector properties into the selector: namespace; the rdfs:domain of these properties is defined to be selector:SelectorResource
  3. We move the current Selector class and corresponding properties into the selector: namespace
  4. We define SpecificResource as a subclass of selector:SelectorResource

What #4 means is that if an instance x is of type SpecificResource then it is also of type selector:SelectorResource. But if the triple (y selector:source w) holds, we can deduce that y is of type selector:SelectorResource but not necessarily of type SpecificResource. That is all these statements mean. Put into human terms, applications may use selector:SelectorResource while possibly ignoring annotations altogether, whereas annotations can use the same structures as before without any further problems.

This does not say anything about what properties are used for a resource x. It may or may not use a selector; it may or may not use solely hasState as you mention, or scope, or style, or whatever. To put it clearly: I simply do not see where this setup would break anything in the semantics of SpecificResource. Defining a domain for a property is just a license to deduce something, and does not define obligatory properties. (RDF Classes are not the same as classes in a language like C++ or Python.)
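
The entailments described in this comment can be checked mechanically. Below is a minimal sketch, assuming only two RDFS rules (rdfs2 for rdfs:domain and rdfs9 for rdfs:subClassOf) and prefixed names stored as plain strings; the selector: terms and the ex: resources are the hypothetical names from the proposal, not published IRIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def rdfs_closure(triples: set) -> set:
    """Apply rdfs2 (domain) and rdfs9 (subClassOf) until no new triple appears."""
    inferred = set(triples)
    while True:
        domains = {(s, o) for s, p, o in inferred if p == DOMAIN}
        subclasses = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        new = set()
        for s, p, o in inferred:
            # rdfs2: (p rdfs:domain C) and (s p o) entail (s rdf:type C).
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))
            # rdfs9: (A rdfs:subClassOf B) and (s rdf:type A) entail (s rdf:type B).
            if p == RDF_TYPE:
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= inferred:
            return inferred
        inferred |= new


vocabulary = {
    ("selector:source", DOMAIN, "selector:SelectorResource"),
    ("selector:hasSelector", DOMAIN, "selector:SelectorResource"),
    ("oa:SpecificResource", SUBCLASS_OF, "selector:SelectorResource"),
}
data = {
    ("ex:x", RDF_TYPE, "oa:SpecificResource"),
    ("ex:y", "selector:source", "ex:w"),
}
closure = rdfs_closure(vocabulary | data)

print(("ex:x", RDF_TYPE, "selector:SelectorResource") in closure)  # True
print(("ex:y", RDF_TYPE, "selector:SelectorResource") in closure)  # True
print(("ex:y", RDF_TYPE, "oa:SpecificResource") in closure)        # False
```

Note that nothing in the closure gives ex:x a selector or any other property: the domain declaration only licenses a type, it does not make a property mandatory.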

I maintain that having these in a separate namespace is enough, that it is an almost zero-level edit to the current vocabulary, that it does not break anything, and that it would be of great value to the community.

We seem to fundamentally disagree on this, and that is fine. This happens. I think the decision should be made by the WG now.

A Note that explains how to align with existing approaches would be valuable (:+1:) and could be written to target JSON users who have at least some sympathy towards semantic interoperability rather than purely syntactic. This seems like something that could be done regardless of the namespace mechanics. Basically: Use this pattern, and put these entries into your context document. If you don't have one, just use this one we prepared specially for you.

I am not sure what you mean by "some sympathy towards semantic interoperability rather than purely syntactic": I think that would be useful for any JSON user.

azaroth42 commented 8 years ago

Feedback from ProjectHydra Segment of a File folks: Would be useful to split out, but need to know how to use it.

davis-salisbury commented 8 years ago

Hi, this all came through at a bad time for me, so still trying to catch up, but I think Jacob provided quite a concrete case for a non-annotation specific usage, as did Ivan. In the publishing context there are numerous things that could benefit from selectors outside of annotations, and the "see also" reference is of course one of them. These are currently bogged down via links to XML IDs and other object-related ID schemes, but really the intent frequently is to be more general. A colleague of mine came up with this list below quite quickly:

• Index terms (pointers to a range of content in a publication)
• See also references in indexes
• RDFa (or other triples) that provide provenance for components of a manuscript as it progresses through the publication workflow
• Encyclopedia with cross references
• Quizzes with references to a range of content (potentially including visual or other pedagogical cues)
• Dictionary “see” references

So, I think there are plenty of reasons to do this, though I cannot speak fully to the modeling implications.

tilgovi commented 8 years ago

In the unfortunate scenario that we can't push everything through to Recommendation (pessimistic, I know, I'm sorry), isn't another possible benefit of splitting out selectors that we might see that part through even if we don't manage to get implementations and buy-in on all parts of annotation?

iherman commented 8 years ago

@tilgovi what I proposed, as a possible consensus, is:

  1. produce a Note (ie, no Recommendation) on the usage of selectors as an isolated entity, possibly also with a proposal for a fragment ID
  2. to make the previous approach a bit more tangible even for RDF users, put the selector terms into a different namespace (with a possible superclass structure, see my previous comment).
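
As a rough illustration of the fragment ID idea in item (1), the selector(...) syntax floated at the top of this issue can be generated mechanically from a selector description. This is a hypothetical serializer, assuming that syntax (type unquoted, every other value double-quoted) plus backslash-escaping of embedded quotes; no such fragment syntax has been standardized. Because raw spaces and double quotes are not allowed in a URI fragment, `selector_uri` percent-encodes them with `urllib.parse.quote`.

```python
from urllib.parse import quote


def selector_fragment(selector: dict) -> str:
    """Serialize a selector description into the hypothetical selector(...) syntax."""
    if "type" not in selector:
        raise ValueError("a selector needs a 'type'")
    parts = [f"type={selector['type']}"]
    for key, value in selector.items():
        if key == "type":
            continue
        # Quote every value; escape backslashes and embedded double quotes.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{escaped}"')
    return "selector(" + ",".join(parts) + ")"


def selector_uri(base: str, selector: dict) -> str:
    # Keep the syntax characters readable; percent-encode spaces, quotes, etc.
    return base + "#" + quote(selector_fragment(selector), safe="()=,")


sel = {"type": "TextQuoteSelector", "exact": "annotation",
       "prefix": "this is an", "suffix": "that has some"}
print(selector_fragment(sel))
# → selector(type=TextQuoteSelector,exact="annotation",prefix="this is an",suffix="that has some")
print(selector_uri("http://www.ex.org/ex.html", sel))
```

Such a mechanical mapping only works if the selector properties are flat literals; nested selectors or states would need a richer syntax.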

This means that we do not have to get buy-in from implementors right away officially, although implementations may decide to use this (at their own risk, in some sense). But it would mean a stake in the ground. Ie, this goes along the lines of what you propose I believe.

I have the impression at least that there is no strong push back on (1) above. @azaroth seems to be against (2).

azaroth42 commented 8 years ago

:+1: to an implementation Note, and propose that we do this during the CR phase.

iherman commented 8 years ago

@azaroth42 I am fine with the Note approach. But the question of whether the selectors go into a separate namespace should still be decided before closing this issue. (I know you are against the separate namespace, but we need a WG agreement on this.)

tilgovi commented 8 years ago

Is there any problem if a namespace is used/augmented in multiple documents? If not, there's no need for a new namespace to get all the benefits of publishing them separately.

iherman commented 8 years ago

@tilgovi my issue has never been technical on this but more "social".

Per the RDF spec, there is nothing that forbids the development of a vocabulary over several documents and communities, just as there is nothing that forbids conceptually carving out part of a vocabulary for a different or restricted use. But whether these are socially good or acceptable is a totally different matter.

To take a different example (which we are not talking about here, just to illustrate the point): a vocabulary A may contain the predicate a:pred. Technically, it is possible in a vocabulary B to make a statement

a:pred rdfs:subPropertyOf b:mypred

but this is considered to be very bad practice, also referred to as "property hijacking". It is a social thing, though, not technical.
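
To make the "property hijacking" concern concrete: a single rdfs:subPropertyOf statement published in vocabulary B changes what RDFS entails from data that only ever used vocabulary A. A minimal sketch, applying just rule rdfs7 in one pass (no transitivity), with the a:, b: and ex: names as placeholders:

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def apply_rdfs7(triples: set) -> set:
    """rdfs7: (p rdfs:subPropertyOf q) and (s p o) entail (s q o). Single pass."""
    subprops = {(s, o) for s, p, o in triples if p == SUBPROPERTY_OF}
    derived = {
        (s, sup, o)
        for s, p, o in triples
        for sub, sup in subprops
        if p == sub
    }
    return triples | derived


# Data written by someone who only knows vocabulary A.
data_a = {("ex:doc", "a:pred", "ex:value")}
# A statement made in vocabulary B about A's property.
statement_b = {("a:pred", SUBPROPERTY_OF, "b:mypred")}

print(("ex:doc", "b:mypred", "ex:value") in apply_rdfs7(data_a | statement_b))  # True
```

The author of data_a never said anything about b:mypred, yet a reasoner that loads B's statement concludes it; that is why the practice is frowned upon even though it is technically valid.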

B.t.w., we are not talking about the first case in this issue (vocabulary over several documents) but more the second (carving out a vocabulary)

azaroth42 commented 8 years ago

Proposal: postpone and work on a Note during CR. (To discuss 2016-02-19)

iherman commented 8 years ago

@azaroth42 : I am not sure which part of the discussion you want to postpone.

  1. Writing a separate Note: that, of course, can be done later and, actually, it is probably better to do it when the documents are indeed in CR, ie, technically stable.
  2. The issue of namespaces, ie, their separation, as well as the 'superclass' approach described in https://github.com/w3c/web-annotation/issues/110#issuecomment-163573384 is a change we have to do before CR (if we agree to do it); once in CR such a technical change is not possible any more.

At this point the WG has to decide, essentially, on (2) above; I believe there is pretty much a consensus on (1).

Note that an alternative that came up in a separate discussion is that the note (1), more exactly its RDF companion, would define its own namespace and would use owl:equivalentProperty to bind terms to the annotation vocabulary. That would avoid touching the vocabulary document if you think it is too late to do that, but would have the same effect. Ain't nice in my view, though. And it also means that it would expose the SpecificResource class in general, which is closely bound to the annotation use case and is not really meaningful more generally... hence my preference for the separate superclass that I described in https://github.com/w3c/web-annotation/issues/110#issuecomment-160566248...

jjett commented 8 years ago

@iherman Personally I'd be fine with exposing the SpecificResource class in general. I thought the whole point of spinning selectors out is that there are plenty of non-annotation contexts that need them. IMO, the same is true of the SpecificResource class and pretty much everything that goes with it vis-a-vis specifiers. The workset/collection use case I illustrated back in November for this issue definitely benefits from reusing that entire section (Specifiers) of the annotation vocabulary in service of decidedly non-annotation use cases (selection and inclusion of finely grained collection members). So +1 for spinning it into its own namespace or at least not defining any of the specifier vocabulary in such a way as to preclude the use of classes and predicates outside of annotation contexts.

iherman commented 8 years ago

On 22 Feb 2016, at 14:49, Jacob wrote:

@iherman Personally I'd be fine with exposing the SpecificResource class in general. I thought the whole point of spinning selectors out is that there are plenty of non-annotation contexts that need them. IMO, the same is true of the SpecificResource class and pretty much everything that goes with it vis-a-vis specifiers. The workset/collection use case I illustrated back in November for this issue definitely benefits from reusing that entire section (Specifiers) of the annotation vocabulary in service of decidedly non-annotation use cases (selection and inclusion of finely grained collection members). So +1 for spinning it into its own namespace or at least not defining any of the specifier vocabulary in such a way as to preclude the use of classes and predicates outside of annotation contexts.

Let me be a bit provocative ;-) We are talking about pure RDF issues here, not about the user-facing JSON serialization: what is SpecificResource really used for? It is simply a general RDF Resource (ie, the most general 'thing' in RDF land), and the only reason for its existence is that it is the domain of a number of RDF predicates. So the question is whether all those properties make sense outside of the annotation world. I do not think they all do; motivations, for example, do not.

Hence my proposal to define a superclass of SpecificResource which is used for selectors, and then SpecificResource is a 'specialization' of that general class for annotation purposes.

But I do not want to get into this discussion too far: if we decide to use SpecificResource outside of the annotation world for selectors, I will not formally object. Similarly, if there is a massive push back on spinning these out into a separate namespace, I will not lie down in the road. The really important point is to make these usable outside the annotation world somehow, ie, to document that properly. None of this is, in my view, really clean modeling, but that will only ever worry purists.

jjett commented 8 years ago

@iherman I think your suggestion overcomplicates the situation. If we're limiting ourselves to pure RDF issues then the purpose of the Specific Resource is to act as a signpost indicating under which circumstances a group of assertions about some web resource is true. However since Specific Resource is a class it has no relationship to the annotation class per se (such relationships are the product of the hasBody and hasTarget predicates). As long as we don't define the class in such a way that it must be interpreted as something only ever within the range of the hasBody and hasTarget predicates then I don't see any reason not to expose it for use in contexts outside of annotations (in which it is only contingently involved anyway).

Selectors have to do with Specific Resources and not with Annotations. We don't use Selectors (or any of the other Specifier vocabulary) when we don't have Specific Resources. From an RDF perspective these are a completely separate set of triples from the annotation and only relate to it through the rather general assertion of annotation -- hasBody/hasTarget -- specific resource. As noted we have use cases in hand for selecting things in contexts other than annotations. Specific Resource works very well to manage the transition from global assertion to assertion in context. So I don't see a reason not to spin out all of section 3. Not spinning it out is going to put me in the awkward position of reinventing it for collections and other bibliographic object contexts...

tilgovi commented 8 years ago

Here's a proposal:

SpecificResource should be renamed ResourceSelector. All it does is select a whole resource. As a selector, it can be further constrained with subSelector.

Of course, this means that you have to be able to attach style and state to a selector, but I've argued for that before anyway. I mean, why shouldn't we be able to say that we expect to style the quote, but not the paragraph that contains it? Right now, we're limited by saying that the whole target is to be styled regardless of how deeply we can resolve its selector chain.

As someone who's implemented clients, I would very much love to be able to say that the quote should be highlighted while the paragraph containing it should have a note icon next to it.

tilgovi commented 8 years ago

Style is just a selector that provides a hint about styling the resource. State is just a selector that constrains the representation of the resource to a particular time/content-type/language/etc.

I don't see why these are all not selectors, and therefore why the selection specification couldn't contain any subset of these and make sense, without referring to another type, the SpecificResource.

They are all selectors of differing semantics.

azaroth42 commented 8 years ago

Then there's no identity for the selected resource. You don't annotate a selector, you annotate a selection of a resource. :-1: from me, especially at this stage.

tilgovi commented 8 years ago

I'm sorry, Rob. I don't understand your comment. What do you mean there's no identity? The ResourceSelector provides an identity just as SpecificResource does.

tilgovi commented 8 years ago

Ahh, I see what you mean. Annotating the select-or rather than the select-ion. Does this matter or is this just a question of language? Could we not just call them Selection instead of Selector?

pciccarese commented 8 years ago

I tend to agree with Rob in terms of semantics.

azaroth42 commented 8 years ago

I think it matters, particularly when the resources are separately dereferenceable. If I can identify a region of an image, and it has a URI, when I dereference that URI I would expect to get an image, not a JSON-LD object. The format of the resource that is the target of the annotation should be image/jpeg, not application/ld+json. So to me at least there are two distinct resources that we need to capture ... the image segment with format of image/jpeg, and the selector with format of application/ld+json that describes how to extract the segment from the full image content.
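
The distinction can be illustrated with a spatial media fragment of the kind a FragmentSelector carries for images. The selector holds only a string such as xywh=160,120,320,240 (W3C Media Fragments URI 1.0 syntax); producing the image/jpeg segment still requires the full image plus a pixel box. A minimal sketch, assuming the image dimensions are known and that the box should be clipped to the image; the (left, top, right, bottom) tuple is the shape Pillow's `Image.crop` expects, though Pillow itself is not used here.

```python
import re

# Spatial media fragment: optional unit, then x,y,w,h.
_XYWH = re.compile(
    r"^xywh=(?:(pixel|percent):)?"
    r"(\d+(?:\.\d+)?),(\d+(?:\.\d+)?),(\d+(?:\.\d+)?),(\d+(?:\.\d+)?)$"
)


def xywh_to_box(value: str, width: int, height: int) -> tuple:
    """Turn an xywh fragment value into a (left, top, right, bottom) pixel box."""
    match = _XYWH.match(value)
    if match is None:
        raise ValueError(f"not a spatial media fragment: {value!r}")
    unit = match.group(1) or "pixel"
    raw = match.groups()[1:]
    if unit == "pixel" and any("." in g for g in raw):
        raise ValueError("pixel coordinates must be integers")
    x, y, w, h = (float(g) for g in raw)
    if unit == "percent":
        x, w = x * width / 100, w * width / 100
        y, h = y * height / 100, h * height / 100
    # Clip so the box always lies inside the image.
    left, top = max(0.0, x), max(0.0, y)
    right, bottom = min(float(width), x + w), min(float(height), y + h)
    return int(left), int(top), int(right), int(bottom)


print(xywh_to_box("xywh=160,120,320,240", 640, 480))      # (160, 120, 480, 360)
print(xywh_to_box("xywh=percent:25,25,50,50", 640, 480))  # (160, 120, 480, 360)
```

The string and the box are both descriptions; only applying the box to the image bytes yields the segment resource itself, which is the second, separately dereferenceable thing.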

tilgovi commented 8 years ago

The SpecificResource isn't really image/jpeg is it? It's likely, if we're talking about web annotation clients and servers, also application/ld+json.

I admit that, in terms of conceptual clarity and model simplicity, I'm currently really intrigued by this line of inquiry as well as the "inverse" selector pattern.

I see no reason for subSelector when source is a functional inverse that we already have.

I see no reason why hasState should be different from hasSelector, why selecting the representation at a point in time, or under a particular content type request, is different from selecting the representation at a region in space.

And hasSelector goes away entirely by inverting the chain and using source. Selections have sources and selector data.

Why the selector, itself, must be a resource of its own, rather than having its properties be part of a selection, is not clear to me. Arguably, maybe, for re-use. I'm not super compelled by that, though.

```json
{
  "@type": "Annotation",
  "hasTarget": {
    "@type": "TextQuoteSelection",
    "exact": "illustrative",
    "source": {
      "@type": "CssSelection",
      "value": "div",
      "source": {
        "@type": "TimeSelection",
        "cached": "https://web.archive.org/web/20160223161828/http://www.example.com/",
        "sourceDate": "2016-02-23T13:30:00Z",
        "source": "http://example.com/"
      }
    }
  }
}
```

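
Under this inverted pattern a client resolves a target by following `source` links until it reaches a plain IRI, then applies the steps from the innermost outward. A minimal sketch over the target from the JSON example in this comment; the *Selection types are the proposal in this comment, not terms in the model, and `selection_chain` is a hypothetical helper.

```python
def selection_chain(node):
    """Follow 'source' links; return (resource IRI, selection types innermost first)."""
    steps = []
    while isinstance(node, dict):
        steps.append(node["@type"])
        node = node.get("source")
    return node, steps[::-1]


target = {
    "@type": "TextQuoteSelection",
    "exact": "illustrative",
    "source": {
        "@type": "CssSelection",
        "value": "div",
        "source": {
            "@type": "TimeSelection",
            "cached": "https://web.archive.org/web/20160223161828/http://www.example.com/",
            "sourceDate": "2016-02-23T13:30:00Z",
            "source": "http://example.com/",
        },
    },
}

resource, steps = selection_chain(target)
print(resource)  # http://example.com/
print(steps)     # ['TimeSelection', 'CssSelection', 'TextQuoteSelection']
```

Here the time-based step (what the model calls State) sits in the same chain as the region-based steps, which is the point being argued: every link is just another selection of its source.
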
iherman commented 8 years ago

@tilgovi : I agree with @azaroth42 that changing the fundamentals of the model may be a bit too late at this point.

Beyond the timing issue, the approach in your example has the same problem for me as what I said in #93 (see https://github.com/w3c/web-annotation/issues/93#issuecomment-174875839): if we want to define a fragment identifier for selectors, this approach seems to make it more complicated. Even if I realize this is still an 'if', I do not think we should close the door on it at this point.

Bottom line: I would not touch the fundamental model at this point.

iherman commented 8 years ago

@jjett :

@iherman I think your suggestion over complicates the situation. If we're limiting ourselves to pure RDF issues then the purpose of the Specific Resource is to act as a signpost indicating under which circumstances a group of assertions about some web resource is true. However since Specific Resource is a class it has no relationship to the annotation class per se (such relationships are the product of the hasBody and hasTarget predicates). As long as we don't define the class in such a way that it must be interpreted as something only ever within the range of the hasBody and hasTarget predicates then I don't see any reason not to expose it for use in contexts outside of annotations (in which it is only contingently involved in anyway).

As I said, the current setup is not fundamentally wrong. What bothers me is that there are annotation specific properties that trigger a typing on Specific Resource. But it bothers me only a little, again as I said, I will not start a fight on this:-)

jjett commented 8 years ago

@iherman Is it possible for you to say which annotation properties trigger on the Specific Resource typing? I ask, because it seems to me that while selectors are a part of the annotation model vocabulary they are not annotation properties per se (this is because RDF triples are more independent of one another than the node types suggested in an xsd or similar schema document).

iherman commented 8 years ago

On 24 Feb 2016, at 14:59, Jacob wrote:

@iherman Is it possible for you to say which annotation properties trigger on the Specific Resource typing? I ask, because it seems to me that while selectors are a part of the annotation model vocabulary they are not annotation properties per se (this is because RDF triples are more independent of one another than the node types suggested in an xsd or similar schema document).

Per latest model document, SR is used for:

• Purpose
• Selector
• State
• Style
• Rendering
• Scope

meaning that, I presume, all predicates listed for those have their rdfs:domain set to SR (I am not sure about the status of the vocabulary document; I have not checked).

On that list, Selectors and probably State are of general use, so those are fine; I guess having the "source" predicate bound to selectors is fine. The general usability of Style and Scope is borderline. I think Rendering and Purpose are definitely closely tied to annotation, particularly the latter.

tilgovi commented 8 years ago

We can define a vocabulary of selectors without any reference to a SpecificResource, no?

I may have lost the thread. Are we talking about SpecificResource because we want to put State into the same namespace? Or because we think SpecificResource needs to be in the selector namespace? Or because we're assuming that a selector namespace has the hasSelector relation?

If the latter, couldn't we just not? The namespace has all the selectors but it can be the annotation namespace that provides the hasSelector relation to these, and defines the SpecificResource class.

A selector namespace and base class needn't say anything about the domain of any relation, or the existence of a relation, that brings the selectors into meaningful use, annotation or otherwise.

We can simply present a vocabulary of selector descriptions. The annotation vocabulary brings those, by way of <SpecificResource hasSelector Selector>, into use as a means to specify a selection from a source resource.

azaroth42 commented 8 years ago

I would be okay to separate a more generic SpecificResource class from the Annotation specific functionality. I agree that Selector and State are generic, and the rest are Annotation specific.

I'm (still) not keen on a second namespace, as in the most common use (annotations) people will use the wrong one. They would potentially even be correct to use the wrong one ... they're just using the CG versions of those predicates. As folks familiar with RDF are okay with pulling individual terms out of ontologies, having them separate doesn't seem beneficial to me. If someone could outline the advantages of a separate namespace, that would be appreciated.

The core seems like:

And then the annotation specific part:

To me the Note is "If you want to describe regions of representations, then you need to have a Selector to describe the region, a ResourceSegment to identify it, and you might need a State to get the right representation from the Resource... here are those components." The URI of the RDF namespace is irrelevant.
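
The three components in that summary can be sketched as plain data types. This is a hypothetical shape, assuming the segment carries `source`, `selector` and an optional `state` key as in the JSON serialization of SpecificResource; ResourceSegment is the working name from this comment, not a defined class, and the TimeState values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Description:
    """A Selector or a State: a type plus its type-specific properties."""
    type: str
    properties: dict = field(default_factory=dict)

    def to_json(self) -> dict:
        return {"@type": self.type, **self.properties}


@dataclass
class ResourceSegment:
    """Identifies a region of a representation of `source` (hypothetical class)."""
    source: str
    selector: Description
    state: Optional[Description] = None  # only needed to pick the right representation

    def to_json(self) -> dict:
        out = {"@type": "ResourceSegment", "source": self.source,
               "selector": self.selector.to_json()}
        if self.state is not None:
            out["state"] = self.state.to_json()
        return out


segment = ResourceSegment(
    source="http://example.com/",
    selector=Description("TextQuoteSelector", {"exact": "illustrative"}),
    state=Description("TimeState", {"sourceDate": "2016-02-23T13:30:00Z"}),
)
print(segment.to_json())
```

Nothing here depends on an Annotation existing, and nothing depends on the namespace IRI either, which matches the view that the URI of the RDF namespace is irrelevant to such a Note.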

tilgovi commented 8 years ago

The primary benefit I was anticipating was the ability to get consensus and publish a narrower scope of things.

Reflecting now, another benefit I see is that, while I've spent about four years now thinking pretty regularly about client APIs, I have yet to really come away with any clear vision of what an annotation API should look like. I find it hard to abstract the UX over the world of possible UIs. However, identifying ranges via pluggable selectors is much clearer.
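
A minimal sketch of what "pluggable selectors" could mean for a client API: a registry from selector type to a resolver that maps a selector onto a character range of a text. The registry and function names are hypothetical. The two resolvers use the TextPositionSelector `start`/`end` and TextQuoteSelector `exact`/`prefix`/`suffix` properties, but the quote matching here is exact-string only, whereas real clients usually need fuzzy matching once the document has changed.

```python
from typing import Callable, Dict, Tuple

Range = Tuple[int, int]
Resolver = Callable[[str, dict], Range]

# Hypothetical registry: selector type -> resolver function.
RESOLVERS: Dict[str, Resolver] = {}


def resolver(selector_type: str):
    """Decorator that plugs a resolver in for one selector type."""
    def register(fn: Resolver) -> Resolver:
        RESOLVERS[selector_type] = fn
        return fn
    return register


@resolver("TextPositionSelector")
def _by_position(text: str, sel: dict) -> Range:
    start, end = sel["start"], sel["end"]
    if not 0 <= start <= end <= len(text):
        raise ValueError("position outside the text")
    return start, end


@resolver("TextQuoteSelector")
def _by_quote(text: str, sel: dict) -> Range:
    prefix, exact, suffix = sel.get("prefix", ""), sel["exact"], sel.get("suffix", "")
    # Exact-string match of prefix + exact + suffix; no fuzzy fallback.
    hit = text.find(prefix + exact + suffix)
    if hit < 0:
        raise LookupError("quote not found")
    start = hit + len(prefix)
    return start, start + len(exact)


def resolve(text: str, sel: dict) -> Range:
    fn = RESOLVERS.get(sel["@type"])
    if fn is None:
        raise NotImplementedError(f"no resolver plugged in for {sel['@type']!r}")
    return fn(text, sel)


doc = "this is an annotation that has some text"
start, end = resolve(doc, {"@type": "TextQuoteSelector", "exact": "annotation",
                           "prefix": "this is an ", "suffix": " that"})
print(doc[start:end])  # annotation
```

The client core only knows `resolve`; support for a new selector type is added by registering one more function, without touching the annotation-level code.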