w3c / wpub-ann

Web Annotation Extensions for Web Publications
https://w3c.github.io/wpub-ann/

The use case of WP requires a small extension of the WA model: "source" for Selectors. #4

Closed iherman closed 6 years ago

iherman commented 6 years ago

There seems to be a need to extend (albeit slightly) the WA model so that the "source" relationship can also be specified for a Selector. This matters because of the duality between the address of a WP and the address of a constituent resource.

iherman commented 6 years ago

Another possibility would be to make a Selector a subclass of Specific Resource as an extension of the WA Model. I believe adding source specifically to the Selector is the less harmful option...

azaroth42 commented 6 years ago

I'm a strong :-1: on this, I'm afraid. The SpecificResource is the Selection, the Selector is the description of how to create the Selection from the source. If you also needed to have a State or a Stylesheet to further clarify the selection, then you'd need to add them on to ... the selector? And then everything would be a complete mess.

Just use the Specific Resource with source and a Selector.
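A minimal sketch of that shape in Python, for illustration only. The helper name `specific_resource` and the `sourceDate` value are assumptions made for this example; `TimeState` and `sourceDate` come from the WA model. The point it shows is that a State sits next to the Selector on the Specific Resource, so the Selector itself never needs a `source`.

```python
import json

def specific_resource(source, selector, state=None):
    """Hypothetical helper: build a WA SpecificResource as a plain dict."""
    resource = {"type": "SpecificResource", "source": source, "selector": selector}
    if state is not None:
        # State is attached to the SpecificResource, beside the selector,
        # not nested inside the selector.
        resource["state"] = state
    return resource

target = specific_resource(
    "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
    {"type": "CssSelector", "value": "#elemid > .elemclass + p"},
    # Illustrative date only; it does not come from the thread.
    state={"type": "TimeState", "sourceDate": "2018-01-01T00:00:00Z"},
)
print(json.dumps(target, indent=2))
```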

iherman commented 6 years ago

So... @azaroth42, how would you express:

{
    "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "selector": {
        "type": "CssSelector",
        "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
        "value": "#elemid > .elemclass + p"
    }
}

i.e., that the selector is on a resource within a wpub?

Thinking it further, maybe here is the case where we should use scope. So what about

{
    "scope": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
    "selector": {
        "type": "CssSelector",
        "value": "#elemid > .elemclass + p"
    }
}

which actually looks good. However... we need the notion of a selection spanning, say, two resources. The current example is:

{
    "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "selector": {
        "type": "RangeSelector",
        "startSelector": {
            "type": "TextQuoteSelector",
            "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
            "exact": "Call me Ishmael.",
            "suffix": "Some years ago"
        },
        "endSelector": {
            "type": "TextQuoteSelector",
            "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html",
            "exact": "A hundred black faces turned round",
            "suffix": " in their rows",
            "prefix": "sitting in Tophet. "
        }
    }
}

I am not sure how you would put those two together even with scope...
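To make the difficulty concrete, here is a rough Python sketch. Both helper names (`scoped_target`, `selector_sources`) are hypothetical. It builds the scope-based target for the single-resource case, then counts how many distinct resources the cross-resource range refers to.

```python
WPUB = "https://dauwhe.github.io/html-first/MobyDick.wpub"
C001 = "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html"
C002 = "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html"

def scoped_target(publication, resource, selector):
    """Hypothetical helper: one scope, one source, one plain WA selector."""
    return {"scope": publication, "source": resource, "selector": selector}

def selector_sources(selector):
    """Collect the non-standard 'source' values found on a selector tree."""
    found = [selector["source"]] if "source" in selector else []
    for key in ("startSelector", "endSelector"):
        if key in selector:
            found.extend(selector_sources(selector[key]))
    return found

# The single-resource case fits the scope/source pattern.
single = scoped_target(
    WPUB, C001, {"type": "CssSelector", "value": "#elemid > .elemclass + p"}
)

# The range case refers to two constituent resources.
cross_resource_range = {
    "type": "RangeSelector",
    "startSelector": {"type": "TextQuoteSelector", "source": C001,
                      "exact": "Call me Ishmael."},
    "endSelector": {"type": "TextQuoteSelector", "source": C002,
                    "exact": "A hundred black faces turned round"},
}

needed = set(selector_sources(cross_resource_range))
print(len(needed))  # number of distinct resources the range needs
```

The range needs two resources, while a target built with `scoped_target` has a single `source` slot, which is exactly the gap described above.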

iherman commented 6 years ago

@azaroth42 answering my own question... Here is what I came up with.

  1. Remove the source for Selectors (i.e., go back to the WA model).
  2. Use scope as part of this note too, as in my previous comment. The text in the spec would probably say something like: the 'scoped' URL is identical to the value of source if the scope relation is not used.
  3. As a consequence, the RangeSelector should also go back to the WA version (in the example as well), and an extra clarifying note should be added that a RangeSelector always refers to two selectors in the same resource.
  4. Define a new selector, which for now I called "MultiRangeSelector" but which we may simply call "WPUBSelector", that lists an array of Specific Resources as a sort of "keys" describing an interval of resources. This is what the example would look like:
{
    "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
    "selector": {
        "type": "WPUBSelector",
        "selectors": [
            {
                "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
                "selector": {
                    "type": "TextQuoteSelector",
                    "exact": "Call me Ishmael.",
                    "suffix": "Some years ago"                           
                }
            },
            {
                "source": "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html",
                "selector": {
                    "type": "TextQuoteSelector",
                    "exact": "A hundred black faces turned round",
                    "suffix": " in their rows",
                    "prefix": "sitting in Tophet. "                           
                }
            }
        ]
    }    
}

WDYT?

The only caveat is that the fragment ID encoding of this guy will look particularly ugly. I think we are pushing the fragment ID syntax to its limits here... But that is a separate issue (#6) after all...
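One possible reading of the "interval of resources" idea in point 4, sketched in Python. Everything here is an assumption for illustration: the function name `resources_spanned`, the existence of an ordered list of resources for the publication, the rule that resources between two keys are included whole, and the `c003.html` URL, which does not appear in the examples above.

```python
BASE = "https://dauwhe.github.io/html-first/MobyDickNav/html/"

# Assumed ordered list of the publication's resources (c003.html is hypothetical).
reading_order = [BASE + "c001.html", BASE + "c002.html", BASE + "c003.html"]

def resources_spanned(order, wpub_selector):
    """Return the resources from the first key to the last key, inclusive."""
    keys = [entry["source"] for entry in wpub_selector["selectors"]]
    positions = [order.index(url) for url in keys]
    return order[min(positions):max(positions) + 1]

wpub_selector = {
    "type": "WPUBSelector",
    "selectors": [
        {"source": BASE + "c001.html",
         "selector": {"type": "TextQuoteSelector", "exact": "Call me Ishmael."}},
        {"source": BASE + "c002.html",
         "selector": {"type": "TextQuoteSelector",
                      "exact": "A hundred black faces turned round"}},
    ],
}

spanned = resources_spanned(reading_order, wpub_selector)
print(spanned)
```

Under these assumptions the keys only mark where the selection starts and ends; a key pair of `c001.html` and `c003.html` would pull in `c002.html` as well.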

Cc @tcole3 @BigBlueHat

azaroth42 commented 6 years ago

IMO this is the use case for refinedBy [1] to first select the HTML in the WPub and then sub-select the text.

For ease, I invent a new EmbeddedResourceSelector but this might be achievable with CFI as a FragmentSelector ... which I leave as a homework exercise for any CFI experts :) For the more complicated case of the selection spanning the content of two HTML files within the same publication, using the new selector ...

{
  "type": "SpecificResource",
  "source": "https://dauwhe.github.io/html-first/MobyDick.wpub",
  "selector": {
    "type": "RangeSelector",
    "startSelector": {
      "type": "EmbeddedResourceSelector",
      "value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html",
      "refinedBy": {
        "type": "TextQuoteSelector",
        "exact": "Call me Ishmael"
      }
    },
    "endSelector": {
      "type": "EmbeddedResourceSelector",
      "value": "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html",
      "refinedBy": {
        "type": "TextQuoteSelector",
        "exact": "A hundred black faces turned round"
      }
    }
  }
}

Which would also work for any "container" format where the items in the container have some identity to put into value.

[1] https://www.w3.org/TR/annotation-model/#refinement-of-selection
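A rough Python sketch of how a consumer might resolve such a target: first pick the embedded resource, then apply the `refinedBy` quote inside it. The in-memory `publication` dict, the function names, and the plain-text stand-ins for the two HTML files are all assumptions for illustration; `EmbeddedResourceSelector` itself is only the selector proposed in this comment, not part of the WA model.

```python
C001 = "https://dauwhe.github.io/html-first/MobyDickNav/html/c001.html"
C002 = "https://dauwhe.github.io/html-first/MobyDickNav/html/c002.html"

# Assumed stand-in for the publication: resource URL -> plain text excerpt.
publication = {
    C001: "Call me Ishmael. Some years ago",
    C002: "sitting in Tophet. A hundred black faces turned round in their rows",
}

def find_quote(text, selector):
    """Return the offset of a TextQuoteSelector's exact text, or None."""
    prefix = selector.get("prefix", "")
    needle = prefix + selector["exact"] + selector.get("suffix", "")
    position = text.find(needle)
    return None if position < 0 else position + len(prefix)

def resolve_embedded(pub, selector):
    """Step 1: select the resource. Step 2: refine within it, if asked."""
    url = selector["value"]
    text = pub[url]
    refinement = selector.get("refinedBy")
    # Without refinedBy the whole resource is selected, so anchor at offset 0.
    offset = 0 if refinement is None else find_quote(text, refinement)
    return url, offset

def resolve_range(pub, range_selector):
    """Resolve both ends of a RangeSelector to (url, offset) anchors."""
    return (resolve_embedded(pub, range_selector["startSelector"]),
            resolve_embedded(pub, range_selector["endSelector"]))

range_selector = {
    "type": "RangeSelector",
    "startSelector": {"type": "EmbeddedResourceSelector", "value": C001,
                      "refinedBy": {"type": "TextQuoteSelector",
                                    "exact": "Call me Ishmael"}},
    "endSelector": {"type": "EmbeddedResourceSelector", "value": C002,
                    "refinedBy": {"type": "TextQuoteSelector",
                                  "exact": "A hundred black faces turned round"}},
}

start, end = resolve_range(publication, range_selector)
print(start, end)
```

The sketch only returns the two anchor points; deciding what lies between them across resource boundaries is left to the consuming application.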

iherman commented 6 years ago

I would keep away from CFI, actually. It is bound to a very particular OCF format in EPUB, which we will not have.

Your EmbeddedResourceSelector is interesting. So by default, something like

{
    "source": "http://....wpub",
    "selector": {
        "type": "EmbeddedResourceSelector",
        "value": "URL of a resource"
    }
}

would 'select' a complete resource. It may make a lot of sense to have it for formats like WPUB.

It does not necessarily invalidate the approach I had in mind with 'WPUBSelector' (the name is probably wrong), which can pick up a bunch of resources. I am not sure yet whether both are necessary, or whether they can be combined.

azaroth42 commented 6 years ago

If not CFI (or other FragmentSelector syntax) then yes, I would propose a simple selector like the above. Note in the docs for refinedBy we even mention explicitly the use case of selecting a document from a packaging format ... we just didn't define a selector that can do that.

iherman commented 6 years ago

But it does not really cover the case where there are many resources, not just a simple start and end. That being said, I am not sure that is really necessary...

I propose the following: I will add this EmbeddedResource stuff to the document (sometime during the weekend or Monday at the latest) and merge the whole thing so that we have only one document to deal with. We can discuss later whether we need the WPUBSelector as well.

iherman commented 6 years ago

Closing this issue, in favour of PR #8.