w3c / web-annotation

Web Annotation Working Group repository, see README for links to specs
https://w3c.github.io/web-annotation/

Selector Fragment URI Usage in HTML Serialization Note #400

Closed BigBlueHat closed 7 years ago

BigBlueHat commented 7 years ago

New section presenting Selector Fragment Identifiers in use in anchor, blockquote, and q tags.

Legible version

/cc @tcole3 @csarven @iherman

BigBlueHat commented 7 years ago

@iherman my respec references likely need help... 😕

tcole3 commented 7 years ago

I didn't see the ipr check issue initially, so I guess merging should wait for Ivan since I don't know what he does to fix that?

As for the substance of the examples, interesting but now that I look more closely, are we stretching the scope of the Note a little too far? As it stands are these annotations or just uses of selectors to identify the sources of quotes - i.e., just instances of the ResourceSelection class as described in the Selector Note? If so, and given that the other examples are annotations, probably should be more explicit about this distinction in introducing these examples. But if others are also, I'm okay if we want to include examples of ResourceSelection class.

It took me a bit to imagine what the json-ld for these examples would look like. Could we provide json-ld equivalents? Ivan's converter tool does a nice job with these, I think.

And lastly, these are using Text Position Selector without State (which is recommended). Okay, but perhaps we should mention somewhere that a refinedBy could be added to align with the recommendation in the data model - albeit making everything more complicated (so let's not show). Presumably the document targeted is relatively static with no real dynamic content. Might need to say this too.

iherman commented 7 years ago

Marked as non-substantive for IPR from ash-nazg.

iherman commented 7 years ago

(The Rawgit version is here to enjoy...)

iherman commented 7 years ago

I didn't see the ipr check issue initially, so I guess merging should wait for Ivan since I don't know what he does to fix that?

(It is a tool that checks for potential IPR issues with external committers. I have no idea why it complains about Benjamin, but at this point I am too lazy to chase this down with the systems team. I always mark the changes as non-substantive, which essentially means there are no IPR issues, which is true for the Note anyway.)

As for the substance of the examples, interesting but now that I look more closely, are we stretching the scope of the Note a little too far? As it stands are these annotations or just uses of selectors to identify the sources of quotes - i.e., just instances of the ResourceSelection class as described in the Selector Note? If so, and given that the other examples are annotations, probably should be more explicit about this distinction in introducing these examples. But if others are also, I'm okay if we want to include examples of ResourceSelection class.

I think adding this section is a good idea. One of the issues, which we actually ran into yesterday without saying so, is that identifying an element within an HTML source (using, e.g., the @id attribute) is not the same as referring to the content of that element. (RDF aficionados are sensitive to these details.) Using the selectors is a clean way of making that distinction. Maybe a half-sentence emphasizing this issue would help.

It took me a bit to imagine what the json-ld for these examples would look like. Could we provide json-ld equivalents? Ivan's converter tool does a nice job with these, I think.

If we were to use JSON-LD, then making use of the fragment IDs may become unnecessary. After all, if one uses JSON-LD, one can simply use the selectors as originally defined, without the fragment ID! It is when using RDFa that it becomes a real advantage: instead of encoding the whole selector into HTML+RDFa (which is tedious), one could use a simple IRI with the fragment. That is a real win in that world...
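As a rough illustration of the "simple IRI with the fragment" idea, here is a minimal Python sketch that serializes a selector into a `#selector(...)` fragment of the shape used in this thread. The function names, the example.org URL, the offsets, and in particular the set of characters that get percent-encoded are assumptions made for the example; the Selectors and States Note is the authority on the actual encoding rules.

```python
# Hypothetical helper: serialize a selector (given as a dict) into a
# "#selector(key=value,...)" fragment appended to a source IRI.

# ASSUMPTION: characters escaped inside a value. Chosen for illustration only;
# check the Note for the normative list.
ESCAPED = set(" =,#()%")


def encode_value(value) -> str:
    """Percent-encode only the characters in ESCAPED; leave everything else as-is."""
    out = []
    for ch in str(value):
        if ch in ESCAPED:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


def selector_fragment(selector: dict) -> str:
    """Build the fragment part, e.g. selector(type=...,start=...,end=...)."""
    pairs = ",".join(f"{key}={encode_value(val)}" for key, val in selector.items())
    return f"selector({pairs})"


def selector_iri(source: str, selector: dict) -> str:
    """Attach the selector fragment to the source IRI."""
    return f"{source}#{selector_fragment(selector)}"


# Hypothetical source document and offsets.
iri = selector_iri(
    "http://example.org/page1",
    {"type": "TextPositionSelector", "start": 412, "end": 795},
)
print(iri)
```

A string built this way is what could then be placed in, say, the cite attribute of a blockquote or q element, which is the usage the new section presents.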

And lastly, these are using Text Position Selector without State (which is recommended). Okay, but perhaps we should mention somewhere that a refinedBy could be added to align with the recommendation in the data model - albeit making everything more complicated (so let's not show). Presumably the document targeted is relatively static with no real dynamic content. Might need to say this too.

I think this may go a little bit too far into the details. Just referring to the selector note saying that much more can be achieved by using all the possibilities (and referring to the note) might be enough.

BigBlueHat commented 7 years ago

Using the selectors is a clean way of doing that [identifying content vs. the node]. Maybe a half-sentence emphasizing this issue would help.

@iherman @tcole3 need me to write that half-sentence? 😄

I think this may go a little bit too far into the details. Just referring to the selector note saying that much more can be achieved by using all the possibilities (and referring to the note) might be enough.

👍

iherman commented 7 years ago

@iherman @tcole3 need me to write that half-sentence? 😄

yep... :-)

tkanai commented 7 years ago

According to the HTML5 spec, the cite attribute must be a valid URL, so, strictly speaking, the IRI-based fragment selector (ID) must be converted into a URL (Location). Since "%" in an IRI is not a reserved character in a URL, sequences such as %20, %3D, %2C, and %23, which are percent-encoded during the selector mapping process, must be re-encoded and become %2520, %253D, %252C, and %2523, respectively. I don't think it is necessary to modify the examples (fortunately, they are valid URLs), but I think the IRI-to-URL conversion process should be mentioned in the section.
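To make the re-encoding described above concrete, the following small Python sketch applies a second round of percent-encoding to the four sequences mentioned. It only illustrates the transformation as the comment describes it; whether a given IRI-to-URL conversion actually requires this step is the commenter's reading and is not settled by the sketch.

```python
from urllib.parse import quote

# Sequences that are already percent-encoded by the selector mapping step.
already_encoded = ["%20", "%3D", "%2C", "%23"]

# Encoding again with no "safe" characters turns each "%" into "%25".
re_encoded = [quote(seq, safe="") for seq in already_encoded]

for before, after in zip(already_encoded, re_encoded):
    print(before, "->", after)
```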

By the way, are there any strong reasons to express the fragment selector as ID? The selector itself is well structured and the selected text, or something, could be identical with another representation of fragment selector (ID). Do we need to tell them apart?

BigBlueHat commented 7 years ago

@tkanai good point on the IRI thing...not sure how best to mention that though without opening Pandora's box. 😕 Thoughts?

By the way, are there any strong reasons to express the fragment selector as ID? The selector itself is well structured and the selected text, or something, could be identical with another representation of fragment selector (ID). Do we need to tell them apart?

Do you mean that you could reference a paragraph by both its author-given fragment ID (ex: #para1) and via a Selector Fragment (ex: #selector(type....))? The biggest difference being that one doesn't need the author of the referenced text to name all the things--which I know you know. 😄 Should we mention that?

tkanai commented 7 years ago

@BigBlueHat No, I'm talking about Selector Fragment Identifier only. We can "identify" a text on a Web page with both a Selector Fragment Identifier using Text Quote Selector (IRI-Alice) and a Selector Fragment Identifier using Text Position Selector (IRI-Bob), for example. Then the question is: do we need to refer to the text by its name (Alice, Bob)? If a Selector Fragment Identifier is to be used to get to the text, I think it should be referred to by its location or address, which tells us where the text is, rather than by its name. That means both Alice's address and Bob's address should be usable for that purpose.

This is a valid Selector Fragment Identifier: http://jp.example.org/page1#selector(type=TextQuoteSelector,exact=ペンを,prefix=私は、,suffix=持っています) Unfortunately, it is invalid as a URL. To use this string as a URL, we have to encode the Japanese text. Then the IRI string and the URL string are not identical. If the two were identical, I would have no concerns. This is why I would like to clarify whether the given string is a "name" (Identifier, IRI) or an "address" (Location, URL).
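The mismatch between the two strings can be seen with a short Python sketch that percent-encodes the non-ASCII characters as UTF-8 while leaving the ASCII delimiters alone. The Japanese text here assumes the example above reads 私は、ペンを持っています, and the choice of "safe" characters is an assumption for this illustration rather than a normative conversion algorithm.

```python
from urllib.parse import quote, unquote

# ASSUMPTION: the IRI from the example above, with the Japanese text as read here.
iri = (
    "http://jp.example.org/page1#selector("
    "type=TextQuoteSelector,exact=ペンを,prefix=私は、,suffix=持っています)"
)

# Keep the ASCII delimiters used in this IRI; quote() encodes every
# non-ASCII character as UTF-8 percent-escapes.
url = quote(iri, safe=":/#=,()")

print(url)
print("identical strings:", url == iri)
print("round-trips to the IRI:", unquote(url) == iri)
```

The URL form is pure ASCII and decodes back to the IRI, but as plain strings the two differ, which is the name-versus-address concern raised here.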

I could say the IRI-based spec, Selector Fragment Identifier, describes how to name fragments on the Web. Thanks to the IRI-to-URL conversion method, we can get to the fragments. It is very capable and I like it. On the other hand, I'm afraid that the original intention of Selector Fragment Identifier might have been to provide a means to describe the location of fragments. If so, I think Selector Fragment Identifier should change its basis from IRI to URL. Then this note would not need to open Pandora's box. What do you think?