Closed iherman closed 6 years ago
In general, I think this is a good idea; having a mechanism in place that could map to a future RDF standard for representing text direction is forward looking.
Having keywords which are ignored in the RDF transformation is not unprecedented: @index is also ignored.
There are also implications for term definitions and term selection when compacting:
• @dir might appear in a term definition, and be used to match values with the appropriate @dir.
• @dir could appear at the top level of a context, to set the default text direction.
Basically, most everything that is done with @language could be done with @dir (dir maps?)
Just a quick bikeshedding note -- should we decide to support this feature, I think it would be better to use something longer that is more specific than @dir, e.g. @direction (better in that it is less ambiguous but still not great, IMO ... direction of what?).
@dlongley "direction" is the usual term in the i18n circles (for "base writing direction"); I think that, paired with @language, it is fine.
@gkellogg I was wondering about @direction maps, but I would have difficulty coming up with a use case for this. I would not want to add it to the spec just because we can do it...
I'm nervous of introducing something that doesn't round trip through RDF. Either RDF is our conceptual model, or it isn't. This would open the door for also adding @type at the same time as @language, despite that language tagged strings in RDF 1.1 have an explicit data type.
Not -1, but I think we need to consider the slipperiness of the slope we're starting to slide down.
Echoing @azaroth42 I'd prefer (if such things are even conceivable) that we iterate RDF to include text direction encoding.
Additionally, this doesn't address BiDi strings...nor can it (afaict): https://www.w3.org/International/wiki/Html-bidi-isolation https://www.w3.org/International/questions/qa-bidi-controls
@azaroth42, @BigBlueHat: the rdf:HTML datatype may be the best solution, but the comment in the string-meta document (but also this) is very relevant to the practical difficulties of this approach.
The reason why I raised this issue, in spite of sharing the same reservations that both of you have, is purely pragmatic: communities face this issue and we simply do not have a satisfactory solution. And, I presume, the "perfect is an enemy of the good" principle may apply...
Having had some discussion with some colleagues my attention was drawn on the approach taken by the Activity Stream Rec, which is, essentially, to represent a text with a base direction by injecting the Unicode BiDi control characters at the beginning of the string (\u200F and \u200E for RTL and LTR, respectively). The advantage is that this works with RDF without further ado.
I am not convinced this is really good for authoring a text. However, in JSON-LD 1.1 we have the freedom of using "our" syntax, i.e.,
"title": [ { "@value": "Moby Dick", "@language": "en" }, { "@value": "موبي ديك", "@language": "ar", "@direction": "rtl" } ]
But specifying that, when generating an RDF literal, the value of @direction should be mapped on \u200F and \u200E.
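A minimal sketch of that mapping, assuming a value object is held as a plain Python dict. The function name `value_object_to_literal` and the returned (lexical form, language tag) pair are illustrative assumptions, not part of any JSON-LD API:

```python
# Hypothetical helper for illustration only; not taken from any JSON-LD library.
RLM = "\u200F"  # RIGHT-TO-LEFT MARK
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
DIRECTION_MARKS = {"rtl": RLM, "ltr": LRM}


def value_object_to_literal(value_object):
    """Return (lexical_form, language) for a JSON-LD-style value object.

    If "@direction" is present, the matching Unicode mark is prepended
    to the string, so the result is an ordinary language-tagged literal.
    """
    text = value_object["@value"]
    direction = value_object.get("@direction")
    if direction is not None:
        if direction not in DIRECTION_MARKS:
            raise ValueError(f"unknown direction: {direction!r}")
        text = DIRECTION_MARKS[direction] + text
    return text, value_object.get("@language")
```

Under this sketch the direction survives only as the first character of the literal, so nothing beyond today's RDF is needed to carry it.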
I would still wait for the opinion of the I18N experts, but that would be an easy way of adding this missing feature to JSON-LD.
Cc: @azaroth42 @BigBlueHat
On 2018-02-08, at 16:10, Ivan Herman notifications@github.com wrote:
Having had some discussion with some colleagues my attention was drawn on the approach taken by the Activity Stream Rec, which is, essentially, to represent a text with a base direction by injecting the Unicode BiDi control characters at the beginning of the string (\u200F and \u200E for RTL and LTR, respectively). The advantage is that this works with RDF without further ado.
I am not convinced this is really good for authoring a text. However, in JSON-LD 1.1 we have the freedom of using "our" syntax, i.e.,
"title": [ { "@value": "Moby Dick", "@language": "en" }, { "@value": "موبي ديك", "@language": "ar", "@direction": "rtl" } ]
But specifying that, when generating an RDF literal, the value of @direction should be mapped on \u200F and \u200E.
in what way does adding this dimension to this encoding in order to represent a dimension in the respective value domain improving the capacity or facility of the encoding itself to represent linked data?
would one do this for iri values? for the sign of numeric values? for media types? for any other domain where the internal structure leads to significant differences in the control flow governing presentation and/or valid operations?
I would still wait for the opinion of the I18N experts, but that would be an easy way of adding this missing feature to JSON-LD.
Cc: @azaroth42 @BigBlueHat
This is used exclusively for text literals; it is in the same category as @language. It has no bearing on URL-s or numeric values.
in what way does adding this dimension to this encoding in order to represent a dimension in the respective value domain improving the capacity or facility of the encoding itself to represent linked data?
I am not sure I understand the question... but I will try to see if I understand it right: the problem is not really related to the "linked" aspect of linked data, but to the literal values that are added to the Linked Data Cloud. A typical example would be the short description or the title of the book. The problem arises if the text includes a mixture of right-to-left and left-to-right text and we want to be sure that the consumer of the data (say, a program displaying the title and the description) does a proper job in terms of punctuation.
https://www.w3.org/International/articles/inline-bidi-markup/ describes some of the problem and the reason why it is necessary, in some cases, to have an explicit base direction (which is possible in HTML).
(As commented by @BigBlueHat, this does not solve all the issues, and we should not really try to do that, it would lead to reinventing the wheels of part of HTML.)
Cc @BigBlueHat @r12a
On 2018-02-08, at 17:13, Ivan Herman notifications@github.com wrote:
This is used exclusively for text literals; it is in the same category as @language. It has no bearing on URL-s or numeric values.
in what way does adding this dimension to this encoding in order to represent a dimension in the respective value domain improving the capacity or facility of the encoding itself to represent linked data?
I am not sure I understand the question…
yes, i see that, where you do not understand the symmetry between @language and numeric values. i, on the other hand, have never understood this note in the description of rdf semantics:
Language-tagged strings have the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. No datatype is formally defined for this IRI because the definition of datatypes does not accommodate language tags in the lexical space. The value space associated with this datatype IRI is the set of all pairs of strings and language tags.
and have understood that anomaly to have been historically determined by the origins of rdf, to "describe resources”.
but I will try to see if I understand it right: the problem is not really related to the "linked" aspect of linked data, but to the literal values that are added to the Linked Data Cloud.
that is the issue to which i point.
A typical example would be the short description or the title of the book. The problem arises if the text includes a mixture of right-to-left and left-to-right text and we want to be sure that the consumer of the data (say, a program displaying the title and the description) does a proper job in terms of punctuation.
https://www.w3.org/International/articles/inline-bidi-markup/ describes some of the problem and the reason why it is necessary, in some cases, to have an explicit base direction (which is possible in HTML).
that is, whether json-ld is intended to serve a markup medium or as a means to encode linked data. yes i have read the suggestion, that rdf should be extended to support this aspect of markup, but i do not see why that would be a generally beneficial approach.
best regards, from berlin,
To echo what I believe James is saying, I think that we need to be careful not to conflate string rendering concerns (like direction) with the semantic concerns of string data.
So perhaps the essential detail of strings that is not currently captured is not text direction, per se, but an encoding of the script in which the text is written. Script is another characteristic of string data, somewhat orthogonal to language (as per https://www.w3.org/International/questions/qa-scripts#which, which points out that the language Azeri can be written in either Latin or Arabic script).
As far as I know, there's a one-to-one mapping between direction and script, though I'm often proven wrong by the complexities of human language.
I'm still opposed to encoding data into JSON-LD syntax that isn't natively supported in other RDF serializations, though.
It may be that the best solution, for both JSON-LD and RDF, is to simply rely on rdf:HTML to handle these use cases.
In any case, if it were to make its way into the RDF Abstract Syntax, there would be a much wider debate from the RDF community, and past experience is that you can't really predict where these things will come out.
It may have more consequence for SPARQL, which is in more of a need of update than RDF, IMHO. Querying and emitting text direction would be non-trivial.
@gkellogg
It may be that the best solution, for both JSON-LD and RDF, is to simply rely on rdf:HTML to handle these use cases.
Yes, I did discuss that with users. However, requiring the consumer of the data to, essentially, use an HTML parser makes things fairly complex on that end, so there was push-back. I agree that a complex situation with lots of mixing of different types of data may have to use that solution, but having a simple solution for the simple case is really necessary imho.
In any case, if it were to make it's way into the RDF Abstract Syntax, there would be a much wider debate from the RDF community, and past experience is that you can't really predict where these things will come out.
I would keep away from getting into the RDF Abstract syntax case. There may be a solution in defining a datatype for that purpose which would not change the abstract syntax, but even defining that would be beyond what this group would/should do.
However, the very pragmatic solution used by the Activity Stream people does not use any extra feature, as far as RDF goes: adding a few Unicode characters into a literal is perfectly within the framework of today's RDF. In fact, JSON-LD could fully ignore the whole issue and rely on users using this trick even for literals that are used within JSON-LD. My fear is that deployment of such a solution would hit the obstacle of the difficulty of editing such text; introducing @direction could be seen as a syntactic help for the end users.
It may have more consequence for SPARQL, which is in more of a need of update than RDF, IMHO. Querying and emitting text direction would be non-trivial.
True. But this is not something that I see happening in the coming years...
@workergnome
So perhaps the essential detail of strings that is not currently captured is not text direction, per se, but an encoding of the script in which the text is written. Script is another characteristic of string data, somewhat orthogonal to language (as per https://www.w3.org/International/questions/qa-scripts#which, which points out that the language Azeri can be written in either Latin or Arabic script).
But this is completely covered by the language tag already, ie, this is not a problem. For a specific example, the tag zh-Hans denotes a text in Chinese, using the simplified script (used in mainland China and in Singapore), whereas zh-Hant is Chinese using traditional script as used in Taiwan or Hong Kong. (BCP 47 is surprisingly complex but powerful, see https://www.w3.org/International/articles/language-tags/).
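To illustrate how the script is carried by the tag itself, here is a rough sketch that pulls the script subtag out of a language tag. The function `script_subtag` is a hypothetical helper and a deliberate simplification: it assumes a well-formed tag and ignores grandfathered tags and other BCP 47 corner cases.

```python
def script_subtag(tag):
    """Return the four-letter script subtag of a language tag, or None.

    Simplified reading of BCP 47: after the primary language subtag,
    skip any three-letter extended language subtags; a four-letter
    alphabetic subtag in the next position is the script.
    """
    subtags = tag.split("-")
    if subtags[0].lower() in ("x", "i"):
        return None  # private-use or grandfathered tag: not handled here
    for sub in subtags[1:]:
        if len(sub) == 3 and sub.isascii() and sub.isalpha():
            continue  # extended language subtag, e.g. "cmn" in "zh-cmn-Hans"
        if len(sub) == 4 and sub.isascii() and sub.isalpha():
            return sub.title()  # conventional casing, e.g. "Hans"
        break  # anything else (region, variant, ...) means no script subtag
    return None
```

For the tags mentioned above, `script_subtag("zh-Hans")` gives `"Hans"` and `script_subtag("zh-Hant")` gives `"Hant"`, while a plain `"en"` yields `None`.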
As far as I know, there's a one-to-one mapping between direction and script, though I'm often proven wrong by the complexities of human language.
And most of the time it is indeed. But there are corner cases that require an additional information on the base direction that must be specified. This is the corner case that necessitates the rtl/ltr flags in HTML5. See https://www.w3.org/International/articles/inline-bidi-markup/
I'm still opposed to encoding data into JSON-LD syntax that isn't natively supported in other RDF serializations, though.
Yes. Even if the RDF Concept document is perfectly fine with the solution used by the Activity Stream standard, ideally a syntactic sugar would be good to have in Turtle, too. Pragmatically speaking, there are two serializations used for RDF these days on a large scale: JSON-LD and Turtle. JSON-LD has a tendency (by design) to be used by RDF laypersons, or even people ignoring RDF, for whom an easy syntactic sugar would be welcome. (And I do not see any chance of reopening the RDF WG at W3C as of now for this.)
(I could have added RDFa, but due to the specific environment of RDFa those users could more easily fall back on the more complete, but complex approach of using the rdf:HTML datatype.)
On 2018-02-09, at 07:15, Ivan Herman notifications@github.com wrote: ...
Yes. Ideally, even if the RDF Concept document is perfectly fine with the solution used by the Activity Stream standard, ideally a syntactic sugar would be good to have in Turtle, too. Pragmatically speaking, there are two serializations used for RDF these days on a large scale: JSON-LD and Turtle. JSON-LD has tendency (by design) to be used by RDF laypersons, or even people ignoring RDF, for whom an easy syntactic sugar would be welcome. (And I do not see any chance reopening the RDF WG at W3C as of now for this.)
what is a graph store processor to do when presented with an rdf document encoded as json-ld which includes assertions as to text direction? what will be the intent, when the feature is to be extended to turtle? how is this encoding intended to round-trip through a sparql processor?
that is, if a sparql update request loads into a graph a document which is encoded as json-ld and a subsequent query produces a graph which includes terms which that load operation introduced into the store, how do those terms come to reflect any direction specifications present in the imported document in order that it can be reflected in the response? i would understand there to be two options. either one extends the dimensionality of the string term representation beyond the abstract rdf model to allow for direction or one extends the lexical form of string variations to include direction. as the direction concerns presentation rather than semantics, neither is appropriate. if it were to concern semantics, that would argue, that it should be reified through a predicate.
@lisp,
Thanks for the questions, it clarifies the intention. (This is of course all based on the supposition that we go along with the Activity Stream approach; the original proposal in the issue did not do that.)
what is a graph store processor to do when presented with an rdf document encoded as json-ld which includes assertions as to text direction?
The JSON-LD -> RDF processor is supposed to produce a text literal of the form "\u200Fthe original text"@en
what will be the intent, when the feature is to be extended to turtle?
What the appropriate syntactic sugar would be in Turtle: I do not know. I have seen the proposal like "the original text"@en^ltr.
how is this encoding intended to round-trip through a sparql processor? that is, if a sparql update request loads into a graph a document which is encoded as json-ld and a subsequent query produces a graph which includes terms which that load operation introduced into the store, how do those terms come to reflect any direction specifications present in the imported document in order that it can be reflected in the response?
You are right that it creates problems with SPARQL insofar as the SPARQL query syntax would also need to have something like that (probably following the Turtle syntax) and today it is not there. Ie, the only solution would be to operate with "\u200Fthe original text"@en all along.
Note that, in JSON-LD 1.0, this is already a valid statement:
"title": {
"@value" : "\u200Fthe original text",
"@language" : "en"
}
A JSON-LD encoding of today's Activity Stream statement would have to do this. Ie, to come back to your question, if the user uses SPARQL using this, it is fine. The usage of @direction is purely a syntactic sugar.
Just to make it clear: I do not really like this solution. But we do have a real use case to be solved: publishers may want to use JSON-LD to encode metadata like title or authors, there is need to express titles in different languages and, possibly, directions. At the moment, this is perfectly fine JSON-LD:
"title": [
{"@language": "fr", "@value": "Vingt mille lieues sous les mers"},
{"@language": "en", "@value": "Twenty Thousand Leagues Under the Sea"},
{"@language": "ja", "@value": "海底二万里"}
]
but if one wants to add a base direction, the only choice as of now is to add a \u200F or \u200E manually into the text. By saying
"title": [
{"@language": "fr", "@value": "Vingt mille lieues sous les mers"},
{"@language": "en", "@value": "Twenty Thousand Leagues Under the Sea"},
{"@language": "ja", "@value": "海底二万里", "@direction": "ltr"}
]
we simplify the authors' lives.
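Going the other way, a processor reading RDF back into JSON-LD could recover the sugar from the leading mark. This is only a sketch under that assumption; `literal_to_value_object` is a hypothetical name, and nothing in JSON-LD 1.0 defines this behaviour:

```python
# Hypothetical inverse mapping: leading Unicode mark -> "@direction" key.
MARK_TO_DIRECTION = {"\u200F": "rtl", "\u200E": "ltr"}


def literal_to_value_object(lexical_form, language=None):
    """Build a value object, turning a leading RLM/LRM into "@direction"."""
    direction = MARK_TO_DIRECTION.get(lexical_form[:1])
    if direction is not None:
        lexical_form = lexical_form[1:]  # drop the mark from the visible text
    value_object = {"@value": lexical_form}
    if language is not None:
        value_object["@language"] = language
    if direction is not None:
        value_object["@direction"] = direction
    return value_object
```

One caveat with this design: a string that begins with such a mark for other reasons would also be reported as carrying a base direction, so the mapping is a convention rather than something the data can prove.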
On 2018-02-09, at 11:29, Ivan Herman notifications@github.com wrote:
...
Note that, in JSON-LD 1.0, this is already a valid statement:
"title": { "@value" : "\u200Fthe original text", "@language" : "en" }
that is, the already defined encoding suffices, as is.
A JSON-LD encoding of today's Activity Stream statement would have to this. Ie, to come back to your question, if the user uses SPARQL using this, it is fine. The usage of @direction is purely a syntactic sugar.
Just to make it clear: I do not really like this solution.
which does little to strengthen an argument to accept it.
But we do have a real use case to be solved: … [whereby] if one wants to add a base direction, the only choice as of now is to add a \u200F or \u200E manually into the text.
to the extent that the application is compelled to entrain presentation information in the data, that seems an appropriate method and keeps the concern orthogonal from those of the myriad encoding forms. in particular, of json-ld, which (i had the belief) intends to encode relations among things rather than markup their presentation.
On 2018-02-09, at 11:29, Ivan Herman notifications@github.com wrote:
what will be the intent, when the feature is to be extended to turtle?
What the appropriate syntactic sugar would be in Turtle: I do not know. I have seen the proposal like "the original text"@en^ltr.
in case you do not appreciate why i am convinced that the apparent intent would be bad idea, please note these passages
@lisp,
it is my fault to have mixed up possibly three issues. Namely:
1. Introducing the @direction into JSON-LD 1.1, without any effect on the generated RDF (working for users who use JSON-LD only in a round-trip manner, but the results are not transferred into RDF)
2. A way of handling the issue in today's RDF by using the Unicode characters \u200F or \u200E
3. Combining the two above by considering the @direction of (1) as a syntactic sugar for (2)
I am not sure which of the three aspects you object to. Note that using (2) is perfectly valid today, ie, "\u200Fabcdefg" is not equal to "abcdefg", which is perfectly in line with the definitions you refer to.
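The inequality of the marked and unmarked strings can be checked directly. In the sketch below a language-tagged literal is modelled as a simple (lexical form, language) pair; the `Literal` class is an assumption made for illustration, not an API of any RDF library.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Toy model of a language-tagged RDF literal (illustrative only)."""
    lexical_form: str
    language: Optional[str] = None


marked = Literal("\u200Fabcdefg", "en")  # begins with RIGHT-TO-LEFT MARK
plain = Literal("abcdefg", "en")

# The two terms differ because their lexical forms differ by one character.
terms_are_equal = marked == plain
extra_characters = len(marked.lexical_form) - len(plain.lexical_form)
```

Here `terms_are_equal` is `False` and `extra_characters` is `1`: the mark is invisible when rendered but is an ordinary character as far as comparison is concerned.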
On 2018-02-09, at 13:03, Ivan Herman notifications@github.com wrote:
@lisp,
it is my fault to have mixed up possibly three issues. Namely:
• Introducing the @direction into JSON-LD 1.1, without any effect on the generated RDF (working for users who use JSON-LD only in a round-trip manner, but the results are not transferred into RDF)
• A way of handling the issue in today's RDF by using the Unicode characters \u200F or \u200E
• Combining the two above by considering the @direction of (1) as a syntactic sugar for (2)
I am not sure which of the three aspect you object to.
all of them in one sense or another, but the second does permit a processor to treat the intended domain value as opaque.
Note that using (2) is perfectly valid today, ie, "\u200Fabcdefg" is not equal to "abcdefg", which perfectly in line with the definitions you refer to.
in a naive sense yes. in the sense which is implied by the “^ltr” proposal, as i understand the provisions which those passages from my earlier note describe for string comparison in the presence of language tags, no.
in the sense which is implied by the “^ltr” proposal, as i understand the provisions which those passages from my earlier note describe for string comparison in the presence of language tags, no.
Why?
On 2018-02-09, at 14:34, Ivan Herman notifications@github.com wrote:
Note that using (2) is perfectly valid today, ie, "\u200Fabcdefg" is not equal to "abcdefg", which perfectly in line with the definitions you refer to.
in the sense which is implied by the “^ltr” proposal, as i understand the provisions which those passages from my earlier note describe for string comparison in the presence of language tags, no.
Why?
because the language tag adds an additional dimension to processing control paths in order to accommodate the equivalence rules. to make matters worse, its definition is not even complete. either the domain should be expanded and include operations which close over the entire space, or, if that is not the intent, the relationship should be reified via a predicate. to apply sugar to the situation would be to compound a mistake. see: https://www.w3.org/TR/sparql11-query/#modOrderBy
Consider this a late-Friday afternoon idea for encouraging us to see things differently (for whatever it may teach us).
@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
<#genesis>
  a schema:Book ;
  schema:name <data:text/plain,בראשית> .
<data:text/plain,בראשית> oa:textDirection oa:rtlDirection .
😸
I echo previous comments that not being RDF compatible is a no-go. My reason specifically being a very practical one, as somebody who does not even spend much time on the RDF end: it would break linked data signatures, or anything else using the RDF Dataset Normalization algorithm, no? I think a lot of us want to see more usage of json-ld that works in that direction. (Or at least it might leave out information so that two different sets of information could end up having the same normalized structure... bad especially for my hopes to move towards more and more content addressed storage for linked data...)
Anyway, that's all to say that I think one of two directions is best for now:
I think we could always revisit if/when RDF gets native direction support.
Deferred to WG due to https://json-ld.org/minutes/2018-04-10/#resolution-3.
Closed in favor of https://github.com/w3c/json-ld-syntax/issues/11.
In some situations it is important/necessary to include the base direction of a text, alongside its language; see the “Requirements for Language and Direction Metadata in Data Formats” for further details. In practice, in a vanilla JSON, it would require something like:
(the example comes from that document).
At this moment, I believe the only way you can reasonably express that in JSON-LD is via cheating a bit:
and making sure that the dir term is not defined in the relevant @context so that, when generating the RDF output, that term is simply ignored. But that also means that there is no round-tripping: that term will disappear after expansion.
The difficulty lies in the RDF layer, in fact; RDF does not have any means (alas!) to express text direction. On the other hand, this missing feature is a general I18N problem whenever JSON-LD is used (there were issues when developing the Web Annotation Model, these issues are popping up in the Web Publication work, etc.).
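The loss of an undefined term can be illustrated with a toy model. The sketch below is not the JSON-LD expansion algorithm; `toy_expand`, the context, and the example IRI are all assumptions, and the only behaviour modelled is that keys with no mapping in the context are dropped.

```python
def toy_expand(node, context):
    """Keep keywords and terms the context maps to IRIs; drop everything else.

    A deliberately tiny stand-in for expansion, for illustration only.
    """
    expanded = {}
    for key, value in node.items():
        if key.startswith("@"):
            expanded[key] = value  # keywords pass through unchanged
        elif key in context:
            expanded[context[key]] = value  # defined term -> its IRI
        # undefined terms such as "dir" are silently ignored
    return expanded


# Hypothetical context that defines "title" but not "dir".
context = {"title": "http://example.org/title"}
node = {"@id": "http://example.org/book", "title": "Moby Dick", "dir": "ltr"}
expanded = toy_expand(node, context)
```

After this step `expanded` no longer has any trace of `dir`, which is why the workaround cannot round-trip.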
Here is what I would propose as a non-complete solution:
@dir term, alongside @language. This means this term can be used in place of dir above, ie, it is a bona-fide part of a string representation, and would therefore be kept in the compaction/expansion steps, can also be used for framing.
@dir is ignored when transforming into RDF. I.e., only the language tag would be used.
[] ex:title "موبي ديك"^^rdf:internationalText(ar,rtl) ;
3.2. Go for a "generalized" RDF where strings can also appear as subjects (that has been a matter of dispute for a long time...). That would give the possibility to add such attribute to texts like directions
3.3. Some other mechanisms that I cannot think about
@dir value can be properly mapped onto an RDF representing the right choices (if such choices are worked out)
Cc: @BigBlueHat @r12a