CSS font-matching algorithm may introduce fingerprinting issues

npdoty commented 4 years ago

Review of TTML2 2nd Edition noted many potential fingerprinting vectors: https://github.com/w3c/ttml2/issues/1189 (Whether those issues present a privacy risk depends on a clearer understanding of what information is revealed by content processors to whom.)

Addition of external font loading and the CSS font-matching algorithm could introduce those fingerprinting issues to IMSC 1.2.

Mitigations for fingerprinting in CSS are under discussion now in CSSWG and PING.

More info in email: https://lists.w3.org/Archives/Public/public-privacy/2020JanMar/0055.html

css-meeting-bot commented 4 years ago

The Timed Text Working Group just discussed CSS font-matching algorithm may introduce fingerprinting issues imsc#530, and agreed to the following:

SUMMARY: TTWG thanks @npdoty for raising this. In the context of continuing discussions and without understanding any specific improvements we can currently make, we will proceed with no changes for the time being.
SUMMARY: Discussion of additional questions raised in the linked email to continue offline.

The full IRC log of that discussion

<nigel> Topic: CSS font-matching algorithm may introduce fingerprinting issues imsc#530
<nigel> github: https://github.com/w3c/imsc/issues/530
<nigel> Nigel: Did we actually introduce CSS font matching algorithm?
<nigel> .. I see at https://w3c.github.io/imsc/imsc1/spec/ttml-ww-profiles.html#text-font-source
<nigel> .. that we introduced:
<nigel> .. "A Processor MAY use the [css-fonts-3] §5 font matching algorithm for associating a font with a run of text."
<nigel> .. My question is, if this is an option, not a requirement, why wouldn't the CSS handling
<nigel> .. of the privacy issue be implied by reference.
<nigel> Pierre: Just to point out that in §10.5 we mention the CSS font matching algorithm
<nigel> .. is also referenced via a defined term Font Matching Algorithm.
<nigel> .. Editorially we should improve that.
<nigel> Nigel: Right, and that's in the HRM section.
<nigel> .. The HRM considerations are in my view concerned with document validation, and there's
<nigel> .. no requirement for the presentation processor to follow any steps in the HRM to
<nigel> .. render content.
<nigel> .. I would not expect a user-oriented player to execute the steps of the HRM.
<nigel> Andreas: +1
<nigel> Nigel: And therefore there's no privacy issue associated with 10.5.
<nigel> .. That takes us back to 8.5.3.
<nigel> Pierre: To your earlier point Nigel, I don't see what action we can reasonably take.
<nigel> .. There are a lot of "mays" and "under discussion" and no proposed resolution.
<nigel> -> https://lists.w3.org/Archives/Public/public-tt/2020Mar/0013.html Email that prompted this issue
<nigel> Nigel: There are additional questions in the email that are not in the GitHub issue.
<nigel> Pierre: We have generic text in TTML2 about loading of resources, I believe.
<nigel> Glenn: There are some handwavy statements
<nigel> Pierre: About resource fetching?
<nigel> .. In the absence of specific concerns we can only offer generic guidance.
<nigel> Glenn: Exactly.
<nigel> .. I don't know what we can practically say.
<nigel> Pierre: We can ask about specific issues with the TTML2 text.
<nigel> Glenn: Ask for spec-ready text we can drop in.
<nigel> Pierre: Exactly, that's what we should do.
<nigel> .. We can't tell CSS and HTML how to do fingerprinting mitigation.
<nigel> SUMMARY: TTWG thanks @npdoty for raising this. In the context of continuing discussions and without understanding any specific improvements we can currently make, we will proceed with no changes for the time being.
<nigel> SUMMARY: Discussion of additional questions raised in the linked email to continue offline.

npdoty commented 4 years ago

We generally try to provide privacy and security guidance even for optional normative text that isn't required (MAY rather than MUST, for example). And we generally try to note privacy issues in all the places they appear, even if they might be mitigated or resolved in the future.

It might also be that the fingerprinting risk that does apply with CSS in the Web context doesn't apply with processors of TTML/IMSC, but I haven't been able to determine that as I'm less clear on how these processor implementations work in connection with the Web platform. I think that would be a useful discussion to have (via email or teleconference) and might help us provide better guidance on w3c/ttml2#1189 as well.

nigelmegitt commented 4 years ago

Answering some of the questions in your email @npdoty :

I'm less clear on whether the origin server can determine which font is selected by a content processor or what the rendered text looks like, which is the mechanism that creates the fingerprinting risk in the Web case. (Can the origin server obtain the rendered text in any way? Can it see the height or size of the region? Are there conditional requests based on which fonts are available or if a region is overflowed?) It would be useful for detailing the security and privacy properties whether origin servers had access to presented timed text beyond the loading of external resources. The risk wouldn't be greater for rendering IMSC profile documents more than TTML2 documents in general, but it would be new compared to IMSC 1.

Can the origin server obtain the rendered text in any way?

There's nothing in TTML or IMSC about this - it would be an implementation feature beyond anything in the specification.

Can it see the height or size of the region?

The origin server, in providing the subtitle document, is defining the size of the region, and the size of the text within it. This does not give complete information about the rendered result, because text layout engines vary on a pixel-by-pixel basis, and because the used fonts may differ. Furthermore, as part of the document processing context, the user may have had the option to specify some overrides to the document-specified formatting.

There is nothing in TTML or IMSC that defines any return path to the origin server for such overrides. Again, this would be an implementation-specific behaviour.

Are there conditional requests based on which fonts are available or if a region is overflowed?

No, there are no such conditional requests defined. TTML2 has a condition construct but the parameters provided as an input to that construct are not fully defined. Use of this construct is currently prohibited in IMSC.

Other fingerprinting opportunities

Going beyond your email, I've been wondering if there are other fingerprinting opportunities - please forgive my ignorance in this general area: I am very far from an expert in this privacy regime.

At a real pinch, might it be possible to construct a "pathological" case in which a set of URLs is provided for a font resource, using the available fallback behaviour, and IMSC documents are authored such that the way those fallback URLs are requested reveals some information? This is fairly far-fetched and not well thought through right now. It would be easier to work against specific fingerprinting concerns than generic ones.

In general any fingerprinting opportunity some malicious actor might be able to use would almost certainly be much easier to use through some other mechanism! For example if the document is requested as part of playback of video media in the context of a web page, there are probably many opportunities to fingerprint within that web page already. It is hard to see why anyone would try to use some feature of IMSC document playback in this context.

To make this point more concrete, consider a web-based video player: one could hook into some IMSC player feature to send reporting events back to an origin about the user's playback point, but there's no need to be so obtuse - there are plenty of opportunities in video player code to do this already regardless of the presence of subtitles or captions.

Likewise, any IMSC player that supports some kind of beyond-the-specification customisation user interface can send reporting data on the usage of that interface directly back to an origin if it has been implemented to do so, whether or not IMSC document playback is actually taking place.

swickr commented 4 years ago

For IMSC1.2 would it be sufficient to add an editorial note in 8.5.3 pointing to the (currently still open) https://github.com/w3c/csswg-drafts/issues/4055 in CSS?

palemieux commented 4 years ago

@swickr It is still not clear to me that the attack vector indicated at w3c/csswg-drafts#4055 is relevant to IMSC. Specifically, it looks like the attack vector requires a malevolent script accessing the user's font list. Is that correct? If so, IMSC does not specify any such scripting capability and/or any API that would allow the user's font list to be accessed.

@nigelmegitt at https://github.com/w3c/imsc/issues/530#issuecomment-601853057 suggests a different kind of attack where a malevolent site generates a large number of specially crafted IMSC documents referencing font resources on the malevolent site, with the objective of determining the user's font by observing which font resource the TTML processor attempts to download from the malevolent site. Is that worth mentioning? If so, this attack could be mentioned in IMSC 1.2 temporarily, and ultimately moved to TTML 2 since it applies to any TTML 2 profile that supports downloadable fonts.

himorin commented 4 years ago

(In my understanding) fingerprinting relies on points of difference between user environments that can be used to sort a specific execution environment into groups, such as which language (Accept-Language) is configured. CSS font fingerprinting uses which local fonts are installed and available to the web browser as that point of difference, by configuring CSS (plus JS if needed) to tell whether a specific font is loadable from HTML content. In TTML (and IMSC), as shown in Figure 1 of the TTML2 spec, our target environment is a Presentation or Transformation processor that builds a distribution (or other) format before it reaches the general public's user environment, but, as with the output from 'Rendering Processor Q', I think there is a possibility of putting some embers into the post-processed data. Of course, such environments may not be as flexible as browsers; for example, additional resources such as fonts are not easily installable by end users on a TV. On that point, I agree with the comment from @swickr and the 2nd point from @palemieux about having an editorial note that mentions the issue (or would a dedicated font anti-fingerprinting note be better?) in the spec(s), to warn implementers of TTML processors of the recommended considerations.

palemieux commented 4 years ago

AFAIK we are not sure at this point what is the attack and how to mitigate it, so the best we can probably do today is add an editor's note merely pointing to this issue.

See proposed note at https://github.com/w3c/imsc/pull/532.

We can then get to the bottom of the issue in the coming weeks.

nigelmegitt commented 4 years ago

@nigelmegitt at w3c/imsc#530 (comment) suggests a different kind of attack where a malevolent site generates a large number of specially crafted IMSC documents referencing font resources on the malevolent site, with the objective of determining the user's font by observing which font resource the TTML processor attempts to download from the malevolent site.

@palemieux that is not what I suggested: rather, I suggested that the user's location might be identifiable through this highly circuitous route.

As I understand them, the semantics for downloading external font resources are completely independent of the installed fonts. In other words, if some text is styled with a font family that dereferences to an external font resource via a <font> element, then a presentation processor needs to obtain that resource (or re-use it from cache presumably) even if a similar-looking font is installed locally.

nigelmegitt commented 4 years ago

For IMSC1.2 would it be sufficient to add an editorial note in 8.5.3 pointing to the (currently still open) w3c/csswg-drafts#4055 in CSS?

@swickr please could you give us more information about how such a note might be helpful?

We generally try not to include speculative comments or references to not-concluded conversations in Recs if we can help it. In this case, the whole thread seems to refer to something that is a non-issue with IMSC 1.2 and TTML2, as far as I have been able to tell so far from the discussion.

I wonder if anyone is able to describe succinctly what fingerprinting vector is in fact exposed by IMSC 1.2's use of the TTML2 <font> element? (as opposed to hypothetical "I wonder if it would be possible but I don't understand the spec enough to be sure" concerns, which are a good starting place for discussion, but in my view not strong enough to warrant text in a Rec)

I think this is key because in general, specifying something in a subtitle/caption document does not in itself reveal anything; only the execution of implementations can reveal anything, and in this case I have not been able to locate any processor semantic defined by the specification that could or would reveal anything about installed fonts. I would be happy to have it shown to me though, if there is one!

npdoty commented 4 years ago

We typically try to note security and privacy issues even if those issues also apply to other likely features (like a Web page that uses CSS and has a risk of fingerprinting): it provides guidance to implementers so that they know the trade-offs when implementing and it provides a marker of the problem so that if it's resolved in another spec, the remaining threat or vulnerability is documented.

And I'm not sure about the distinction between the spec and the implementation. The privacy issues that we note in HTML or CSS or other Web specs only exist because they are implemented in particular software and the implemented software has typical (or optional or required) implementations that create privacy risks that we think are worth noting and mitigating. Definitions of markup languages can have relevant privacy considerations, even though they just define markup, based on how that markup will be consumed.

In the case of CSS font fingerprinting, that's typically not based on just a direct JavaScript call, but on having the browser render some text in a particular font with a particular fallback and then testing the size of the resulting element (that's why I was asking about rendered text, size and conditionality, because those are methods often used in browser fingerprinting). Whether external resources are loaded or not is also a way for a constructed document to send a signal to an external server about the configuration of the user's machine.

To the question from @palemieux and @nigelmegitt, I don't know whether specifying a font of a particular name and providing an external source for it would imply that it should be downloaded only if a font of that name is not present locally. It doesn't seem like that, but I'm not sure how to read it exactly.

(There could be related issues about caching of resources (determining whether the user has viewed this content before based on whether those external resources are fetched or not) that are relevant to any markup of external resources that are cached with HTTP, but those are typically less severe and I don't know that we have a corresponding issue for you to refer to.)
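To make the caching signal in the parenthetical above concrete, here is a minimal Python sketch. It assumes a toy processor (`CachingClient`, a hypothetical name) that caches fetched resources indefinitely, and a server-side heuristic (`looks_like_repeat_view`, also hypothetical) that treats the absence of a resource request as a sign of a warm cache. Real processors and HTTP caches behave in more varied ways, so this is an illustration of the idea rather than a description of any implementation.

```python
class CachingClient:
    """Toy processor that keeps every fetched resource in a local cache."""

    def __init__(self):
        self._cache = set()

    def present(self, resource_urls):
        """Return the URLs that actually hit the network for one presentation."""
        fetched = [url for url in resource_urls if url not in self._cache]
        self._cache.update(fetched)
        return fetched


def looks_like_repeat_view(document_resources, observed_requests):
    """Server-side guess: no request for any declared resource suggests
    the client already had them cached, i.e. has seen the content before."""
    return not any(url in observed_requests for url in document_resources)
```

On a first presentation the resource is requested and the heuristic reports a first view; on a second presentation nothing is requested and the heuristic reports a repeat view.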

palemieux commented 4 years ago

Definitions of markup languages can have relevant privacy considerations, even though they just define markup, based on how that markup will be consumed.

My concern is that we end up documenting generic vulnerabilities in IMSC. Such vulnerabilities are best described in a generic document -- just as WCAG documents generic accessibility requirements.

and then testing the size of the resulting element

This is made possible by programmatic access to the DOM, completely independently of the characteristic of the source document, right? For example, it would apply to a text file or an image. In other words, the vulnerability is not created by source document, but by the platform that allows programmatic access to rendered content?

jfkthame commented 4 years ago

To the question from @palemieux and @nigelmegitt, I don't know whether specifying a font of a particular name and providing an external source for it would imply that it should be downloaded only if a font of that name is not present locally. It doesn't seem like that, but I'm not sure how to read it exactly.

My understanding is that specifying an external source for a particular font name would in effect "hide" any locally-installed font of the same name. (This is certainly the case for the analogous case in HTML/CSS of font families defined via the @font-face rule.) The name then refers only to the external source.

However, it's still possible to "fingerprint" the locally-installed fonts, by a slightly indirect method: the document can specify a font-family list with two names, the first of which is the font name it is interested in probing, and the second is linked to an external source.

So to detect whether, say, Zapfino is installed on the user's system, the document says something like tts:fontFamily="Zapfino,MyExternalResource", where MyExternalResource is defined via <font family="MyExternalResource"><source src="..."> to point back to a (non-cacheable) resource with a unique URL (e.g. with an appended fragment identifier used as a key) on the server. If that resource gets requested, then the server knows Zapfino was not installed.

By testing for the presence of a selection of font family names in this way, the server can potentially learn a lot about the user's installed font collection.
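As a rough illustration of the probing pattern described above, the following Python sketch builds a TTML-like document with one probe per candidate family and then interprets a request log on the server side. The names `PROBE_URL`, `build_probe_document` and `infer_installed` are hypothetical, the markup is only a well-formed approximation of a TTML2 document, and a query parameter is assumed as the per-probe key because URL fragments are not sent to the server in HTTP requests. The inference is a guess, not a certainty: it assumes a processor that only fetches the fallback when the first family is unusable.

```python
from urllib.parse import parse_qs, urlparse
from xml.sax.saxutils import quoteattr

# Hypothetical endpoint controlled by the party doing the probing.
PROBE_URL = "https://fonts.example.test/probe.woff"


def build_probe_document(candidates):
    """Return a TTML-like document with one probe span per candidate family."""
    fonts, paragraphs = [], []
    for index, family in enumerate(candidates):
        fallback = f"Probe{index}"            # external family, unique per probe
        source = f"{PROBE_URL}?key={index}"   # unique URL identifies the probe
        fonts.append(
            f"      <font family={quoteattr(fallback)}>"
            f"<source src={quoteattr(source)}/></font>"
        )
        family_list = f"{family},{fallback}"
        paragraphs.append(
            f"      <p><span tts:fontFamily={quoteattr(family_list)}>A</span></p>"
        )
    lines = [
        '<tt xmlns="http://www.w3.org/ns/ttml"',
        '    xmlns:tts="http://www.w3.org/ns/ttml#styling">',
        "  <head>",
        "    <resources>",
        *fonts,
        "    </resources>",
        "  </head>",
        "  <body>",
        "    <div>",
        *paragraphs,
        "    </div>",
        "  </body>",
        "</tt>",
    ]
    return "\n".join(lines)


def infer_installed(candidates, requested_urls):
    """Guess which families are installed: a probe that was never
    requested suggests the first family in its list was used instead."""
    requested_keys = set()
    for url in requested_urls:
        for key in parse_qs(urlparse(url).query).get("key", []):
            requested_keys.add(int(key))
    return {
        family: index not in requested_keys
        for index, family in enumerate(candidates)
    }
```

For example, if the only request observed for a document probing `Zapfino` and `Papyrus` carries `key=1`, the sketch guesses that Zapfino is installed and Papyrus is not.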

nigelmegitt commented 4 years ago

Thanks @jfkthame that really helps to explain the mechanism for fingerprinting. I'm not clear whether TTML2 and IMSC can suffer from that mechanism, but it certainly seems plausible if not likely.

jfkthame commented 4 years ago

(Just to be clear, that's not the only font-related fingerprinting mechanism; I believe the strategy of measuring the rendered size of a string of text in a particular font, and/or containing specific "interesting" Unicode characters, is currently the commonly-used method. But the approach outlined above is particularly interesting in that it does not depend on using APIs to measure or examine the rendered text, so it's immune to some suggested mitigations such as spoofing measurement results.)

skynavga commented 4 years ago

On Fri, Mar 27, 2020 at 4:52 AM jfkthame notifications@github.com wrote:

To the question from @palemieux and @nigelmegitt, I don't know whether specifying a font of a particular name and providing an external source for it would imply that it should be downloaded only if a font of that name is not present locally. It doesn't seem like that, but I'm not sure how to read it exactly.

My understanding is that specifying an external source for a particular font name would in effect "hide" any locally-installed font of the same name. (This is certainly the case for the analogous case in HTML/CSS of font families defined via the @font-face rule.) The name then refers only to the external source.

However, it's still possible to "fingerprint" the locally-installed fonts, by a slightly indirect method: the document can specify a font-family list with two names, the first of which is the font name it is interested in probing, and the second is linked to an external source.

So to detect whether, say, Zapfino is installed on the user's system, the document says something like tts:fontFamily="Zapfino,MyExternalResource", where MyExternalResource is defined via <font family="MyExternalResource"> to point back to a (non-cacheable) resource with a unique URL (e.g. with an appended fragment identifier used as a key) on the server. If that resource gets requested, then the server knows Zapfino was not installed.

This is not a reliable test mechanism. Zapfino may be installed but not used for a variety of reasons and MyExternalResource subsequently referenced. For example, Zapfino may not have a glyph that corresponds to a character being rendered. Or the font selection strategy may require a contextual character sequence be mapped that is only available in the external resource but not Zapfino. Or the writing mode may be vertical mode, and only MyExternalResource supports vertical metrics. I could cite dozens of other reasons why Zapfino might be ruled out by a client but still loaded before moving on to the external resource.

By testing for the presence of a selection of font family names in this way, the server can potentially learn a lot about the user's installed font collection.


jfkthame commented 4 years ago

This is not a reliable test mechanism. Zapfino may be installed but not used for a variety of reasons and MyExternalResource subsequently referenced. For example, Zapfino may not have a glyph that corresponds to a character being rendered. Or the font selection strategy may require a contextual character sequence be mapped that is only available in the external resource but not Zapfino. Or the writing mode may be vertical mode, and only MyExternalResource supports vertical metrics.

A site using such a mechanism to accomplish installed-font fingerprinting would presumably apply the "test" styling to specific simple content such as a single ASCII character in horizontal writing mode, so that such considerations aren't relevant.

In addition, the fact that a fingerprinting mechanism may not be 100% reliable is not sufficient to prevent malicious sites using it, or to protect users. It just needs to work fairly well much of the time in order to be a significant threat.

skynavga commented 4 years ago

This is not a reliable test mechanism. Zapfino may be installed but not used for a variety of reasons and MyExternalResource subsequently referenced. For example, Zapfino may not have a glyph that corresponds to a character being rendered. Or the font selection strategy may require a contextual character sequence be mapped that is only available in the external resource but not Zapfino. Or the writing mode may be vertical mode, and only MyExternalResource supports vertical metrics.

A site using such a mechanism to accomplish installed-font fingerprinting would presumably apply the "test" styling to specific simple content such as a single ASCII character in horizontal writing mode, so that such considerations aren't relevant.

In addition, the fact that a fingerprinting mechanism may not be 100% reliable is not sufficient to prevent malicious sites using it, or to protect users. It just needs to work fairly well much of the time in order to be a significant threat.

This still depends on reliance upon a heuristic that implementations choose to implement lazy fetch algorithms, which is entirely implementation dependent, i.e., outside the realm of the entire set of TTML and IMSC specifications.
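The dependence on fetch strategy can be sketched as follows in Python. Both functions are hypothetical models, not behaviour defined by TTML or IMSC: `lazy_requests` assumes a processor that walks the family list and fetches an external resource only when no earlier family was usable, while `eager_requests` assumes a processor that fetches every external resource named in the list regardless of local fonts.

```python
def lazy_requests(font_family_list, external_sources, installed):
    """Walk the family list in order and stop at the first usable family.

    A family with a declared external source always refers to that source,
    so it is fetched when reached; a local family stops the walk earlier.
    """
    requests = []
    for family in font_family_list:
        if family in external_sources:
            requests.append(external_sources[family])
            break  # the downloaded font is assumed to be usable
        if family in installed:
            break  # a local font satisfied the run; nothing is fetched
    return requests


def eager_requests(font_family_list, external_sources, installed):
    """Fetch every external resource in the list.

    `installed` is deliberately ignored: the request pattern must not
    depend on the local font collection.
    """
    return [
        external_sources[family]
        for family in font_family_list
        if family in external_sources
    ]
```

With the family list `["Zapfino", "MyExternalResource"]`, the lazy model produces a request only when Zapfino is absent, which is the observable difference the probe relies on. The eager model produces the same request in both cases, so the server learns nothing about installed fonts from it.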

palemieux commented 4 years ago

@npdoty See PR for your review.

nigelmegitt commented 4 years ago

I'm reopening this pending confirmation from @npdoty, @samuelweiler or PING more generally that the change made in #1203 and now merged addresses the issue, as requested at https://github.com/w3c/ttml2/pull/1203#issuecomment-638984684 on 4th June 2020.

css-meeting-bot commented 4 years ago

The Timed Text Working Group just discussed TTML2 Add consideration for font fingerprinting., and agreed to the following:

SUMMARY: Action for @nigelmegitt to go back to PING and explain the situation and request further collaboration

The full IRC log of that discussion

<nigel> Topic: TTML2 Add consideration for font fingerprinting.
<nigel> github: https://github.com/w3c/ttml2/issues/1202
<nigel> Nigel: The status is the PR was merged before a response from the PING folk who raised
<nigel> .. the issue, to my question asking for their comments on the TTWG's resolutions last week.
<nigel> .. It's also clear from @samweiler's comments that he would far prefer a normative statement.
<nigel> .. The impact of that would be that we would have to change the section the text is in
<nigel> .. to be normative, and that we should have some kind of test for it.
<nigel> .. That's my current reading.
<nigel> Pierre: I think we need to step back and meet with PING or really have a discussion about
<nigel> .. what the end objective is here.
<nigel> .. Is it to have a running list of potential privacy issues that get updated as new ones come
<nigel> .. up every new edition?
<nigel> .. Is it for a definitive list today?
<nigel> .. Is it to anticipate all potential mitigations?
<nigel> .. If we don't figure out the objective then we won't get to a conclusion.
<nigel> .. I sense that PING is trying to do something and I don't understand what that is.
<nigel> .. We need to step back. I think it is a bad idea to accept what they propose, but if we do,
<nigel> .. and then something else comes up, we're back to square 1.
<nigel> .. I think we, especially the Chairs and Editors, and I'm happy to help because of IMSC,
<nigel> .. need to clarify the objective with PING.
<nigel> Nigel: Enumerating our options:
<nigel> .. 1. Keep as is and when making the transition request to PR, note the lack of conclusion to this HR review, assuming it has not been resolved.
<nigel> .. 2. Change as per the request and deal with probable objections from within the TTWG.
<nigel> .. 3. Try to discuss more with PING and understand if there are other acceptable approaches from their perspective.
<nigel> .. Any others?
<nigel> Pierre: On the 2nd one, it's not only dealing with conflict within this WG. To me the biggest
<nigel> .. risk is what will happen next? We have to find a way to deal with those comments in the
<nigel> .. long run I think.
<nigel> .. In the case of accessibility, the situation is a lot clearer because the accessibility group
<nigel> .. has created a detailed document. We largely reference it and provide an interpretation
<nigel> .. of the requirements in that document within ours.
<nigel> .. That was extremely helpful when it came to the question of color contrast because
<nigel> .. we were able to go back to the APA document and argue about the requirements that
<nigel> .. were written. That really helped.
<nigel> .. Here we don't have that, we just have one comment on one vulnerability on one document.
<nigel> .. It is very hard to address those comments in isolation.
<nigel> Nigel: I note you're raising the stakes within W3C beyond TTWG there?
<nigel> Pierre: No, my concern with accepting their proposal verbatim, setting aside the impact
<nigel> .. on the process, which we could waive, and may result in an objection to override, which
<nigel> .. are already super annoying, but the 3rd part, accepting this one comment, does not
<nigel> .. provide a good template for future comments and how to work with the PING in the long run.
<nigel> .. For example we don't have clarity about whether they are individuals or the PING itself
<nigel> .. commenting.
<nigel> Nigel: Putting this another way entirely, we could say that the open-endedness of this is
<nigel> .. due in part to the lack of defined semantics for resource fetching in TTML2, and that
<nigel> .. we could tighten that up and clarify the extent of any vulnerabilities by specifying those
<nigel> .. resource fetching semantics.
<nigel> Pierre: I think that's what we're doing by deferring normative changes to a later edition.
<nigel> Nigel: Another big challenge with specifying such fetch semantics is that the
<nigel> .. context of use of TTML and its resources is too broad. If external resources are provided
<nigel> .. as part of some sort of multiplexed stream of data, there may be no remote fetching
<nigel> .. at all, but we still would allow for referencing of resources external to the TTML document.
<nigel> .. So we can't straightforwardly solve this.
<nigel> Pierre: Yes, my biggest concern, is trying to solve these very complex problems at the
<nigel> .. last minute, normatively.
<nigel> .. I think if we say we will tackle them in the next edition, we will do it. We generally do,
<nigel> .. when we make a commitment like this.
<nigel> Nigel: It might be really hard, and take a long time.
<nigel> Pierre: It is completely independent in a sense. It is system dependent.
<nigel> Nigel: What to do?
<nigel> .. I think we should do nothing and wait. We don't have a transition request to PR imminent,
<nigel> .. because we have work to do on the IR.
<nigel> .. This gives a chance for PING to respond, and if they do not, then when we do get round
<nigel> .. to making the transition request, we can explain the situation and take silence as assent.
<nigel> Pierre: Does this block IMSC 1.2 because it references TTML2 2nd Ed?
<nigel> Nigel: Surprisingly, no, W3C accepts, rightly or wrongly, normative references to CRs
<nigel> .. these days.
<nigel> .. If we reverted the references to 1st Ed then we would not have addressed the PING and
<nigel> .. security comments against IMSC 1.2 which were delegated to TTML2 2nd Ed.
<nigel> .. I get the sense there's a bit of a house of cards here and it could get blocked.
<nigel> Pierre: I recommend that we pro-actively tell PING this is a complex issue that we don't
<nigel> .. think can be solved adequately at PR, and we intend to solve it with them in the next edition.
<nigel> Nigel: No arguments from me about trying to work more closely with them.
<nigel> SUMMARY: Action for @nigelmegitt to go back to PING and explain the situation and request further collaboration
<nigel> Pierre: I'm happy to help.

samuelweiler commented 4 years ago

I'm reopening this pending confirmation from @npdoty, @samuelweiler or PING more generally that the change made in #1203 and now merged addresses the issue, as requested at #1203 (comment) on 4th June 2020.

Thank you, Nigel. As in the PR, I am not satisfied. I think a normative mitigation is in order. Nick proposed a mere SHOULD, not a MUST - that gives implementers an out, if needed.

nigelmegitt commented 4 years ago

Understood @samuelweiler .

As per my action logged at https://github.com/w3c/ttml2/issues/1202#issuecomment-642737770 on 2020-06-11, I sent an email request to the Chairs of PING, CC staff contacts of both groups, requesting a joint meeting between TTWG and PING to see if we could gain a better mutual understanding of each other's objectives, in the hope that we can reach a resolution on this, given that there seems to be no proposal that is satisfactory to both @samuelweiler and TTWG. At this time (2020-06-17Z0758) I have not yet had a response.

css-meeting-bot commented 4 years ago

The Timed Text Working Group just discussed CSS font-matching algorithm may introduce fingerprinting issues w3c/ttml2#1202 (PING review), and agreed to the following:

SUMMARY: Awaiting PING response to the Chair, other proposals for resolving this issue are welcome.

The full IRC log of that discussion

<nigel> Topic: CSS font-matching algorithm may introduce fingerprinting issues w3c/ttml2#1202 (PING review)
<nigel> github: https://github.com/w3c/ttml2/issues/1202
<nigel> Nigel: Just to note I did contact PING last week after the call and have not had a reply yet.
<nigel> .. Aside from that I'd note that we have no proposals on the table for resolving the impasse.
<nigel> .. The proposal from PING has objections from TTWG, and the proposal from TTWG that
<nigel> .. has consensus seems to have an objection from PING!
<nigel> .. I would really appreciate any other proposals if anyone can contribute them.
<nigel> SUMMARY: Awaiting PING response to the Chair, other proposals for resolving this issue are welcome.

skynavga commented 4 years ago

@nigelmegitt re: https://github.com/w3c/ttml2/issues/1202#issuecomment-646097153, it is not clear to me whether the comments of @samuelweiler represent a consensus PING position or his personal opinion. In any case, there are many precedents that permit us to decline to process his request for a normative change. We can simply resolve this by stating that the TTWG position is not to satisfy the requested change at this time; nothing in the process forces us to accept the change (in general or in the context of this specific CR).

nigelmegitt commented 4 years ago

@skynavga I agree, that is a possible course of action. As Chair, I am attempting to ensure that we have exhausted all routes to getting to a consensus view, and that includes @samuelweiler 's view regardless of whether it is a PING position or a personal one. If I am satisfied that we have exhausted all routes, then that only leaves the option that you describe.

andreastai commented 4 years ago

@npdoty, @samuelweiler As I understand it, the motivation of PING is to provide privacy and security guidance, in this case on strategies to avoid fingerprinting issues in the context of font downloading.

As I understand the discussion in #1203, one of the questions is whether the guiding text should be made normative.

@npdoty proposed the following:

A content processor SHOULD NOT dereference external font resources conditionally on the presence of user-installed fonts, where that dereferencing could reveal information about the user's system or fingerprint the user.

As the overall goal is to guide implementers in the right direction, could the following be an alternative (added as a Note):

It is strongly encouraged to NOT dereference external font resources conditionally on the presence of user-installed fonts, where that dereferencing could reveal information about the user's system or fingerprint the user unless there are valid reasons and the full implications are understood and the case was carefully weighed before implementing.

This has essentially the same meaning (it uses the definition of SHOULD NOT in https://tools.ietf.org/html/rfc2119). The only difference is that it does not use normative keywords. But the text may highlight the guiding aspect even better?
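
To make concrete what "conditionally on the presence of user-installed fonts" means, here is a minimal sketch. It is not taken from TTML2, IMSC, or any real processor: the resolver functions, the `ExampleSans` family, and the `fonts.example` URL are all hypothetical, and the "always fetch" variant is only one possible mitigation, assumed here for illustration.

```python
from typing import Callable, List, Set

FONT_URL = "https://fonts.example/ExampleSans.woff2"  # hypothetical external font


def resolve_conditional(family: str, url: str, installed: Set[str],
                        fetch: Callable[[str], None]) -> str:
    """Fetches the external font only when the family is NOT installed.

    The server can infer from the presence or absence of the request
    whether the user has the font, which is the fingerprinting vector.
    """
    if family in installed:
        return "local"
    fetch(url)
    return "external"


def resolve_unconditional(family: str, url: str, installed: Set[str],
                          fetch: Callable[[str], None]) -> str:
    """Fetches the external font regardless of what is installed.

    The request pattern is the same for every user, so the fetch itself
    reveals nothing about locally installed fonts.
    """
    fetch(url)
    return "external"


def observed_requests(resolver, installed: Set[str]) -> List[str]:
    """Returns the list of URLs a remote server would see being requested."""
    requests: List[str] = []
    resolver("ExampleSans", FONT_URL, installed, requests.append)
    return requests
```

With `resolve_conditional`, `observed_requests` returns different results for a user who has `ExampleSans` installed and one who does not; with `resolve_unconditional` the two results are identical. A processor that never dereferences external fonts at all would also satisfy the proposed wording.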

andreastai commented 4 years ago

@npdoty, @samuelweiler One additional option could be a more detailed guideline on how to avoid the fingerprinting, published on MDN (e.g. as a separate page in the IMSC chapter, https://developer.mozilla.org/en-US/docs/Related/IMSC). This would have the advantage that solutions can be updated more frequently, that security and TTML experts could work on it collaboratively, and that (at least in my opinion) it would possibly reach implementers better than the specification itself.

skynavga commented 4 years ago

@tairt re: https://github.com/w3c/ttml2/issues/1202#issuecomment-648721821, I can accept your proposed language provided that: (1) "It is strongly encouraged to NOT" is changed to read "It is recommended that the document processing context not", (2) "the case was" is changed to "the case is", and (3) appendix P remains non-normative. I should point out that we have precedent (in five notes) for the language "it is recommended" in other non-normative contexts in the specification text.

skynavga commented 4 years ago

Note that "document processing context" here should be linked to the terminology section, i.e.,

<loc href="#terms-document-processing-context">document processing context</loc>

css-meeting-bot commented 4 years ago

The Timed Text Working Group just discussed CSS font-matching algorithm may introduce fingerprinting issues w3c/ttml2#1202 (PING review), and agreed to the following:

SUMMARY: @nigelmegitt to respond to Sam regarding a joint meeting, to try to arrange it.

The full IRC log of that discussion

<nigel> Topic: CSS font-matching algorithm may introduce fingerprinting issues w3c/ttml2#1202 (PING review)
<nigel> github: https://github.com/w3c/ttml2/issues/1202
<nigel> Nigel: Some activity to report:
<nigel> .. 1. Sam got back to me earlier today or late yesterday proposing times for a joint meeting.
<nigel> .. 2. Andreas proposed an alternative, stronger-sounding wording, which Glenn thought
<nigel> .. could work modulo a couple of editorial tweaks.
<nigel> .. Sam proposed 1:45pm Eastern. That's a little late for me, he suggested the earliest
<nigel> .. possible time would be 1:30pm Eastern, but next week might work too.
<nigel> .. For a half hour call.
<nigel> .. I will respond to explore the options for a suitable time. Possibly it will be next week.
<nigel> .. I will propose a doodle, since several people may want to attend.
<nigel> .. Hopefully this will allow us to understand each other's objectives and constraints and
<nigel> .. work towards a consensus solution.
<nigel> .. Thank you Andreas for your proposals too. They look good to me also.
<nigel> Andreas: No response to my comments, other than from Glenn.
<nigel> Nigel: Good, let's hope that we have a path out of this.
<nigel> SUMMARY: @nigelmegitt to respond to Sam regarding a joint meeting, to try to arrange it.

css-meeting-bot commented 4 years ago

The Timed Text Working Group just discussed CSS font-matching algorithm may introduce fingerprinting issues w3c/ttml2#1202, and agreed to the following:

SUMMARY: @nigelmegitt to ask @samuelweiler for additional proposed slots.

The full IRC log of that discussion

<nigel> Topic: CSS font-matching algorithm may introduce fingerprinting issues w3c/ttml2#1202
<nigel> github: https://github.com/w3c/ttml2/issues/1202
<nigel> Nigel: I finally got round to setting up a doodle for this, not everyone has been able to
<nigel> .. respond yet.
<nigel> Pierre: Unfortunately I cannot make the two current most likely dates. It looks like Sam has the most restricted availability.
<nigel> Andreas: I agree with Pierre, Sam's availability is most restricted, so maybe we should ask
<nigel> .. him for some proposed slots in the next two weeks?
<nigel> Nigel: Good idea, I will.
<nigel> SUMMARY: @nigelmegitt to ask @samuelweiler for additional proposed slots.
<nigel> Andreas: I wonder if our meeting would be an option too?
<nigel> Pierre: Regrets from me for Thursday 23rd July, most likely. I'd be available following the meeting.
<nigel> Nigel: That's an option I could add.

nigelmegitt commented 4 years ago

Discussed on a call on 2020-07-27, minutes at https://www.w3.org/2020/07/27-tt-minutes.html

Chair's summary, based also on @plehegar 's statements at the end:

TTWG to go ahead with the closest language to the PING review request that they can agree on
PING to consider their options given the changes
TTWG and PING to continue to work together in the future to try to improve the mitigation to the issue raised

skynavga commented 4 years ago

I've drafted a new PR (#1210) that attempts to address the comments from PING, but without going as far as making the language normative. Nonetheless, I have included language "should consider not", which, in the present context (Appendix P), has a non-normative status. I would be willing to go as far as changing this to "should not" if folks prefer that. N.B. As I mentioned on today's call, we have precedent for using the language "should not" in non-normative text, so doing so would not introduce new precedent.

samuelweiler commented 4 years ago

> would not introduce new precedent.

Why the worry about precedents?

skynavga commented 4 years ago

@samuelweiler because we are a WG with 17 years of history, including established consensus about how to write specifications, what should and should not go into specifications, how testing is approached, and a myriad of other details, the sum of which forms the basis for what traditional standards development organizations such as ITU, ISO, ANSI, and others consider fair and best practice. In other words, it's our body of convention. The PING, the IETF, other SDOs, as well as individual editors, have their own conventions, and you will find many distinct conventions within the W3C. For example, the HTML WG was comfortable publishing a spec (HTML5) that was largely untested and perhaps untestable in a significant way; the TTWG has not been comfortable doing so, as @nigelmegitt mentioned in our recent call. That represents a difference of convention or, as it were, a difference in the role of precedent.

nigelmegitt commented 3 years ago

@npdoty #1210 was merged and published in TTML2 2nd Ed CR2, in case you would like to confirm that the change represents at least an improvement.
